%0 Journal Article
%J Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming
%D 2008
%T Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications
%A Oscar Hernandez
%A Fengguang Song
%A Barbara Chapman
%A Jack Dongarra
%A Bernd Mohr
%A Shirley Moore
%A Felix Wolf
%B Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming
%I Springer Berlin / Heidelberg
%V 4315
%8 2008-00
%G eng

%0 Conference Proceedings
%B Proceedings of the 2nd International Workshop on Tools for High Performance Computing
%D 2008
%T Usage of the Scalasca Toolset for Scalable Performance Analysis of Large-scale Parallel Applications
%A Felix Wolf
%A Brian Wylie
%A Erika Abraham
%A Wolfgang Frings
%A Karl Fürlinger
%A Markus Geimer
%A Marc-Andre Hermanns
%A Bernd Mohr
%A Shirley Moore
%A Matthias Pfeifer
%E Michael Resch
%E Rainer Keller
%E Valentin Himmler
%E Bettina Krammer
%E A Schulz
%K point
%B Proceedings of the 2nd International Workshop on Tools for High Performance Computing
%I Springer
%C Stuttgart, Germany
%P 157-167
%8 2008-01
%G eng

%0 Journal Article
%J Concurrency and Computation: Practice and Experience
%D 2007
%T Automatic Analysis of Inefficiency Patterns in Parallel Applications
%A Felix Wolf
%A Bernd Mohr
%A Jack Dongarra
%A Shirley Moore
%B Concurrency and Computation: Practice and Experience
%V 19
%P 1481-1496
%8 2007-08
%G eng

%0 Conference Proceedings
%B 8th Workshop 'Parallel Systems and Algorithms' (PASA), Lecture Notes in Informatics
%D 2006
%T Large Event Traces in Parallel Performance Analysis
%A Felix Wolf
%A Felix Freitag
%A Bernd Mohr
%A Shirley Moore
%A Brian Wylie
%K kojak
%B 8th Workshop 'Parallel Systems and Algorithms' (PASA), Lecture Notes in Informatics
%I Gesellschaft für Informatik
%C Frankfurt/Main, Germany
%8 2006-03
%G eng

%0 Conference Proceedings
%B Second International Workshop on OpenMP
%D 2006
%T Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications
%A Oscar Hernandez
%A Fengguang Song
%A Barbara Chapman
%A Jack Dongarra
%A Bernd Mohr
%A Shirley Moore
%A Felix Wolf
%K kojak
%B Second International Workshop on OpenMP
%C Reims, France
%8 2006-01
%G eng

%0 Conference Proceedings
%B Proc. of the 5th International Workshop on Performance Modeling, Evaluation, and Organization of Parallel and Distributed Systems (PMEO-PDS 2006)
%D 2006
%T A Systematic Multi-step Methodology for Performance Analysis of Communication Traces of Distributed Applications based on Hierarchical Clustering
%A Gabriela Aguilera
%A Patricia J. Teller
%A Michela Taufer
%A Felix Wolf
%K kojak
%B Proc. of the 5th International Workshop on Performance Modeling, Evaluation, and Organization of Parallel and Distributed Systems (PMEO-PDS 2006)
%I IEEE Computer Society
%C Rhodes Island, Greece
%8 2006-04
%G eng

%0 Journal Article
%J Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted)
%D 2005
%T Automatic analysis of inefficiency patterns in parallel applications
%A Felix Wolf
%A Bernd Mohr
%A Jack Dongarra
%A Shirley Moore
%K kojak
%B Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted)
%8 2005-00
%G eng

%0 Conference Proceedings
%B In Proceedings of the International Conference on Parallel Processing
%D 2005
%T Automatic Experimental Analysis of Communication Patterns in Virtual Topologies
%A Nikhil Bhatia
%A Fengguang Song
%A Felix Wolf
%A Jack Dongarra
%A Bernd Mohr
%A Shirley Moore
%K kojak
%B In Proceedings of the International Conference on Parallel Processing
%I IEEE Computer Society
%C Oslo, Norway
%8 2005-06
%G eng

%0 Conference Proceedings
%B In Proceedings of the European Conference on Parallel Computing (Euro-Par)
%D 2005
%T Event-based Measurement and Analysis of One-sided Communication
%A Marc-Andre Hermanns
%A Bernd Mohr
%A Felix Wolf
%K kojak
%B In Proceedings of the European Conference on Parallel Computing (Euro-Par)
%I Springer
%C Lisbon, Portugal
%8 2005-08
%G eng

%0 Conference Proceedings
%B Second Workshop on Productivity and Performance in High-End Computing (P-PHEC) at 11th International Symposium on High Performance Computer Architecture (HPCA-2005)
%D 2005
%T Improving Time to Solution with Automated Performance Analysis
%A Shirley Moore
%A Felix Wolf
%A Jack Dongarra
%A Bernd Mohr
%K kojak
%B Second Workshop on Productivity and Performance in High-End Computing (P-PHEC) at 11th International Symposium on High Performance Computer Architecture (HPCA-2005)
%C San Francisco
%8 2005-02
%G eng

%0 Conference Proceedings
%B Workshop on Patterns in High Performance Computing
%D 2005
%T A Pattern-Based Approach to Automated Application Performance Analysis
%A Nikhil Bhatia
%A Shirley Moore
%A Felix Wolf
%A Jack Dongarra
%A Bernd Mohr
%K kojak
%B Workshop on Patterns in High Performance Computing
%C University of Illinois at Urbana-Champaign
%8 2005-05
%G eng

%0 Conference Proceedings
%B In Proceedings of the 2005 SciDAC Conference
%D 2005
%T Performance Analysis of GYRO: A Tool Evaluation
%A Patrick H. Worley
%A Jeff Candy
%A Laura Carrington
%A Kevin Huck
%A Timothy Kaiser
%A Kumar Mahinthakumar
%A Allen D. Malony
%A Shirley Moore
%A Dan Reed
%A Philip C. Roth
%A H. Shan
%A Sameer Shende
%A Allan Snavely
%A S. Sreepathi
%A Felix Wolf
%A Y. Zhang
%K kojak
%B In Proceedings of the 2005 SciDAC Conference
%C San Francisco, CA
%8 2005-06
%G eng

%0 Conference Proceedings
%B Mini-Symposium "Tools Support for Parallel Programming", Proceedings of Parallel Computing (ParCo)
%D 2005
%T Performance Analysis of One-sided Communication Mechanisms
%A Bernd Mohr
%A Andrej Kühnal
%A Marc-Andre Hermanns
%A Felix Wolf
%K kojak
%B Mini-Symposium "Tools Support for Parallel Programming", Proceedings of Parallel Computing (ParCo)
%C Malaga, Spain
%8 2005-09
%G eng

%0 Conference Paper
%B Proceedings of DoD HPCMP UGC 2005
%D 2005
%T Performance Profiling and Analysis of DoD Applications using PAPI and TAU
%A Shirley Moore
%A David Cronk
%A Felix Wolf
%A Avi Purkayastha
%A Patricia J. Teller
%A Robert Araiza
%A Gabriela Aguilera
%A Jamie Nava
%K papi
%B Proceedings of DoD HPCMP UGC 2005
%I IEEE
%C Nashville, TN
%8 2005-06
%G eng

%0 Conference Proceedings
%B In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference
%D 2005
%T Performance Profiling Overhead Compensation for MPI Programs
%A Sameer Shende
%A Allen D. Malony
%A Alan Morris
%A Felix Wolf
%K kojak
%B In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference
%I Springer LNCS
%8 2005-09
%G eng

%0 Conference Proceedings
%B In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference
%D 2005
%T A Scalable Approach to MPI Application Performance Analysis
%A Shirley Moore
%A Felix Wolf
%A Jack Dongarra
%A Sameer Shende
%A Allen D. Malony
%A Bernd Mohr
%K kojak
%B In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference
%I Springer LNCS
%8 2005-09
%G eng

%0 Conference Proceedings
%B In Proc. of the International Conference on High Performance Computing and Communications (HPCC)
%D 2005
%T Trace-Based Parallel Performance Overhead Compensation
%A Felix Wolf
%A Allen D. Malony
%A Sameer Shende
%A Alan Morris
%K kojak
%B In Proc. of the International Conference on High Performance Computing and Communications (HPCC)
%C Sorrento (Naples), Italy
%8 2005-09
%G eng

%0 Conference Proceedings
%B 2004 International Conference on Parallel Processing (ICPP-04)
%D 2004
%T An Algebra for Cross-Experiment Performance Analysis
%A Fengguang Song
%A Felix Wolf
%A Nikhil Bhatia
%A Jack Dongarra
%A Shirley Moore
%K kojak
%B 2004 International Conference on Parallel Processing (ICPP-04)
%C Montreal, Quebec, Canada
%8 2004-08
%G eng

%0 Conference Paper
%B 5th LCI International Conference on Linux Clusters: The HPC Revolution
%D 2004
%T Automating the Large-Scale Collection and Analysis of Performance
%A Phil Mucci
%A Jack Dongarra
%A Rick Kufrin
%A Shirley Moore
%A Fengguang Song
%A Felix Wolf
%K kojak
%K papi
%B 5th LCI International Conference on Linux Clusters: The HPC Revolution
%C Austin, Texas
%8 2004-05
%G eng

%0 Generic
%D 2004
%T CUBE User Manual
%A Fengguang Song
%A Felix Wolf
%K kojak
%B ICL Technical Report
%8 2004-02
%G eng

%0 Generic
%D 2004
%T EARL - API Documentation
%A Felix Wolf
%K kojak
%B ICL Technical Report
%8 2004-10
%G eng

%0 Conference Proceedings
%B Proceedings of Euro-Par 2004
%D 2004
%T Efficient Pattern Search in Large Traces through Successive Refinement
%A Felix Wolf
%A Bernd Mohr
%A Jack Dongarra
%A Shirley Moore
%K kojak
%B Proceedings of Euro-Par 2004
%I Springer-Verlag
%C Pisa, Italy
%8 2004-08
%G eng

%0 Journal Article
%J Journal of Systems Architecture, Special Issue 'Evolutions in parallel distributed and network-based processing'
%D 2003
%T Automatic performance analysis of hybrid MPI/OpenMP applications
%A Felix Wolf
%A Bernd Mohr
%E Andrea Clematis
%E Daniele D'Agostino
%K kojak
%B Journal of Systems Architecture, Special Issue 'Evolutions in parallel distributed and network-based processing'
%I Elsevier
%V 49(10-11)
%P 421-439
%8 2003-11
%G eng

%0 Journal Article
%J Advances in Parallel Computing
%D 2003
%T Hardware-Counter Based Automatic Performance Analysis of Parallel Programs
%A Felix Wolf
%A Bernd Mohr
%K kojak
%K papi
%X The KOJAK performance-analysis environment identifies a large number of performance problems on parallel computers with SMP nodes. The current version concentrates on parallelism-related performance problems that arise from an inefficient usage of the parallel programming interfaces MPI and OpenMP, while ignoring individual CPU performance. This chapter describes an extended design of KOJAK capable of diagnosing low individual-CPU performance based on hardware-counter information and of integrating the results with those of the parallelism-centered analysis. The performance of parallel applications is determined by a variety of different factors. Performance of single components frequently influences the overall behavior in unexpected ways. Application programmers on current parallel machines have to deal with numerous performance-critical aspects: different modes of parallel execution, such as message passing, multi-threading or even a combination of the two, and performance on individual CPUs that is determined by the interaction of different functional units. The KOJAK analysis process is composed of two parts: a semi-automatic instrumentation of the user application followed by an automatic analysis of the generated performance data. KOJAK's instrumentation software runs on most major UNIX platforms and works on multiple levels, including source-code, compiler, and linker.
%B Advances in Parallel Computing
%I Elsevier
%C Dresden, Germany
%V 13
%P 753-760
%8 2004-01
%G eng
%R https://doi.org/10.1016/S0927-5452(04)80092-3

%0 Conference Proceedings
%B Proc. of the European Conference on Parallel Computing (EuroPar)
%D 2003
%T KOJAK - A Tool Set for Automatic Performance Analysis of Parallel Applications
%A Bernd Mohr
%A Felix Wolf
%K kojak
%B Proc. of the European Conference on Parallel Computing (EuroPar)
%I Springer-Verlag
%C Klagenfurt, Austria
%V 2790
%P 1301-1304
%8 2003-08
%G eng