%0 Conference Paper %B 2013 IEEE International Symposium on Performance Analysis of Systems and Software %D 2013 %T Non-Determinism and Overcount on Modern Hardware Performance Counter Implementations %A Vincent Weaver %A Dan Terpstra %A Shirley Moore %B 2013 IEEE International Symposium on Performance Analysis of Systems and Software %I IEEE %C Austin, TX %8 2013-04 %G eng %0 Generic %D 2013 %T PAPI 5: Measuring Power, Energy, and the Cloud %A Vincent Weaver %A Dan Terpstra %A Heike McCraw %A Matt Johnson %A Kiran Kasichayanula %A James Ralph %A John Nelson %A Phil Mucci %A Tushar Mohan %A Shirley Moore %I 2013 IEEE International Symposium on Performance Analysis of Systems and Software %C Austin, TX %8 2013-04 %G eng %0 Conference Proceedings %B International Workshop on Power-Aware Systems and Architectures %D 2012 %T Measuring Energy and Power with PAPI %A Vincent M Weaver %A Matt Johnson %A Kiran Kasichayanula %A James Ralph %A Piotr Luszczek %A Dan Terpstra %A Shirley Moore %K papi %X Energy and power consumption are becoming critical metrics in the design and usage of high performance systems. We have extended the Performance API (PAPI) analysis library to measure and report energy and power values. These values are reported using the existing PAPI API, allowing code previously instrumented for performance counters to also measure power and energy. Higher level tools that build on PAPI will automatically gain support for power and energy readings when used with the newest version of PAPI. We describe in detail the types of energy and power readings available through PAPI. We support external power meters, as well as values provided internally by recent CPUs and GPUs. Measurements are provided directly to the instrumented process, allowing immediate code analysis in real time. We provide examples showing results that can be obtained with our infrastructure. 
%B International Workshop on Power-Aware Systems and Architectures %C Pittsburgh, PA %8 2012-09 %G eng %R 10.1109/ICPPW.2012.39 %0 Journal Article %J CloudTech-HPC 2012 %D 2012 %T PAPI-V: Performance Monitoring for Virtual Machines %A Matt Johnson %A Heike McCraw %A Shirley Moore %A Phil Mucci %A John Nelson %A Dan Terpstra %A Vincent M Weaver %A Tushar Mohan %K papi %X This paper describes extensions to the PAPI hardware counter library for virtual environments, called PAPI-V. The extensions support timing routines, I/O measurements, and processor counters. The PAPI-V extensions will allow application and tool developers to use a familiar interface to obtain relevant hardware performance monitoring information in virtual environments. %B CloudTech-HPC 2012 %C Pittsburgh, PA %8 2012-09 %G eng %R 10.1109/ICPPW.2012.29 %0 Journal Article %J SAAHPC '12 (Best Paper Award) %D 2012 %T Power Aware Computing on GPUs %A Kiran Kasichayanula %A Dan Terpstra %A Piotr Luszczek %A Stanimire Tomov %A Shirley Moore %A Gregory D. 
Peterson %K magma %B SAAHPC '12 (Best Paper Award) %C Argonne, IL %8 2012-07 %G eng %0 Journal Article %J TeraGrid'11 %D 2011 %T Autotuned Parallel I/O for Highly Scalable Biosequence Analysis %A Haihang You %A Bhanu Rekapalli %A Qing Liu %A Shirley Moore %B TeraGrid'11 %C Salt Lake City, Utah %8 2011-07 %G eng %0 Conference Proceedings %B Cray Users Group Conference (CUG'11) (Best Paper Finalist) %D 2011 %T The Design of an Auto-tuning I/O Framework on Cray XT5 System %A Haihang You %A Qing Liu %A Zhiqiang Li %A Shirley Moore %K gco %B Cray Users Group Conference (CUG'11) (Best Paper Finalist) %C Fairbanks, Alaska %8 2011-05 %G eng %0 Journal Article %J International Journal of High Performance Computing Applications %D 2011 %T Energy and performance characteristics of different parallel implementations of scientific applications on multicore systems %A Charles Lively %A Xingfu Wu %A Valerie Taylor %A Shirley Moore %A Hung-Ching Chang %A Kirk Cameron %K mumi %B International Journal of High Performance Computing Applications %V 25 %P 342-350 %8 2011-00 %G eng %0 Conference Proceedings %B 6th Workshop on Virtualization in High-Performance Cloud Computing %D 2011 %T Evaluation of the HPC Challenge Benchmarks in Virtualized Environments %A Piotr Luszczek %A Eric Meek %A Shirley Moore %A Dan Terpstra %A Vincent M Weaver %A Jack Dongarra %K hpcc %B 6th Workshop on Virtualization in High-Performance Cloud Computing %C Bordeaux, France %8 2011-08 %G eng %0 Generic %D 2011 %T Power-aware Computing on GPGPUs %A Kiran Kasichayanula %A Haihang You %A Shirley Moore %A Stanimire Tomov %A Heike Jagode %A Matt Johnson %I Fall Creek Falls Conference, Poster %C Gatlinburg, TN %8 2011-09 %G eng %0 Conference Proceedings %B International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011) %D 2011 %T Power-Aware Prediction Models of Hybrid (MPI/OpenMP) Scientific Applications %A Charles Lively %A Xingfu Wu %A Valerie Taylor %A Shirley Moore %A Hung-Ching Chang %A 
Chun-Yi Su %A Kirk Cameron %K mumi %B International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011) %C Hamburg, Germany %8 2011-09 %G eng %0 Journal Article %J Procedia Computer Science %D 2011 %T User-Defined Events for Hardware Performance Monitoring %A Shirley Moore %A James Ralph %K mumi %K papi %X PAPI is a widely used cross-platform interface to hardware performance counters. PAPI currently supports native events, which are those provided by a given platform, and preset events, which are pre-defined events thought to be common across platforms. Presets are currently mapped and defined at the time that PAPI is compiled and installed. The idea of user-defined events is to allow users to define their own metrics and to have those metrics mapped to events on a platform without the need to re-install PAPI. User-defined events can be defined in terms of native, preset, and previously defined user-defined events. The user can combine events and constants in an arbitrary expression to define a new metric and give a name to the new metric. This name can then be specified as a PAPI event in a PAPI library call the same way as native and preset events. End-user tools such as TAU and Scalasca that use PAPI can also use the user-defined metrics. Users can publish their metric definitions so that other users can use them as well. We present several examples of how user-defined events can be used for performance analysis and modeling. 
%B Procedia Computer Science %I Elsevier %V 4 %P 2096-2104 %8 2011-05 %G eng %R https://doi.org/10.1016/j.procs.2011.04.229 %0 Journal Article %J in Performance Tuning of Scientific Applications (to appear) %D 2010 %T Empirical Performance Tuning of Dense Linear Algebra Software %A Jack Dongarra %A Shirley Moore %E David Bailey %E Robert Lucas %E Sam Williams %B in Performance Tuning of Scientific Applications (to appear) %8 2010-00 %G eng %0 Conference Proceedings %B Proceedings of the Cray Users' Group Meeting %D 2010 %T Performance Evaluation for Petascale Quantum Simulation Tools %A Stanimire Tomov %A Wenchang Lu %A Jerzy Bernholc %A Shirley Moore %A Jack Dongarra %B Proceedings of the Cray Users' Group Meeting %C Atlanta, GA %8 2010-05 %G eng %0 Journal Article %J PARA 2010 %D 2010 %T Scalability Study of a Quantum Simulation Code %A Jerzy Bernholc %A Miroslav Hodak %A Wenchang Lu %A Shirley Moore %A Stanimire Tomov %B PARA 2010 %C Reykjavik, Iceland %8 2010-06 %G eng %0 Journal Article %J IEEE Cluster 2009 %D 2009 %T Analytical Modeling and Optimization for Affinity Based Thread Scheduling on Multicore Systems %A Fengguang Song %A Shirley Moore %A Jack Dongarra %K gridpac %K mumi %B IEEE Cluster 2009 %C New Orleans %8 2009-08 %G eng %0 Journal Article %J International Journal of Parallel Programming %D 2009 %T Capturing and Analyzing the Execution Control Flow of OpenMP Applications %A Karl Fürlinger %A Shirley Moore %B International Journal of Parallel Programming %V 37 %P 266-276 %8 2009-00 %G eng %0 Journal Article %J ISC'09 %D 2009 %T I/O Performance Analysis for the Petascale Simulation Code FLASH %A Heike Jagode %A Shirley Moore %A Dan Terpstra %A Jack Dongarra %A Andreas Knuepfer %A Matthias Jurenz %A Matthias S. Mueller %A Wolfgang E. 
Nagel %K test %B ISC'09 %C Hamburg, Germany %8 2009-06 %G eng %0 Conference Proceedings %B Proceedings of DoD HPCMP UGC 2009 %D 2009 %T Making Performance Analysis and Tuning Part of the Software Development Cycle %A Ricardo Portillo %A Patricia J. Teller %A David Cronk %A Shirley Moore %B Proceedings of DoD HPCMP UGC 2009 %I IEEE %C San Diego, CA %8 2009-06 %G eng %0 Conference Proceedings %B SciDAC 2009, Journal of Physics: Conference Series %D 2009 %T Modeling the Office of Science Ten Year Facilities Plan: The PERI Architecture Tiger Team %A Bronis R. de Supinski %A Sadaf Alam %A David Bailey %A Laura Carrington %A Chris Daley %A Anshu Dubey %A Todd Gamblin %A Dan Gunter %A Paul D. Hovland %A Heike Jagode %A Karen Karavanic %A Gabriel Marin %A John Mellor-Crummey %A Shirley Moore %A Boyana Norris %A Leonid Oliker %A Catherine Olschanowsky %A Philip C. Roth %A Martin Schulz %A Sameer Shende %A Allan Snavely %K test %B SciDAC 2009, Journal of Physics: Conference Series %I IOP Publishing %C San Diego, California %V 180(2009)012039 %8 2009-07 %G eng %0 Conference Proceedings %B Proceedings of CUG09 %D 2009 %T Performance evaluation for petascale quantum simulation tools %A Stanimire Tomov %A Wenchang Lu %A Jerzy Bernholc %A Shirley Moore %A Jack Dongarra %K doe-nano %B Proceedings of CUG09 %C Atlanta, GA %8 2009-05 %G eng %0 Journal Article %J Future Generation Computing Systems %D 2009 %T Recording the Control Flow of Parallel Applications to Determine Iterative and Phase-Based Behavior %A Karl Fürlinger %A Shirley Moore %B Future Generation Computing Systems %V 26 %P 162-166 %8 2009-00 %G eng %0 Conference Proceedings %B The International Conference on Computational Science 2009 (ICCS 2009) %D 2009 %T A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling %A Fengguang Song %A Shirley Moore %A Jack Dongarra %K plasma %B The International Conference on Computational Science 2009 (ICCS 2009) %C Baton Rouge, LA %V 5544 %P 195-204 %8 2009-05 %G eng %0 
Generic %D 2008 %T Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms %A Fengguang Song %A Shirley Moore %A Jack Dongarra %B University of Tennessee Computer Science Technical Report, UT-CS-08-626 %8 2008-01 %G eng %0 Conference Proceedings %B Proceedings of the 2008 International Conference on Computational Science (ICCS 2008) %D 2008 %T Detection and Analysis of Iterative Behavior in Parallel Applications %A Karl Fürlinger %A Shirley Moore %K point %B Proceedings of the 2008 International Conference on Computational Science (ICCS 2008) %C Krakow, Poland %V 5103 %P 261-267 %8 2008-01 %G eng %0 Conference Proceedings %B Proceedings of the DoD HPCMP User Group Conference %D 2008 %T Exploring New Architectures in Accelerating CFD for Air Force Applications %A Jack Dongarra %A Shirley Moore %A Gregory D. Peterson %A Stanimire Tomov %A Jeff Allred %A Vincent Natoli %A David Richie %K magma %B Proceedings of the DoD HPCMP User Group Conference %C Seattle, Washington %8 2008-01 %G eng %0 Conference Proceedings %B Proc. 2008 IEEE International Conference on Cluster Computing (CLUSTER 2008) %D 2008 %T OpenMP-centric Performance Analysis of Hybrid Applications %A Karl Fürlinger %A Shirley Moore %B Proc. 2008 IEEE International Conference on Cluster Computing (CLUSTER 2008) %C Tsukuba, Japan %8 2008-01 %G eng %0 Journal Article %J Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming %D 2008 %T Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications %A Oscar Hernandez %A Fengguang Song %A Barbara Chapman %A Jack Dongarra %A Bernd Mohr %A Shirley Moore %A Felix Wolf %B Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming %I Springer Berlin / Heidelberg %V 4315 %8 2008-00 %G eng %0 Journal Article %J Proc. SciDAC 2008 %D 2008 %T PERI Auto-tuning %A David Bailey %A Jacqueline Chame %A Chun Chen %A Jack Dongarra %A Mary Hall %A Jeffrey K. Hollingsworth %A Paul D. 
Hovland %A Shirley Moore %A Keith Seymour %A Jaewook Shin %A Ananta Tiwari %A Sam Williams %A Haihang You %K gco %B Proc. SciDAC 2008 %I Journal of Physics %C Seattle, Washington %V 125 %8 2008-01 %G eng %0 Conference Proceedings %B Proceedings of the 2nd International Workshop on Tools for High Performance Computing %D 2008 %T Usage of the Scalasca Toolset for Scalable Performance Analysis of Large-scale Parallel Applications %A Felix Wolf %A Brian Wylie %A Erika Abraham %A Wolfgang Frings %A Karl Fürlinger %A Markus Geimer %A Marc-Andre Hermanns %A Bernd Mohr %A Shirley Moore %A Matthias Pfeifer %E Michael Resch %E Rainer Keller %E Valentin Himmler %E Bettina Krammer %E A Schulz %K point %B Proceedings of the 2nd International Workshop on Tools for High Performance Computing %I Springer %C Stuttgart, Germany %P 157-167 %8 2008-01 %G eng %0 Conference Proceedings %B Proc. 4th International Workshop on OpenMP (IWOMP 2008) %D 2008 %T Visualizing the Program Execution Control Flow of OpenMP Applications %A Karl Fürlinger %A Shirley Moore %B Proc. 
4th International Workshop on OpenMP (IWOMP 2008) %I Lecture Notes in Computer Science 5004 %C West Lafayette, Indiana %P 181-190 %8 2008-01 %G eng %0 Generic %D 2007 %T Automated Empirical Tuning of a Multiresolution Analysis Kernel %A Haihang You %A Keith Seymour %A Jack Dongarra %A Shirley Moore %K gco %B ICL Technical Report %P 10 %8 2007-01 %G eng %0 Journal Article %J Concurrency and Computation: Practice and Experience %D 2007 %T Automatic Analysis of Inefficiency Patterns in Parallel Applications %A Felix Wolf %A Bernd Mohr %A Jack Dongarra %A Shirley Moore %B Concurrency and Computation: Practice and Experience %V 19 %P 1481-1496 %8 2007-08 %G eng %0 Conference Proceedings %B Proceedings of the 2007 Conference on Parallel Computing (PARCO 2007) %D 2007 %T Continuous Runtime Profiling of OpenMP Applications %A Karl Fürlinger %A Shirley Moore %K kojak %B Proceedings of the 2007 Conference on Parallel Computing (PARCO 2007) %C Juelich and Aachen, Germany %8 2007-01 %G eng %0 Generic %D 2007 %T Empirical Tuning of a Multiresolution Analysis Kernel using a Specialized Code Generator %A Haihang You %A Keith Seymour %A Jack Dongarra %A Shirley Moore %K gco %B ICL Technical Report %8 2007-01 %G eng %0 Conference Proceedings %B IEEE International Symposium on High Performance Distributed Computing %D 2007 %T Feedback-Directed Thread Scheduling with Memory Considerations %A Fengguang Song %A Shirley Moore %A Jack Dongarra %B IEEE International Symposium on High Performance Distributed Computing %C Monterey Bay, CA %8 2007-06 %G eng %0 Conference Proceedings %B Proceedings of the 2007 International Conference on Parallel Processing %D 2007 %T L2 Cache Modeling for Scientific Applications on Chip Multi-Processors %A Fengguang Song %A Shirley Moore %A Jack Dongarra %B Proceedings of the 2007 International Conference on Parallel Processing %I IEEE Computer Society %C Xi'an, China %8 2007-01 %G eng %0 Conference Proceedings %B Proc. 
DoD HPCMP Users Group Conference (HPCMP-UGC'07) %D 2007 %T Memory Leak Detection in Fortran Applications using TAU %A Sameer Shende %A Allen D. Malony %A Shirley Moore %A David Cronk %B Proc. DoD HPCMP Users Group Conference (HPCMP-UGC'07) %I IEEE Computer Society %C Pittsburgh, PA %8 2007-01 %G eng %0 Conference Proceedings %B Journal of Physics: Conference Series, SciDAC 2007 %D 2007 %T Results of the PERI survey of SciDAC applications %A Bronis R. de Supinski %A Jeffrey K. Hollingsworth %A Shirley Moore %A Patrick H. Worley %B Journal of Physics: Conference Series, SciDAC 2007 %V 78 %8 2007-01 %G eng %0 Conference Proceedings %B 18th IASTED International Conference on Parallel and Distributed Computing and Systems PDCS 2006 (submitted) %D 2006 %T Experiments with Strassen's Algorithm: From Sequential to Parallel %A Fengguang Song %A Jack Dongarra %A Shirley Moore %B 18th IASTED International Conference on Parallel and Distributed Computing and Systems PDCS 2006 (submitted) %C Dallas, Texas %8 2006-01 %G eng %0 Conference Proceedings %B 8th Workshop 'Parallel Systems and Algorithms' (PASA), Lecture Notes in Informatics %D 2006 %T Large Event Traces in Parallel Performance Analysis %A Felix Wolf %A Felix Freitag %A Bernd Mohr %A Shirley Moore %A Brian Wylie %K kojak %B 8th Workshop 'Parallel Systems and Algorithms' (PASA), Lecture Notes in Informatics %I Gesellschaft für Informatik %C Frankfurt/Main, Germany %8 2006-03 %G eng %0 Generic %D 2006 %T Modeling of L2 Cache Behavior for Thread-Parallel Scientific Programs on Chip Multi-Processors %A Fengguang Song %A Shirley Moore %A Jack Dongarra %B University of Tennessee Computer Science Technical Report %8 2006-01 %G eng %0 Conference Proceedings %B Second International Workshop on OpenMP %D 2006 %T Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications %A Oscar Hernandez %A Fengguang Song %A Barbara Chapman %A Jack Dongarra %A Bernd Mohr %A Shirley Moore %A Felix Wolf %K kojak %B Second 
International Workshop on OpenMP %C Reims, France %8 2006-01 %G eng %0 Journal Article %J Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted) %D 2005 %T Automatic analysis of inefficiency patterns in parallel applications %A Felix Wolf %A Bernd Mohr %A Jack Dongarra %A Shirley Moore %K kojak %B Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted) %8 2005-00 %G eng %0 Conference Proceedings %B In Proceedings of the International Conference on Parallel Processing %D 2005 %T Automatic Experimental Analysis of Communication Patterns in Virtual Topologies %A Nikhil Bhatia %A Fengguang Song %A Felix Wolf %A Jack Dongarra %A Bernd Mohr %A Shirley Moore %K kojak %B In Proceedings of the International Conference on Parallel Processing %I IEEE Computer Society %C Oslo, Norway %8 2005-06 %G eng %0 Conference Proceedings %B Second Workshop on Productivity and Performance in High-End Computing (P-PHEC) at 11th International Symposium on High Performance Computer Architecture (HPCA-2005) %D 2005 %T Improving Time to Solution with Automated Performance Analysis %A Shirley Moore %A Felix Wolf %A Jack Dongarra %A Bernd Mohr %K kojak %B Second Workshop on Productivity and Performance in High-End Computing (P-PHEC) at 11th International Symposium on High Performance Computer Architecture (HPCA-2005) %C San Francisco %8 2005-02 %G eng %0 Conference Proceedings %B Workshop on Patterns in High Performance Computing %D 2005 %T A Pattern-Based Approach to Automated Application Performance Analysis %A Nikhil Bhatia %A Shirley Moore %A Felix Wolf %A Jack Dongarra %A Bernd Mohr %K kojak %B Workshop on Patterns in High Performance Computing %C University of Illinois at Urbana-Champaign %8 2005-05 %G eng %0 Conference Proceedings %B In Proceedings of the 2005 SciDAC Conference %D 2005 %T Performance Analysis of GYRO: A Tool Evaluation %A Patrick H. 
Worley %A Jeff Candy %A Laura Carrington %A Kevin Huck %A Timothy Kaiser %A Kumar Mahinthakumar %A Allen D. Malony %A Shirley Moore %A Dan Reed %A Philip C. Roth %A H. Shan %A Sameer Shende %A Allan Snavely %A S. Sreepathi %A Felix Wolf %A Y. Zhang %K kojak %B In Proceedings of the 2005 SciDAC Conference %C San Francisco, CA %8 2005-06 %G eng %0 Conference Paper %B Proceedings of DoD HPCMP UGC 2005 %D 2005 %T Performance Profiling and Analysis of DoD Applications using PAPI and TAU %A Shirley Moore %A David Cronk %A Felix Wolf %A Avi Purkayastha %A Patricia J. Teller %A Robert Araiza %A Gabriela Aguilera %A Jamie Nava %K papi %B Proceedings of DoD HPCMP UGC 2005 %I IEEE %C Nashville, TN %8 2005-06 %G eng %0 Conference Proceedings %B In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference %D 2005 %T A Scalable Approach to MPI Application Performance Analysis %A Shirley Moore %A Felix Wolf %A Jack Dongarra %A Sameer Shende %A Allen D. Malony %A Bernd Mohr %K kojak %B In Proc. of the 12th European Parallel Virtual Machine and Message Passing Interface Conference %I Springer LNCS %8 2005-09 %G eng %0 Conference Paper %B International Conference on Computational Science (ICCS 2004) %D 2004 %T Accurate Cache and TLB Characterization Using Hardware Counters %A Jack Dongarra %A Shirley Moore %A Phil Mucci %A Keith Seymour %A Haihang You %K gco %K lacsi %K papi %X We have developed a set of microbenchmarks for accurately determining the structural characteristics of data cache memories and TLBs. These characteristics include cache size, cache line size, cache associativity, memory page size, number of data TLB entries, and data TLB associativity. Unlike previous microbenchmarks that used time-based measurements, our microbenchmarks use hardware event counts to more accurately and quickly determine these characteristics while requiring fewer limiting assumptions. 
%B International Conference on Computational Science (ICCS 2004) %I Springer %C Krakow, Poland %8 2004-06 %G eng %R https://doi.org/10.1007/978-3-540-24688-6_57 %0 Conference Proceedings %B 2004 International Conference on Parallel Processing (ICCP-04) %D 2004 %T An Algebra for Cross-Experiment Performance Analysis %A Fengguang Song %A Felix Wolf %A Nikhil Bhatia %A Jack Dongarra %A Shirley Moore %K kojak %B 2004 International Conference on Parallel Processing (ICCP-04) %C Montreal, Quebec, Canada %8 2004-08 %G eng %0 Conference Paper %B 5th LCI International Conference on Linux Clusters: The HPC Revolution %D 2004 %T Automating the Large-Scale Collection and Analysis of Performance %A Phil Mucci %A Jack Dongarra %A Rick Kufrin %A Shirley Moore %A Fengguang Song %A Felix Wolf %K kojak %K papi %B 5th LCI International Conference on Linux Clusters: The HPC Revolution %C Austin, Texas %8 2004-05 %G eng %0 Conference Proceedings %B Proceedings of Euro-Par 2004 %D 2004 %T Efficient Pattern Search in Large Traces through Successive Refinement %A Felix Wolf %A Bernd Mohr %A Jack Dongarra %A Shirley Moore %K kojak %B Proceedings of Euro-Par 2004 %I Springer-Verlag %C Pisa, Italy %8 2004-08 %G eng %0 Generic %D 2004 %T NetBuild: Automated Installation and Use of Network-Accessible Software Libraries %A Keith Moore %A Jack Dongarra %A Shirley Moore %A Eric Grosse %K netbuild %B ICL Technical Report %8 2004-01 %G eng %0 Conference Paper %B PADTAD Workshop, IPDPS 2003 %D 2003 %T Experiences and Lessons Learned with a Portable Interface to Hardware Performance Counters %A Jack Dongarra %A Kevin London %A Shirley Moore %A Phil Mucci %A Dan Terpstra %A Haihang You %A Min Zhou %K lacsi %K papi %X The PAPI project has defined and implemented a cross-platform interface to the hardware counters available on most modern microprocessors. The interface has gained widespread use and acceptance from hardware vendors, users, and tool developers. 
This paper reports on experiences with the community-based open-source effort to define the PAPI specification and implement it on a variety of platforms. Collaborations with tool developers who have incorporated support for PAPI are described. Issues related to interpretation and accuracy of hardware counter data and to the overheads of collecting this data are discussed. The paper concludes with implications for the design of the next version of PAPI. %B PADTAD Workshop, IPDPS 2003 %I IEEE %C Nice, France %8 2003-04 %@ 0-7695-1926-1 %G eng %0 Conference Paper %B ICCS 2003 Terascale Workshop %D 2003 %T Performance Instrumentation and Measurement for Terascale Systems %A Jack Dongarra %A Allen D. Malony %A Shirley Moore %A Phil Mucci %A Sameer Shende %K papi %X As computer systems grow in size and complexity, tool support is needed to facilitate the efficient mapping of large-scale applications onto these systems. To help achieve this mapping, performance analysis tools must provide robust performance observation capabilities at all levels of the system, as well as map low-level behavior to high-level program constructs. Instrumentation and measurement strategies, developed over the last several years, must evolve together with performance analysis infrastructure to address the challenges of new scalable parallel systems. %B ICCS 2003 Terascale Workshop %I Springer, Berlin, Heidelberg %C Melbourne, Australia %8 2003-06 %G eng %R https://doi.org/10.1007/3-540-44864-0_6 %0 Journal Article %J Journal of Digital Information special issue on Interactivity in Digital Libraries %D 2002 %T Active Netlib: An Active Mathematical Software Collection for Inquiry-based Computational Science and Engineering Education %A Shirley Moore %A A.J. 
Baker %A Jack Dongarra %A Christian Halloy %A Chung Ng %K activenetlib %K rib %B Journal of Digital Information special issue on Interactivity in Digital Libraries %V 2 %8 2002-00 %G eng %0 Conference Paper %B International Conference on Computational Science (ICCS 2002) %D 2002 %T A Comparison of Counting and Sampling Modes of Using Performance Monitoring Hardware %A Shirley Moore %K papi %X Performance monitoring hardware is available on most modern microprocessors in the form of hardware counters and other registers that record data about processor events. This hardware may be used in counting mode, in which aggregate events counts are accumulated, and/or in sampling mode, in which time-based or event-based sampling is used to collect profiling data. This paper discusses uses of these two modes and considers the issues of efficiency and accuracy raised by each. Implications for the PAPI cross-platform hardware counter interface are also discussed. %B International Conference on Computational Science (ICCS 2002) %I Springer %C Amsterdam, Netherlands %8 2002-04 %G eng %R https://doi.org/10.1007/3-540-46080-2_95 %0 Journal Article %J International Journal of High Performance Applications and Supercomputing %D 2002 %T Numerical Libraries and Tools for Scalable Parallel Cluster Computing %A Shirley Browne %A Jack Dongarra %A Anne Trefethen %B International Journal of High Performance Applications and Supercomputing %V 15 %P 175-180 %8 2002-10 %G eng %0 Conference Paper %B International Conference on Parallel and Distributed Computing Systems %D 2001 %T End-user Tools for Application Performance Analysis, Using Hardware Counters %A Kevin London %A Jack Dongarra %A Shirley Moore %A Phil Mucci %A Keith Seymour %A T. Spencer %K papi %X One purpose of the end-user tools described in this paper is to give users a graphical representation of performance information that has been gathered by instrumenting an application with the PAPI library. 
PAPI is a project that specifies a standard API for accessing hardware performance counters available on most modern microprocessors. These counters exist as a small set of registers that count "events", which are occurrences of specific signals and states related to a processor’s function. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. The perfometer tool developed by the PAPI project provides a graphical view of this information, allowing users to quickly see where performance bottlenecks are in their application. Only one function call has to be added by the user to their program to take advantage of perfometer. This makes it quick and simple to add and remove instrumentation from a program. Also, perfometer allows users to change the "event" they are monitoring. Add the ability to monitor parallel applications, set alarms and a Java front-end that can run anywhere, and this gives the user a powerful tool for quickly discovering where and why a bottleneck exists. A number of third-party tools for analyzing performance of message-passing and/or threaded programs have also incorporated support for PAPI so as to be able to display and analyze hardware counter data from their interfaces. 
%B International Conference on Parallel and Distributed Computing Systems %C Dallas, TX %8 2001-08 %G eng %0 Conference Proceedings %B Department of Defense Users' Group Conference (to appear) %D 2001 %T Metacomputing Support for the SARA3D Structural Acoustics Application %A Shirley Moore %A Dorian Arnold %A David Cronk %K netsolve %B Department of Defense Users' Group Conference (to appear) %C Biloxi, Mississippi %8 2001-06 %G eng %0 Journal Article %J International Journal of High Performance Applications and Supercomputing %D 2001 %T Numerical Libraries and Tools for Scalable Parallel Cluster Computing %A Jack Dongarra %A Shirley Moore %A Anne Trefethen %B International Journal of High Performance Applications and Supercomputing %V 15 %P 175-180 %8 2001-01 %G eng %0 Conference Paper %B Department of Defense Users' Group Conference Proceedings %D 2001 %T The PAPI Cross-Platform Interface to Hardware Performance Counters %A Kevin London %A Shirley Moore %A Phil Mucci %A Keith Seymour %A Richard Luczak %K papi %X The purpose of the PAPI project is to specify a standard API for accessing hardware performance counters available on most modern microprocessors. These counters exist as a small set of registers that count "events," which are occurrences of specific signals and states related to the processor's function. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. This correlation has a variety of uses in performance analysis and tuning. The PAPI project has developed a standard set of hardware events and a standard cross-platform library interface to the underlying counter hardware. The PAPI library has been implemented for a number of Shared Resource Center platforms. The PAPI project is developing end-user tools for dynamically selecting and displaying hardware counter performance data. 
PAPI support is also being incorporated into a number of third-party tools. %B Department of Defense Users' Group Conference Proceedings %C Biloxi, Mississippi %8 2001-06 %G eng %0 Conference Proceedings %B Department of Defense Users' Group Conference Proceedings (to appear) %D 2001 %T Parallel I/O for EQM Applications %A David Cronk %A Graham Fagg %A Shirley Moore %K ftmpi %B Department of Defense Users' Group Conference Proceedings (to appear) %C Biloxi, Mississippi %8 2001-06 %G eng %0 Generic %D 2001 %T Repository in a Box Toolkit for Software and Resource Sharing %A Shirley Browne %A Paul McMahan %A Scott Wells %K rib %B University of Tennessee Computer Science Department Technical Report %8 2001-00 %G eng %0 Journal Article %J European Parallel Virtual Machine / Message Passing Interface Users’ Group Meeting, Lecture Notes in Computer Science 2131 %D 2001 %T Review of Performance Analysis Tools for MPI Parallel Programs %A Shirley Moore %A David Cronk %A Kevin London %A Jack Dongarra %K papi %X In order to produce MPI applications that perform well on today’s parallel architectures, programmers need effective tools for collecting and analyzing performance data. A variety of such tools, both commercial and research, are becoming available. This paper reviews and evaluates the available cross-platform MPI performance analysis tools. %B European Parallel Virtual Machine / Message Passing Interface Users’ Group Meeting, Lecture Notes in Computer Science 2131 %I Springer Verlag, Berlin %C Greece %P 241-248 %8 2001-09 %G eng %R https://doi.org/10.1007/3-540-45417-9_34 %0 Conference Paper %B Conference on Linux Clusters: The HPC Revolution %D 2001 %T Using PAPI for Hardware Performance Monitoring on Linux Systems %A Jack Dongarra %A Kevin London %A Shirley Moore %A Phil Mucci %A Dan Terpstra %K papi %X PAPI is a specification of a cross-platform interface to hardware performance counters on modern microprocessors. 
These counters exist as a small set of registers that count events, which are occurrences of specific signals related to a processor's function. Monitoring these events has a variety of uses in application performance analysis and tuning. The PAPI specification consists of both a standard set of events deemed most relevant for application performance tuning, as well as both high-level and low-level sets of routines for accessing the counters. The high level interface simply provides the ability to start, stop, and read sets of events, and is intended for the acquisition of simple but accurate measurement by application engineers. The fully programmable low-level interface provides sophisticated options for controlling the counters, such as setting thresholds for interrupt on overflow, as well as access to all native counting modes and events, and is intended for third-party tool writers or users with more sophisticated needs. PAPI has been implemented on a number of platforms, including Linux/x86 and Linux/IA-64. The Linux/x86 implementation requires a kernel patch that provides a driver for the hardware counters. The driver memory maps the counter registers into user space and allows virtualizing the counters on a per-process or per-thread basis. The kernel patch is being proposed for inclusion in the main Linux tree. The PAPI library provides access on Linux platforms not only to the standard set of events mentioned above but also to all the Linux/x86 and Linux/IA-64 native events. PAPI has been installed and is in use, either directly or through incorporation into third-party end-user performance analysis tools, on a number of Linux clusters, including the New Mexico LosLobos cluster and Linux clusters at NCSA and the University of Tennessee being used for the GrADS (Grid Application Development Software) project. 
%B Conference on Linux Clusters: The HPC Revolution %I Linux Clusters Institute %C Urbana, Illinois %8 2001-06 %G eng %0 Journal Article %J The International Journal of High Performance Computing Applications %D 2000 %T A Portable Programming Interface for Performance Evaluation on Modern Processors %A Shirley Browne %A Jack Dongarra %A Nathan Garner %A George Ho %A Phil Mucci %K papi %B The International Journal of High Performance Computing Applications %V 14 %P 189-204 %8 2000-09 %G eng %R https://doi.org/10.1177/109434200001400303 %0 Generic %D 2000 %T A Portable Programming Interface for Performance Evaluation on Modern Processors %A Shirley Browne %A Jack Dongarra %A Nathan Garner %A Kevin London %A Phil Mucci %B University of Tennessee Computer Science Technical Report, UT-CS-00-444 %8 2000-07 %G eng %0 Conference Proceedings %B Proceedings of SuperComputing 2000 (SC'00) %D 2000 %T A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters %A Shirley Browne %A Jack Dongarra %A Nathan Garner %A Kevin London %A Phil Mucci %K papi %B Proceedings of SuperComputing 2000 (SC'00) %C Dallas, TX %8 2000-11 %G eng %0 Generic %D 2000 %T Secure Remote Access to Numerical Software and Computation Hardware %A Dorian Arnold %A Shirley Browne %A Jack Dongarra %A Graham Fagg %A Keith Moore %B University of Tennessee Computer Science Technical Report, UT-CS-00-446 %8 2000-07 %G eng %0 Conference Proceedings %B Proceedings of the DoD HPC Users Group Conference (HPCUG) 2000 %D 2000 %T Secure Remote Access to Numerical Software and Computational Hardware %A Dorian Arnold %A Shirley Browne %A Jack Dongarra %A Graham Fagg %A Keith Moore %K netsolve %B Proceedings of the DoD HPC Users Group Conference (HPCUG) 2000 %C Albuquerque, NM %8 2000-06 %G eng %0 Journal Article %J IEEE Cluster Computing BOF at SC99 %D 1999 %T Numerical Libraries and Tools for Scalable Parallel Cluster Computing %A Shirley Browne %A Jack Dongarra %A Anne Trefethen %B 
IEEE Cluster Computing BOF at SC99 %C Portland, Oregon %8 1999-01 %G eng %0 Conference Proceedings %B Proceedings of Department of Defense HPCMP Users Group Conference %D 1999 %T PAPI: A Portable Interface to Hardware Performance Counters %A Shirley Browne %A Christine Deane %A George Ho %A Phil Mucci %K papi %B Proceedings of Department of Defense HPCMP Users Group Conference %8 1999-06 %G eng %0 Journal Article %J D-Lib Magazine %D 1998 %T National HPCC Software Exchange (NHSE): Uniting the High Performance Computing and Communications Community %A Shirley Browne %A Jack Dongarra %A Jeff Horner %A Paul McMahan %A Scott Wells %K rib %B D-Lib Magazine %8 1998-01 %G eng