%0 Conference Paper
%B 2014 IEEE International Conference on Cluster Computing
%D 2014
%T Power Monitoring with PAPI for Extreme Scale Architectures and Dataflow-based Programming Models
%A Heike McCraw
%A James Ralph
%A Anthony Danalis
%A Jack Dongarra
%X For more than a decade, the PAPI performance-monitoring library has provided a clear, portable interface to the hardware performance counters available on all modern CPUs and other components of interest (e.g., GPUs, network, and I/O systems). Most major end-user tools that application developers use to analyze the performance of their applications rely on PAPI to gain access to these performance counters. Energy efficiency constraints have been widely identified as one of the critical roadblocks on the way to larger, more complex high-performance systems. With modern extreme-scale machines having hundreds of thousands of cores, the ability to reduce power consumption for each CPU at the software level becomes critically important, for both economic and environmental reasons. For PAPI to continue playing its well-established role in HPC, it must provide performance data that not only originates from within the processing cores but also delivers insight into the power consumption of the system as a whole. An extensive effort has been made to extend the Performance API to support power monitoring capabilities for various platforms. This paper provides detailed information about three components that allow power monitoring on the Intel Xeon Phi and Blue Gene/Q. Furthermore, we discuss the integration of PAPI in PARSEC, a task-based dataflow-driven execution engine, enabling hardware performance counter and power monitoring at true task granularity.
%I IEEE
%C Madrid, Spain
%8 2014-09
%G eng
%R 10.1109/CLUSTER.2014.6968672

%0 Conference Paper
%B 2014 IEEE International Conference on Cluster Computing
%D 2014
%T Utilizing Dataflow-based Execution for Coupled Cluster Methods
%A Heike McCraw
%A Anthony Danalis
%A George Bosilca
%A Jack Dongarra
%A Karol Kowalski
%A Theresa Windus
%X Computational chemistry is one of the driving forces of High Performance Computing. Many-body methods, such as the Coupled Cluster (CC) methods of the quantum chemistry package NWCHEM, are of particular interest to the applied chemistry community. Harnessing large fractions of the processing power of modern large-scale computing platforms has become increasingly difficult. With the increase in scale, complexity, and heterogeneity of modern platforms, traditional programming models fail to deliver the expected performance scalability. On the way to exascale, and with these highly hybrid platforms, dataflow-based programming models may be the only viable way to achieve and maintain computation at scale. In this paper, we discuss a dataflow-based programming model and its applicability to NWCHEM’s CC methods. Our dataflow version of the CC kernels breaks down the algorithm into fine-grained tasks with explicitly defined data dependencies. As a result, many of the traditional synchronization points can be eliminated, allowing for a dynamic reshaping of the execution based on the ongoing availability of computational resources. We build this experiment using PARSEC, a task-based dataflow-driven execution engine that enables efficient task scheduling on distributed systems, providing a desirable portability layer for application developers.
%I IEEE
%C Madrid, Spain
%8 2014-09
%G eng

%0 Conference Paper
%B International Supercomputing Conference 2013 (ISC'13)
%D 2013
%T Beyond the CPU: Hardware Performance Counter Monitoring on Blue Gene/Q
%A Heike McCraw
%A Dan Terpstra
%A Jack Dongarra
%A Kris Davis
%A Roy Musselman
%I Springer
%C Leipzig, Germany
%8 2013-06
%G eng

%0 Generic
%D 2013
%T PAPI 5: Measuring Power, Energy, and the Cloud
%A Vincent Weaver
%A Dan Terpstra
%A Heike McCraw
%A Matt Johnson
%A Kiran Kasichayanula
%A James Ralph
%A John Nelson
%A Phil Mucci
%A Tushar Mohan
%A Shirley Moore
%I 2013 IEEE International Symposium on Performance Analysis of Systems and Software
%C Austin, TX
%8 2013-04
%G eng

%0 Journal Article
%J CloudTech-HPC 2012
%D 2012
%T PAPI-V: Performance Monitoring for Virtual Machines
%A Matt Johnson
%A Heike McCraw
%A Shirley Moore
%A Phil Mucci
%A John Nelson
%A Dan Terpstra
%A Vincent M Weaver
%A Tushar Mohan
%K papi
%X This paper describes extensions to the PAPI hardware counter library for virtual environments, called PAPI-V. The extensions support timing routines, I/O measurements, and processor counters. The PAPI-V extensions will allow application and tool developers to use a familiar interface to obtain relevant hardware performance monitoring information in virtual environments.
%B CloudTech-HPC 2012
%C Pittsburgh, PA
%8 2012-09
%G eng
%R 10.1109/ICPPW.2012.29

%0 Generic
%D 2012
%T Performance Counter Monitoring for the Blue Gene/Q Architecture
%A Heike McCraw
%K papi
%B University of Tennessee Computer Science Technical Report
%8 2012-00
%G eng