%0 Journal Article %J International Journal of High Performance Computing and Networking %D 2019 %T Evaluation of Directive-Based Performance Portable Programming Models %A M. Graham Lopez %A Wayne Joubert %A Verónica Larrea %A Oscar Hernandez %A Azzam Haidar %A Stanimire Tomov %A Jack Dongarra %K OpenACC %K OpenMP 4 %K performance portability %K Programming models %X We present an extended exploration of the performance portability of directives provided by OpenMP 4 and OpenACC to program various types of node architecture with attached accelerators, both self-hosted multicore and offload multicore/GPU. Our goal is to examine how successful OpenACC and the newer offload features of OpenMP 4.5 are for moving codes between architectures, and we document how much tuning might be required and what lessons we can learn from these experiences. To do this, we use examples of algorithms with varying computational intensities for our evaluation, as both compute and data access efficiency are important considerations for overall application performance. To better understand fundamental compute vs. bandwidth bound characteristics, we add the compute-bound Level 3 BLAS GEMM kernel to our linear algebra evaluation. We implement the kernels of interest using various methods provided by newer OpenACC and OpenMP implementations, and we evaluate their performance on various platforms including both x86_64 and Power8 with attached NVIDIA GPUs, x86_64 multicores, self-hosted Intel Xeon Phi KNL, as well as an x86_64 host system with Intel Xeon Phi coprocessors. We update these evaluations with the newest version of the NVIDIA Pascal architecture (P100), Intel KNL 7230, Power8+, and the newest supporting compiler implementations. Furthermore, we present in detail what factors affected the performance portability, including how to pick the right programming model, its programming style, its availability on different platforms, and how well compilers can optimise and target multiple platforms. %B International Journal of High Performance Computing and Networking %V 14 %P 165-182 %8 2019–07 %G eng %N 2 %R http://dx.doi.org/10.1504/IJHPCN.2017.10009064 %0 Generic %D 2017 %T POMPEI: Programming with OpenMP4 for Exascale Investigations %A Jack Dongarra %A Azzam Haidar %A Oscar Hernandez %A Stanimire Tomov %A Manjunath Gorentla Venkata %X The objective of the Programming with OpenMP4 for Exascale Investigations (POMPEI) project is to explore new task-based programming techniques together with data structure centric programming for scientific applications to harness the potential of extreme-scale systems. Tasking is a well established by now approach on such systems as it has been used successfully to handle their large-scale parallelism and heterogeneity, which are leading challenges on the way to exascale computing. The approach is to harness the latest features of OpenMP4.5 and OpenACC2.5 to design abstractions shared among tasks and mapped efficiently to data-structure driven programming paradigms. This technical report describes the approach, along with its reference implementation and results for dense linear algebra algorithms. %B Innovative Computing Laboratory Technical Report %I University of Tennessee %8 2017-12 %G eng %0 Journal Article %J Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming %D 2008 %T Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications %A Oscar Hernandez %A Fengguang Song %A Barbara Chapman %A Jack Dongarra %A Bernd Mohr %A Shirley Moore %A Felix Wolf %B Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming %I Springer Berlin / Heidelberg %V 4315 %8 2008-00 %G eng %0 Conference Proceedings %B Second International Workshop on OpenMP %D 2006 %T Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications %A Oscar Hernandez %A Fengguang Song %A Barbara Chapman %A Jack Dongarra %A Bernd Mohr %A Shirley Moore %A Felix Wolf %K kojak %B Second International Workshop on OpenMP %C Reims, France %8 2006-01 %G eng