Towards Achieving Performance Portability Using Directives for Accelerators

TitleTowards Achieving Performance Portability Using Directives for Accelerators
Publication TypeConference Paper
Year of Publication2016
AuthorsLopez, M., V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra
Conference NameThe International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD)
Date Published11-2016
PublisherInnovative Computing Laboratory, University of Tennessee
Conference LocationSalt Lake City, Utah
AbstractIn this paper we explore the performance portability of directives provided by OpenMP 4 and OpenACC to program various types of node architectures with attached accelerators, both self-hosted multicore and offload multicore/GPU. Our goal is to examine how successful OpenACC and the newer of- fload features of OpenMP 4.5 are for moving codes between architectures, how much tuning might be required and what lessons we can learn from this experience. To do this, we use examples of algorithms with varying computational intensities for our evaluation, as both compute and data access efficiency are important considerations for overall application performance. We implement these kernels using various methods provided by newer OpenACC and OpenMP implementations, and we evaluate their performance on various platforms including both X86 64 with attached NVIDIA GPUs, self-hosted Intel Xeon Phi KNL, as well as an X86 64 host system with Intel Xeon Phi coprocessors. In this paper, we explain what factors affected the performance portability such as how to pick the right programming model, its programming style, its availability on different platforms, and how well compilers can optimize and target to multiple platforms.
Project Tags: