%0 Generic
%D 2017
%T The Case for Directive Programming for Accelerator Autotuner Optimization
%A Diana Fayad
%A Jakub Kurzak
%A Piotr Luszczek
%A Panruo Wu
%A Jack Dongarra
%X In this work, we present the use of compiler pragma directives for parallelizing autotuning of specialized compute kernels for hardware accelerators. A set of constructs, that include prallelizing a source code that prune a generated search space with a large number of constraints for an autotunning infrastructure. For a better performance we studied optimization aimed at minimization of the run time.We also studied the behavior of the parallel load balance and the speedup on four different machines: x86, Xeon Phi, ARMv8, and POWER8.
%B Innovative Computing Laboratory Technical Report
%I University of Tennessee
%8 10-2017
%G eng
%0 Conference Paper
%B 2017 IEEE High Performance Extreme Computing Conference (HPEC'17)
%D 2017
%T Out of Memory SVD Solver for Big Data
%A Azzam Haidar
%A Khairul Kabir
%A Diana Fayad
%A Stanimire Tomov
%A Jack Dongarra
%X Many applications – from data compression to numerical weather prediction and information retrieval – need to compute large dense singular value decompositions (SVD). When the problems are too large to fit into the computer’s main memory, specialized out-of-core algorithms that use disk storage are required. A typical example is when trying to analyze a large data set through tools like MATLAB or Octave, but the data is just too large to be loaded. To overcome this, we designed a class of out-of-memory (OOM) algorithms to reduce, as well as overlap communication with computation. Of particular interest is OOM algorithms for matrices of size m×n, where m >> n or m << n, e.g., corresponding to cases of too many variables, or too many observations. To design OOM SVDs, we first study the communications cost for the SVD techniques as well as for the QR/LQ factorization followed by SVD. We present the theoretical analysis about the data movement cost and strategies to design OOM SVD algorithms. We show performance results for multicore architecture that illustrate our theoretical findings and match our performance models. Moreover, our experimental results show the feasibility and superiority of the OOM SVD.
%B 2017 IEEE High Performance Extreme Computing Conference (HPEC'17)
%I IEEE
%C Waltham, MA
%8 09-2017
%G eng