%0 Conference Paper
%B 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS)
%D 2015
%T A Data Flow Divide and Conquer Algorithm for Multicore Architecture
%A Azzam Haidar
%A Jakub Kurzak
%A Gregoire Pichon
%A Mathieu Faverge
%K Eigensolver
%K lapack
%K Multicore
%K plasma
%K task-based programming
%X Computing eigenpairs of a symmetric matrix is a problem arising in many industrial applications, including quantum physics and finite-elements computation for automobiles. A classical approach is to reduce the matrix to tridiagonal form before computing eigenpairs of the tridiagonal matrix. Then, a back-transformation allows one to obtain the final solution. Parallelism issues of the reduction stage have already been tackled in different shared-memory libraries. In this article, we focus on solving the tridiagonal eigenproblem, and we describe a novel implementation of the Divide and Conquer algorithm. The algorithm is expressed as a sequential task-flow, scheduled in an out-of-order fashion by a dynamic runtime which allows the programmer to play with tasks granularity. The resulting implementation is between two and five times faster than the equivalent routine from the Intel MKL library, and outperforms the best MRRR implementation for many matrices.
%B 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS)
%I IEEE
%C Hyderabad, India
%8 2015-05
%G eng
%0 Journal Article
%J International Journal of High Performance Computing Applications
%D 2014
%T A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks
%A Azzam Haidar
%A Raffaele Solcà
%A Mark Gates
%A Stanimire Tomov
%A Thomas C. Schulthess
%A Jack Dongarra
%K Eigensolver
%K electronic structure calculations
%K generalized eigensolver
%K gpu
%K high performance
%K hybrid
%K Multicore
%K two-stage
%X The adoption of hybrid CPU–GPU nodes in traditional supercomputing platforms such as the Cray-XK6 opens acceleration opportunities for electronic structure calculations in materials science and chemistry applications, where medium-sized generalized eigenvalue problems must be solved many times. These eigenvalue problems are too small to effectively solve on distributed systems, but can benefit from the massive computing power concentrated on a single-node, hybrid CPU–GPU system. However, hybrid systems call for the development of new algorithms that efficiently exploit heterogeneity and massive parallelism of not just GPUs, but of multicore/manycore CPUs as well. Addressing these demands, we developed a generalized eigensolver featuring novel algorithms of increased computational intensity (compared with the standard algorithms), decomposition of the computation into fine-grained memory aware tasks, and their hybrid execution. The resulting eigensolvers are state-of-the-art in high-performance computing, significantly outperforming existing libraries. We describe the algorithm and analyze its performance impact on applications of interest when different fractions of eigenvectors are needed by the host electronic structure code.
%B International Journal of High Performance Computing Applications
%V 28
%P 196-209
%8 2014-05
%G eng
%N 2
%& 196
%R 10.1177/1094342013502097