%0 Generic
%D 2020
%T Prospectus for the Next LAPACK and ScaLAPACK Libraries: Basic ALgebra LIbraries for Sustainable Technology with Interdisciplinary Collaboration (BALLISTIC)
%A James Demmel
%A Jack Dongarra
%A Julie Langou
%A Julien Langou
%A Piotr Luszczek
%A Michael Mahoney
%X The convergence of several unprecedented changes, including formidable new system design constraints and revolutionary levels of heterogeneity, has made it clear that much of the essential software infrastructure of computational science and engineering is, or will soon be, obsolete. Math libraries have historically been in the vanguard of software that must be adapted first to such changes, both because these low-level workhorses are so critical to the accuracy and performance of so many different types of applications, and because they have proved to be outstanding vehicles for finding and implementing solutions to the problems that novel architectures pose. Under the Basic ALgebra LIbraries for Sustainable Technology with Interdisciplinary Collaboration (BALLISTIC) project, the principal designers of the Linear Algebra PACKage (LAPACK) and the Scalable Linear Algebra PACKage (ScaLAPACK), the combination of which is abbreviated Sca/LAPACK, aim to enhance and update these libraries for the ongoing revolution in processor architecture, system design, and application requirements by incorporating them into a layered package of software components—the BALLISTIC ecosystem—that provides users seamless access to state-of-the-art solver implementations through familiar and improved Sca/LAPACK interfaces.
%B LAPACK Working Notes
%I University of Tennessee
%8 2020/07
%G eng
%0 Generic
%D 2018
%T Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem
%A Osni Marques
%A James Demmel
%A Paulo B. Vasconcelos
%X In this paper, we present an algorithm for the singular value decomposition (SVD) of a bidiagonal matrix by means of the eigenpairs of an associated symmetric tridiagonal matrix. The algorithm is particularly suited for the computation of a subset of singular values and corresponding vectors. We focus on a sequential version of the algorithm, and discuss special cases and implementation details. We use a large set of bidiagonal matrices to assess the accuracy of the implementation in single and double precision, as well as to identify potential shortcomings. We show that the algorithm can be up to three orders of magnitude faster than existing algorithms, which are limited to the computation of a full SVD. We also show time comparisons of an implementation that uses the strategy discussed in the paper as a building block for the computation of the SVD of general matrices.
%B LAPACK Working Note
%I University of Tennessee
%8 2018-04
%G eng
%0 Journal Article
%J SIAM Journal on Matrix Analysis and Application
%D 2014
%T Communication-Avoiding Symmetric-Indefinite Factorization
%A Grey Ballard
%A Dulceneia Becker
%A James Demmel
%A Jack Dongarra
%A Alex Druinsky
%A I Peled
%A Oded Schwartz
%A Sivan Toledo
%A Ichitaro Yamazaki
%X We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen’s triangular tridiagonalization. It factors a dense symmetric matrix A as the product A = P LT L T P T where P is a permutation matrix, L is lower triangular, and T is block tridiagonal and banded. The algorithm is the first symmetric-indefinite communication-avoiding factorization: it performs an asymptotically optimal amount of communication in a two-level memory hierarchy for almost any cache-line size. Adaptations of the algorithm to parallel computers are likely to be communication efficient as well; one such adaptation has been recently published. The current paper describes the algorithm, proves that it is numerically stable, and proves that it is communication optimal.
%B SIAM Journal on Matrix Analysis and Application
%V 35
%P 1364-1406
%8 2014-07
%G eng
%N 4
%0 Journal Article
%J IPDPS 2013 (submitted)
%D 2013
%T Implementing a Blocked Aasen’s Algorithm with a Dynamic Scheduler on Multicore Architectures
%A Ichitaro Yamazaki
%A Dulceneia Becker
%A Jack Dongarra
%A Alex Druinsky
%A I. Peled
%A Sivan Toledo
%A Grey Ballard
%A James Demmel
%A Oded Schwartz
%X Factorization of a dense symmetric indeﬁnite matrix is a key computational kernel in many scientiﬁc and engineering simulations. However, there is no scalable factorization algorithm that takes advantage of the symmetry and guarantees numerical stability through pivoting at the same time. This is because such an algorithm exhibits many of the fundamental challenges in parallel programming like irregular data accesses and irregular task dependencies. In this paper, we address these challenges in a tiled implementation of a blocked Aasen’s algorithm using a dynamic scheduler. To fully exploit the limited parallelism in this left-looking algorithm, we study several performance enhancing techniques; e.g., parallel reduction to update a panel, tall-skinny LU factorization algorithms to factorize the panel, and a parallel implementation of symmetric pivoting. Our performance results on up to 48 AMD Opteron processors demonstrate that our implementation obtains speedups of up to 2.8 over MKL, while losing only one or two digits in the computed residual norms.
%B IPDPS 2013 (submitted)
%C Boston, MA
%8 2013-00
%G eng
%0 Book Section
%B Handbook of Linear Algebra
%D 2013
%T LAPACK
%A Zhaojun Bai
%A James Demmel
%A Jack Dongarra
%A Julien Langou
%A Jenny Wang
%X With a substantial amount of new material, the Handbook of Linear Algebra, Second Edition provides comprehensive coverage of linear algebra concepts, applications, and computational software packages in an easy-to-use format. It guides you from the very elementary aspects of the subject to the frontiers of current research. Along with revisions and updates throughout, the second edition of this bestseller includes 20 new chapters.
%B Handbook of Linear Algebra
%7 Second
%I CRC Press
%C Boca Raton, FL
%@ 9781466507289
%G eng
%0 Journal Article
%J SciDAC Review
%D 2009
%T Accelerating Time-To-Solution for Computational Science and Engineering
%A James Demmel
%A Jack Dongarra
%A Armando Fox
%A Sam Williams
%A Vasily Volkov
%A Katherine Yelick
%B SciDAC Review
%8 2009-00
%G eng
%0 Generic
%D 2009
%T Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects
%A Emmanuel Agullo
%A James Demmel
%A Jack Dongarra
%A Bilel Hadri
%A Jakub Kurzak
%A Julien Langou
%A Hatem Ltaeif
%A Piotr Luszczek
%A Rajib Nath
%A Stanimire Tomov
%A Asim YarKhan
%A Vasily Volkov
%I The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC09)
%C Portland, OR
%8 2009-11
%G eng
%0 Conference Proceedings
%B Journal of Physics: Conference Series
%D 2009
%T Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects
%A Emmanuel Agullo
%A James Demmel
%A Jack Dongarra
%A Bilel Hadri
%A Jakub Kurzak
%A Julien Langou
%A Hatem Ltaeif
%A Piotr Luszczek
%A Stanimire Tomov
%K magma
%K plasma
%B Journal of Physics: Conference Series
%V 180
%8 2009-00
%G eng
%0 Generic
%D 2008
%T Enhancing the Performance of Dense Linear Algebra Solvers on GPUs (in the MAGMA Project)
%A Marc Baboulin
%A James Demmel
%A Jack Dongarra
%A Stanimire Tomov
%A Vasily Volkov
%I The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC08)
%C Austin, TX
%8 2008-11
%G eng
%0 Generic
%D 2008
%T HPCS Library Study Effort
%A Jack Dongarra
%A James Demmel
%A Parry Husbands
%A Piotr Luszczek
%B University of Tennessee Computer Science Technical Report, UT-CS-08-617
%8 2008-01
%G eng
%0 Journal Article
%J PARA 2006
%D 2006
%T Prospectus for the Next LAPACK and ScaLAPACK Libraries
%A James Demmel
%A Jack Dongarra
%A B. Parlett
%A William Kahan
%A Ming Gu
%A David Bindel
%A Yozo Hida
%A Xiaoye Li
%A Osni Marques
%A Jason E. Riedy
%A Christof Voemel
%A Julien Langou
%A Piotr Luszczek
%A Jakub Kurzak
%A Alfredo Buttari
%A Julien Langou
%A Stanimire Tomov
%B PARA 2006
%C Umea, Sweden
%8 2006-06
%G eng
%0 Journal Article
%D 2005
%T LAPACK 2005 Prospectus: Reliable and Scalable Software for Linear Algebra Computations on High End Computers
%A James Demmel
%A Jack Dongarra
%I LAPACK Working Note 164
%8 2005-01
%G eng
%0 Conference Proceedings
%B IEEE Proceedings (to appear)
%D 2004
%T Self Adapting Linear Algebra Algorithms and Software
%A James Demmel
%A Jack Dongarra
%A Victor Eijkhout
%A Erika Fuentes
%A Antoine Petitet
%A Rich Vuduc
%A Clint Whaley
%A Katherine Yelick
%K salsa
%K sans
%B IEEE Proceedings (to appear)
%8 2004-00
%G eng
%0 Journal Article
%J ACM Transactions on Mathematical Software
%D 2002
%T An Updated Set of Basic Linear Algebra Subprograms (BLAS)
%A Susan Blackford
%A James Demmel
%A Jack Dongarra
%A Iain Duff
%A Sven Hammarling
%A Greg Henry
%A Michael Heroux
%A Linda Kaufman
%A Andrew Lumsdaine
%A Antoine Petitet
%A Roldan Pozo
%A Karin Remington
%A Clint Whaley
%B ACM Transactions on Mathematical Software
%V 28
%P 135-151
%8 2002-12
%G eng
%R 10.1145/567806.567807
%0 Journal Article
%J (an update), submitted to ACM TOMS
%D 2001
%T Basic Linear Algebra Subprograms (BLAS)
%A Susan Blackford
%A James Demmel
%A Jack Dongarra
%A Iain Duff
%A Sven Hammarling
%A Greg Henry
%A Michael Heroux
%A Linda Kaufman
%A Andrew Lumsdaine
%A Antoine Petitet
%A Roldan Pozo
%A Karin Remington
%A Clint Whaley
%B (an update), submitted to ACM TOMS
%8 2001-02
%G eng
%0 Journal Article
%J Philadelphia: Society for Industrial and Applied Mathematics
%D 1999
%T LAPACK Users' Guide, 3rd ed.
%A Ed Anderson
%A Zhaojun Bai
%A Christian Bischof
%A Susan Blackford
%A James Demmel
%A Jack Dongarra
%A Jeremy Du Croz
%A Anne Greenbaum
%A Sven Hammarling
%A Alan McKenney
%A Danny Sorensen
%B Philadelphia: Society for Industrial and Applied Mathematics
%8 1999-01
%G eng