%0 Journal Article
%J SIAM Journal on Matrix Analysis and Applications
%D 2014
%T Communication-Avoiding Symmetric-Indefinite Factorization
%A Grey Ballard
%A Dulceneia Becker
%A James Demmel
%A Jack Dongarra
%A Alex Druinsky
%A I. Peled
%A Oded Schwartz
%A Sivan Toledo
%A Ichitaro Yamazaki
%X We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen’s triangular tridiagonalization. It factors a dense symmetric matrix $A$ as the product $A = P L T L^T P^T$, where $P$ is a permutation matrix, $L$ is lower triangular, and $T$ is block tridiagonal and banded. The algorithm is the first symmetric-indefinite communication-avoiding factorization: it performs an asymptotically optimal amount of communication in a two-level memory hierarchy for almost any cache-line size. Adaptations of the algorithm to parallel computers are likely to be communication efficient as well; one such adaptation has been recently published. The current paper describes the algorithm, proves that it is numerically stable, and proves that it is communication optimal.
%B SIAM Journal on Matrix Analysis and Applications
%V 35
%P 1364-1406
%8 2014-07
%G eng
%N 4
%0 Journal Article
%J IPDPS 2013 (submitted)
%D 2013
%T Implementing a Blocked Aasen’s Algorithm with a Dynamic Scheduler on Multicore Architectures
%A Ichitaro Yamazaki
%A Dulceneia Becker
%A Jack Dongarra
%A Alex Druinsky
%A I. Peled
%A Sivan Toledo
%A Grey Ballard
%A James Demmel
%A Oded Schwartz
%X Factorization of a dense symmetric indefinite matrix is a key computational kernel in many scientific and engineering simulations. However, there is no scalable factorization algorithm that takes advantage of the symmetry and guarantees numerical stability through pivoting at the same time. This is because such an algorithm exhibits many of the fundamental challenges in parallel programming like irregular data accesses and irregular task dependencies. In this paper, we address these challenges in a tiled implementation of a blocked Aasen’s algorithm using a dynamic scheduler. To fully exploit the limited parallelism in this left-looking algorithm, we study several performance enhancing techniques; e.g., parallel reduction to update a panel, tall-skinny LU factorization algorithms to factorize the panel, and a parallel implementation of symmetric pivoting. Our performance results on up to 48 AMD Opteron processors demonstrate that our implementation obtains speedups of up to 2.8 over MKL, while losing only one or two digits in the computed residual norms.
%B IPDPS 2013 (submitted)
%C Boston, MA
%8 2013-00
%G eng
%0 Generic
%D 2012
%T An efficient distributed randomized solver with application to large dense linear systems
%A Marc Baboulin
%A Dulceneia Becker
%A George Bosilca
%A Anthony Danalis
%A Jack Dongarra
%K dague
%K dplasma
%K parsec
%B ICL Technical Report
%8 2012-07
%G eng
%0 Journal Article
%J IPDPS 2012
%D 2012
%T A Parallel Tiled Solver for Symmetric Indefinite Systems On Multicore Architectures
%A Marc Baboulin
%A Dulceneia Becker
%A Jack Dongarra
%B IPDPS 2012
%C Shanghai, China
%8 2012-05
%G eng
%0 Journal Article
%J Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science (PPAM 2011)
%D 2012
%T Reducing the Amount of Pivoting in Symmetric Indefinite Systems
%A Dulceneia Becker
%A Marc Baboulin
%A Jack Dongarra
%E Roman Wyrzykowski
%E Jack Dongarra
%E Konrad Karczewski
%E Jerzy Wasniewski
%B Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science (PPAM 2011)
%I Springer-Verlag Berlin Heidelberg
%V 7203
%P 133-142
%8 2012-00
%G eng
%0 Generic
%D 2011
%T A parallel tiled solver for dense symmetric indefinite systems on multicore architectures
%A Marc Baboulin
%A Dulceneia Becker
%A Jack Dongarra
%K plasma
%K quark
%B University of Tennessee Computer Science Technical Report
%8 2011-10
%G eng
%0 Generic
%D 2011
%T Reducing the Amount of Pivoting in Symmetric Indefinite Systems
%A Dulceneia Becker
%A Marc Baboulin
%A Jack Dongarra
%B University of Tennessee Innovative Computing Laboratory Technical Report
%I Submitted to PPAM 2011
%C Knoxville, TN
%8 2011-05
%G eng
%0 Generic
%D 2011
%T Towards a Parallel Tile LDL Factorization for Multicore Architectures
%A Dulceneia Becker
%A Mathieu Faverge
%A Jack Dongarra
%K plasma
%K quark
%B ICL Technical Report
%C Seattle, WA
%8 2011-04
%G eng