%0 Journal Article
%J ACM Transactions on Mathematical Software (TOMS)
%D 2013
%T Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms
%A Fred G. Gustavson
%A Jerzy Wasniewski
%A Jack Dongarra
%A José Herrero
%A Julien Langou
%X Four routines called DPOTF3i, i = a,b,c,d, are presented. The DPOTF3i routines are a novel type of level-3 BLAS for use by BPF (Blocked Packed Format) Cholesky factorization and the LAPACK routine DPOTRF. The performance of the DPOTF3i routines is still increasing when the performance of the Level-2 LAPACK routine DPOTF2 starts decreasing. This is our main result, and it implies that, because a larger block size nb can be used, DGEMM, DSYRK, and DTRSM performance also increases. The four DPOTF3i routines use simple register blocking. Different platforms have different numbers of registers, so the four routines have different register blocking sizes. BPF is introduced. LAPACK routines for POTRF and PPTRF using BPF instead of full and packed format are shown to be trivial modifications of LAPACK POTRF source codes. We call these codes BPTRF. There are two variants of BPF: lower and upper. Upper BPF is “identical” to Square Block Packed Format (SBPF). “LAPACK” implementations on multicore processors use SBPF. Lower BPF is less efficient than upper BPF. Vector inplace transposition converts lower BPF to upper BPF very efficiently. Corroborating performance results for DPOTF3i versus DPOTF2 on a variety of common platforms are given for n ≈ nb, as well as results for large n comparing DBPTRF versus DPOTRF.
%B ACM Transactions on Mathematical Software (TOMS)
%V 39
%8 2013-02
%G eng
%N 2
%R 10.1145/2427023.2427026
%0 Journal Article
%J ACM TOMS (submitted), also LAPACK Working Note (LAWN) 211
%D 2010
%T Level-3 Cholesky Kernel Subroutine of a Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm
%A Fred G. Gustavson
%A Jerzy Wasniewski
%A Jack Dongarra
%B ACM TOMS (submitted), also LAPACK Working Note (LAWN) 211
%8 2010
%G eng
%0 Journal Article
%J ACM Transactions on Mathematical Software (TOMS)
%D 2010
%T Rectangular Full Packed Format for Cholesky’s Algorithm: Factorization, Solution, and Inversion
%A Fred G. Gustavson
%A Jerzy Wasniewski
%A Jack Dongarra
%A Julien Langou
%B ACM Transactions on Mathematical Software (TOMS)
%C Atlanta, GA
%V 37
%8 2010-04
%G eng
%0 Journal Article
%J ACM Transactions on Mathematical Software (TOMS)
%D 2010
%T Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion
%A Fred G. Gustavson
%A Jerzy Wasniewski
%A Jack Dongarra
%A Julien Langou
%B ACM Transactions on Mathematical Software (TOMS)
%V 37
%8 2010-04
%G eng
%0 Journal Article
%J ACM TOMS (to appear)
%D 2009
%T Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion
%A Fred G. Gustavson
%A Jerzy Wasniewski
%A Jack Dongarra
%A Julien Langou
%B ACM TOMS (to appear)
%8 2009
%G eng
%0 Generic
%D 2008
%T Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion
%A Fred G. Gustavson
%A Jerzy Wasniewski
%A Jack Dongarra
%B University of Tennessee Computer Science Technical Report, UT-CS-08-614 (also LAPACK Working Note 199)
%8 2008-04
%G eng