%0 Conference Proceedings
%B IEEE Proceedings (to appear)
%D 2004
%T Self Adapting Linear Algebra Algorithms and Software
%A James Demmel
%A Jack Dongarra
%A Victor Eijkhout
%A Erika Fuentes
%A Antoine Petitet
%A Rich Vuduc
%A Clint Whaley
%A Katherine Yelick
%K salsa
%K sans
%8 2004-00
%G eng

%0 Journal Article
%J ACM Transactions on Mathematical Software
%D 2002
%T An Updated Set of Basic Linear Algebra Subprograms (BLAS)
%A Susan Blackford
%A James Demmel
%A Jack Dongarra
%A Iain Duff
%A Sven Hammarling
%A Greg Henry
%A Michael Heroux
%A Linda Kaufman
%A Andrew Lumsdaine
%A Antoine Petitet
%A Roldan Pozo
%A Karin Remington
%A Clint Whaley
%B ACM Transactions on Mathematical Software
%V 28
%P 135-151
%8 2002-12
%G eng
%R 10.1145/567806.567807

%0 Journal Article
%J Parallel Computing
%D 2001
%T Automated Empirical Optimization of Software and the ATLAS Project
%A Clint Whaley
%A Antoine Petitet
%A Jack Dongarra
%K atlas
%B Parallel Computing
%V 27
%P 3-25
%8 2001-01
%G eng

%0 Journal Article
%J (an update), submitted to ACM TOMS
%D 2001
%T Basic Linear Algebra Subprograms (BLAS)
%A Susan Blackford
%A James Demmel
%A Jack Dongarra
%A Iain Duff
%A Sven Hammarling
%A Greg Henry
%A Michael Heroux
%A Linda Kaufman
%A Andrew Lumsdaine
%A Antoine Petitet
%A Roldan Pozo
%A Karin Remington
%A Clint Whaley
%B (an update), submitted to ACM TOMS
%8 2001-02
%G eng

%0 Generic
%D 2000
%T Automated Empirical Optimizations of Software and the ATLAS Project (LAPACK Working Note 147)
%A Clint Whaley
%A Antoine Petitet
%A Jack Dongarra
%K atlas
%B University of Tennessee Computer Science Department Technical Report
%8 2000-09
%G eng

%0 Journal Article
%J SIAM Annual Meeting
%D 1999
%T A Numerical Linear Algebra Problem Solving Environment Designer's Perspective (LAPACK Working Note 139)
%A Antoine Petitet
%A Henri Casanova
%A Clint Whaley
%A Jack Dongarra
%A Yves Robert
%B SIAM Annual Meeting
%C Atlanta, GA
%8 1999-05
%G eng

%0 Journal Article
%J Handbook on Parallel and Distributed Processing
%D 1999
%T Parallel and Distributed Scientific Computing: A Numerical Linear Algebra Problem Solving Environment Designer's Perspective
%A Antoine Petitet
%A Henri Casanova
%A Jack Dongarra
%A Yves Robert
%A Clint Whaley
%B Handbook on Parallel and Distributed Processing
%8 1999-01
%G eng

%0 Conference Paper
%B 1998 ACM/IEEE conference on Supercomputing (SC '98)
%D 1998
%T Automatically Tuned Linear Algebra Software
%A Clint Whaley
%A Jack Dongarra
%K BLAS
%K code generation
%K high performance
%K linear algebra
%K optimization
%K Tuning
%X This paper describes an approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units. The production of such software for machines ranging from desktop workstations to embedded processors can be a tedious and time-consuming process. The work described here can help in automating much of this process. We will concentrate our efforts on the widely used linear algebra kernels called the Basic Linear Algebra Subroutines (BLAS). In particular, the work presented here is for general matrix multiply, DGEMM. However, much of the technology and approach developed here can be applied to the other Level 3 BLAS, and the general strategy can have an impact on basic linear algebra operations in general and may be extended to other important kernel operations.
%I IEEE Computer Society
%C Orlando, FL
%8 1998-11
%@ 0-89791-984-X
%G eng

%0 Journal Article
%J Computer Physics Communications
%D 1996
%T ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance
%A Jaeyoung Choi
%A Jim Demmel
%A Inderjit Dhillon
%A Jack Dongarra
%A Susan Ostrouchov
%A Antoine Petitet
%A Kendall Stanley
%A David Walker
%A Clint Whaley
%X This paper outlines the content and performance of ScaLAPACK, a collection of mathematical software for linear algebra computations on distributed memory computers. The importance of developing standards for computational and message passing interfaces is discussed. We present the different components and building blocks of ScaLAPACK. This paper outlines the difficulties inherent in producing correct codes for networks of heterogeneous processors. We define a theoretical model of parallel computers dedicated to linear algebra applications: the Distributed Linear Algebra Machine (DLAM). This model provides a convenient framework for developing parallel algorithms and investigating their scalability, performance and programmability. Extensive performance results on various platforms are presented and analyzed with the help of the DLAM. Finally, this paper briefly describes future directions for the ScaLAPACK library and concludes by suggesting alternative approaches to mathematical libraries, explaining how ScaLAPACK could be integrated into efficient and user-friendly distributed systems.
%B Computer Physics Communications
%V 97
%P 1-15
%8 1996-08
%G eng
%N 1-2
%R 10.1016/0010-4655(96)00017-3