%0 Journal Article
%J CTWatch Quarterly
%D 2007
%T The Impact of Multicore on Computational Science Software
%A Jack Dongarra
%A Dennis Gannon
%A Geoffrey Fox
%A Ken Kennedy
%B CTWatch Quarterly
%V 3
%8 2007-02
%G eng
%N 1

%0 Journal Article
%J International Journal of Parallel Programming
%D 2005
%T New Grid Scheduling and Rescheduling Methods in the GrADS Project
%A Francine Berman
%A Henri Casanova
%A Andrew Chien
%A Keith Cooper
%A Holly Dail
%A Anshuman Dasgupta
%A Wei Deng
%A Jack Dongarra
%A Lennart Johnsson
%A Ken Kennedy
%A Charles Koelbel
%A Bo Liu
%A Xu Liu
%A Anirban Mandal
%A Gabriel Marin
%A Mark Mazina
%A John Mellor-Crummey
%A Celso Mendes
%A A. Olugbile
%A Jignesh M. Patel
%A Dan Reed
%A Zhiao Shi
%A Otto Sievert
%A H. Xia
%A Asim YarKhan
%K grads
%B International Journal of Parallel Programming
%I Springer
%V 33
%P 209-229
%8 2005-06
%G eng

%0 Conference Paper
%B 2nd ACM SIGPLAN Workshop on Memory System Performance (MSP 2004)
%D 2004
%T Automatic Blocking of QR and LU Factorizations for Locality
%A Qing Yi
%A Ken Kennedy
%A Haihang You
%A Keith Seymour
%A Jack Dongarra
%K gco
%K papi
%K sans
%X QR and LU factorizations for dense matrices are important linear algebra computations that are widely used in scientific applications. To efficiently perform these computations on modern computers, the factorization algorithms need to be blocked when operating on large matrices to effectively exploit the deep cache hierarchy prevalent in today's computer memory systems. Because both QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Though linear algebra libraries such as LAPACK provide manually blocked implementations of these algorithms, by automatically generating blocked versions of the computations, more benefit can be gained such as automatic adaptation of different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. We present different blocking strategies that can be generated by our optimizer and compare the performance of auto-blocked versions with manually tuned versions in LAPACK, both using reference BLAS, ATLAS BLAS and native BLAS specially tuned for the underlying machine architectures.
%B 2nd ACM SIGPLAN Workshop on Memory System Performance (MSP 2004)
%I ACM
%C Washington, DC
%8 2004-06
%G eng
%R 10.1145/1065895.1065898

%0 Conference Proceedings
%B International Parallel and Distributed Processing Symposium: IPDPS 2002 Workshops
%D 2002
%T Toward a Framework for Preparing and Executing Adaptive Grid Programs
%A Ken Kennedy
%A John Mellor-Crummey
%A Keith Cooper
%A Linda Torczon
%A Francine Berman
%A Andrew Chien
%A Dave Angulo
%A Ian Foster
%A Dennis Gannon
%A Lennart Johnsson
%A Carl Kesselman
%A Jack Dongarra
%A Sathish Vadhiyar
%K grads
%B International Parallel and Distributed Processing Symposium: IPDPS 2002 Workshops
%C Fort Lauderdale, FL
%P 0171
%8 2002-04
%G eng

%0 Journal Article
%J International Journal of High Performance Applications and Supercomputing
%D 2001
%T The GrADS Project: Software Support for High-Level Grid Application Development
%A Francine Berman
%A Andrew Chien
%A Keith Cooper
%A Jack Dongarra
%A Ian Foster
%A Dennis Gannon
%A Lennart Johnsson
%A Ken Kennedy
%A Carl Kesselman
%A John Mellor-Crummey
%A Dan Reed
%A Linda Torczon
%A Rich Wolski
%K grads
%B International Journal of High Performance Applications and Supercomputing
%V 15
%P 327-344
%8 2001-01
%G eng

%0 Journal Article
%J Journal of Parallel and Distributed Computing
%D 2001
%T Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries
%A Ken Kennedy
%A Bradley Broom
%A Keith Cooper
%A Jack Dongarra
%A Rob Fowler
%A Dennis Gannon
%A Lennart Johnsson
%A John Mellor-Crummey
%A Linda Torczon
%B Journal of Parallel and Distributed Computing
%V 61
%P 1803-1826
%8 2001-12
%G eng

%0 Generic
%D 2000
%T The GrADS Project: Software Support for High-Level Grid Application Development
%A Francine Berman
%A Andrew Chien
%A Keith Cooper
%A Jack Dongarra
%A Ian Foster
%A Dennis Gannon
%A Lennart Johnsson
%A Ken Kennedy
%A Carl Kesselman
%A Dan Reed
%A Linda Torczon
%A Rich Wolski
%K grads
%B Technical Report
%8 2000-02
%G eng