%0 Journal Article
%J Parallel Computing (to appear)
%D 2010
%T A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures
%A Alfredo Buttari
%A Julien Langou
%A Jakub Kurzak
%A Jack Dongarra
%B Parallel Computing (to appear)
%8 2010-00
%G eng

%0 Journal Article
%J Computer Physics Communications
%D 2009
%T Accelerating Scientific Computations with Mixed Precision Algorithms
%A Marc Baboulin
%A Alfredo Buttari
%A Jack Dongarra
%A Jakub Kurzak
%A Julie Langou
%A Julien Langou
%A Piotr Luszczek
%A Stanimire Tomov
%X On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.
%B Computer Physics Communications
%V 180
%P 2526-2533
%8 2009-12
%G eng
%N 12
%R https://doi.org/10.1016/j.cpc.2008.11.005

%0 Journal Article
%J Parallel Computing
%D 2009
%T A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures
%A Alfredo Buttari
%A Julien Langou
%A Jakub Kurzak
%A Jack Dongarra
%K plasma
%B Parallel Computing
%V 35
%P 38-53
%8 2009-00
%G eng

%0 Journal Article
%J in Cyberinfrastructure Technologies and Applications
%D 2009
%T Parallel Dense Linear Algebra Software in the Multicore Era
%A Alfredo Buttari
%A Jack Dongarra
%A Jakub Kurzak
%A Julien Langou
%E Junwei Cao
%K plasma
%B in Cyberinfrastructure Technologies and Applications
%I Nova Science Publishers, Inc.
%P 9-24
%8 2009-00
%G eng

%0 Journal Article
%J in High Performance Computing and Grids in Action
%D 2008
%T Exploiting Mixed Precision Floating Point Hardware in Scientific Computations
%A Alfredo Buttari
%A Jack Dongarra
%A Jakub Kurzak
%A Julien Langou
%A Julie Langou
%A Piotr Luszczek
%A Stanimire Tomov
%E Lucio Grandinetti
%B in High Performance Computing and Grids in Action
%I IOS Press
%C Amsterdam
%8 2008-01
%G eng

%0 Journal Article
%J Concurrency and Computation: Practice and Experience
%D 2008
%T Parallel Tiled QR Factorization for Multicore Architectures
%A Alfredo Buttari
%A Julien Langou
%A Jakub Kurzak
%A Jack Dongarra
%B Concurrency and Computation: Practice and Experience
%V 20
%P 1573-1590
%8 2008-01
%G eng

%0 Journal Article
%J Computing in Science and Engineering
%D 2008
%T The PlayStation 3 for High Performance Scientific Computing
%A Jakub Kurzak
%A Alfredo Buttari
%A Piotr Luszczek
%A Jack Dongarra
%B Computing in Science and Engineering
%P 80-83
%8 2008-01
%G eng

%0 Generic
%D 2008
%T The PlayStation 3 for High Performance Scientific Computing
%A Jakub Kurzak
%A Alfredo Buttari
%A Piotr Luszczek
%A Jack Dongarra
%B University of Tennessee Computer Science Technical Report
%8 2008-01
%G eng

%0 Journal Article
%J IEEE Transactions on Parallel and Distributed Systems
%D 2008
%T Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization
%A Jakub Kurzak
%A Alfredo Buttari
%A Jack Dongarra
%B IEEE Transactions on Parallel and Distributed Systems
%V 19
%P 1-11
%8 2008-01
%G eng

%0 Journal Article
%J ACM Transactions on Mathematical Software
%D 2008
%T Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy
%A Alfredo Buttari
%A Jack Dongarra
%A Jakub Kurzak
%A Piotr Luszczek
%A Stanimire Tomov
%K plasma
%B ACM Transactions on Mathematical Software
%V 34
%P 17-22
%8 2008-00
%G eng

%0 Generic
%D 2007
%T A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures
%A Alfredo Buttari
%A Julien Langou
%A Jakub Kurzak
%A Jack Dongarra
%K plasma
%B University of Tennessee Computer Science Technical Report
%8 2007-01
%G eng

%0 Journal Article
%J In High Performance Computing and Grids in Action (to appear)
%D 2007
%T Exploiting Mixed Precision Floating Point Hardware in Scientific Computations
%A Alfredo Buttari
%A Jack Dongarra
%A Jakub Kurzak
%A Julien Langou
%A Julie Langou
%A Piotr Luszczek
%A Stanimire Tomov
%E Lucio Grandinetti
%B In High Performance Computing and Grids in Action (to appear)
%I IOS Press
%C Amsterdam
%8 2007-00
%G eng

%0 Generic
%D 2007
%T Limitations of the PlayStation 3 for High Performance Cluster Computing
%A Alfredo Buttari
%A Jack Dongarra
%A Jakub Kurzak
%B University of Tennessee Computer Science Technical Report, UT-CS-07-597 (Also LAPACK Working Note 185)
%8 2007-00
%G eng

%0 Journal Article
%J International Journal of High Performance Computer Applications (to appear)
%D 2007
%T Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems
%A Alfredo Buttari
%A Jack Dongarra
%A Julien Langou
%A Julie Langou
%A Piotr Luszczek
%A Jakub Kurzak
%B International Journal of High Performance Computer Applications (to appear)
%8 2007-08
%G eng

%0 Conference Proceedings
%B Journal of Physics: Conference Series, SciDAC 2007
%D 2007
%T Multithreading for synchronization tolerance in matrix factorization
%A Alfredo Buttari
%A Jack Dongarra
%A Parry Husbands
%A Jakub Kurzak
%A Katherine Yelick
%B Journal of Physics: Conference Series, SciDAC 2007
%V 78
%8 2007-01
%G eng

%0 Generic
%D 2007
%T Parallel Tiled QR Factorization for Multicore Architectures
%A Alfredo Buttari
%A Julien Langou
%A Jakub Kurzak
%A Jack Dongarra
%K plasma
%B University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-598 (also LAPACK Working Note 190)
%8 2007-00
%G eng

%0 Generic
%D 2007
%T SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3
%A Alfredo Buttari
%A Piotr Luszczek
%A Jakub Kurzak
%A Jack Dongarra
%A George Bosilca
%K multi-core
%B University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-595
%8 2007-00
%G eng

%0 Generic
%D 2007
%T Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization
%A Jakub Kurzak
%A Alfredo Buttari
%A Jack Dongarra
%K lapack
%B UT Computer Science Technical Report (Also LAPACK Working Note 184)
%8 2007-01
%G eng

%0 Journal Article
%J University of Tennessee Computer Science Tech Report
%D 2006
%T Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy
%A Julie Langou
%A Julien Langou
%A Piotr Luszczek
%A Jakub Kurzak
%A Alfredo Buttari
%A Jack Dongarra
%K iter-ref
%B University of Tennessee Computer Science Tech Report
%8 2006-04
%G eng

%0 Journal Article
%J PARA 2006
%D 2006
%T The Impact of Multicore on Math Software
%A Alfredo Buttari
%A Jack Dongarra
%A Jakub Kurzak
%A Julien Langou
%A Piotr Luszczek
%A Stanimire Tomov
%K plasma
%B PARA 2006
%C Umea, Sweden
%8 2006-06
%G eng

%0 Journal Article
%J PARA 2006
%D 2006
%T Prospectus for the Next LAPACK and ScaLAPACK Libraries
%A James Demmel
%A Jack Dongarra
%A B. Parlett
%A William Kahan
%A Ming Gu
%A David Bindel
%A Yozo Hida
%A Xiaoye Li
%A Osni Marques
%A Jason E. Riedy
%A Christof Voemel
%A Julien Langou
%A Piotr Luszczek
%A Jakub Kurzak
%A Alfredo Buttari
%A Julie Langou
%A Stanimire Tomov
%B PARA 2006
%C Umea, Sweden
%8 2006-06
%G eng

%0 Generic
%D 2004
%T Performance Optimization and Modeling of Blocked Sparse Kernels
%A Alfredo Buttari
%A Victor Eijkhout
%A Julien Langou
%A Salvatore Filippone
%K sans
%B ICL Technical Report
%8 2004-00
%G eng