MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures , Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.
MAGMA: A Breakthrough in Solvers for Eigenvalue Problems , San Jose, CA, GPU Technology Conference (GTC12), Presentation, May 2012.
LU, QR, and Cholesky Factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi,” IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, MA, IEEE, September 2016.“
LU Factorization with Partial Pivoting for a Multicore System with Accelerators,” IEEE Transactions on Parallel and Distributed Computing, vol. 24, issue 8, pp. 1613-1621, August 2013. DOI: http://doi.ieeecomputersociety.org/10.1109/TPDS.2012.242“
LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU,” 16th IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.“
LU Factorization for Accelerator-Based Systems,” IEEE/ACS AICCSA 2011, Sharm-El-Sheikh, Egypt, December 2011.“
Looking Back at Dense Linear Algebra Software,” Perspectives on Parallel and Distributed Processing: Looking Back and What's Ahead (to appear), 00 2012.“
Looking Back at Dense Linear Algebra Software,” Journal of Parallel and Distributed Computing, vol. 74, issue 7, pp. 2548–2560, July 2014. DOI: 10.1016/j.jpdc.2013.10.005“
A Look Back on 30 Years of the Gordon Bell Prize,” International Journal of High Performance Computing and Networking, vol. 31, issue 6, pp. 469–484, 2017.“
Logistical Quality of Service in NetSolve,” Computer Communications, vol. 22, no. 11, pp. 1034-1044, January 1999.“
Logistical Networking: Sharing More Than the Wires,” In Active Middleware Services, Ed. Salim Hariri, Craig A. Lee, Cauligi S. Raghavendra (2000), Kluwer Academic, Norwell, MA, January 2000.“
Logistical Computing and Internetworking: Middleware for the Use of Storage in Communication,” submitted to SC2001, Denver, Colorado, November 2001.“
Locality and Topology aware Intra-node Communication Among Multicore CPUs,” Proceedings of the 17th EuroMPI conference, Stuttgart, Germany, LNCS, September 2010.“
Local Rollback for Resilient MPI Applications with Application-Level Checkpointing and Message Logging,” Future Generation Computer Systems, vol. 91, pp. 450-464, February 2019. DOI: 10.1016/j.future.2018.09.041“
LINPACK on Future Manycore and GPu Based Systems,” PARA 2010, Reykjavik, Iceland, June 2010.“
The LINPACK Benchmark: Past, Present, and Future,” Concurrency: Practice and Experience, vol. 15, pp. 803-820, 00 2008.“
Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators,” Euro-Par 2019: Parallel Processing. Euro-Par 2019. Lecture Notes in Computer Science, vol. 11725, Cham, Springer, pp. 495 - 506, 2019. DOI: 10.1007/978-3-030-29400-7_35“
Linear Systems Performance Report,” SLATE Working Notes, no. 8, ICL-UT-18-08: Innovative Computing Laboratory, University of Tennessee, September 2018.“
Linear Algebra Software for High-Performance Computing (Part 2: Software for Hardware Accelerators and Coprocessors) , Frankfurt, Germany, ISC High Performance (ISC18), Tutorial Presentation, June 2015.
Limitations of the Playstation 3 for High Performance Cluster Computing,” University of Tennessee Computer Science Technical Report, UT-CS-07-597 (Also LAPACK Working Note 185), 00 2007.“
Level-3 Cholesky Kernel Subroutine of a Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm,” ACM TOMS (submitted), also LAPACK Working Note (LAWN) 211, 00 2010.“
Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms,” ACM Transactions on Mathematical Software (TOMS), vol. 39, issue 2, February 2013. DOI: 10.1145/2427023.2427026“
Least squares solvers for distributed-memory machines with GPU accelerators,” Proceedings of the ACM International Conference on Supercomputing (ICS '19), Phoenix, Arizona, ACM, 2019. DOI: 10.1145/3330345.3330356“
Least Squares Performance Report,” SLATE Working Notes, no. 9, ICL-UT-18-10: Innovative Computing Laboratory, University of Tennessee, December 2018.“
Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations,” International Supercomputing Conference (ISC), Lecture Notes in Computer Science, vol. 7905, Leipzig, Germany, Springer Berlin Heidelberg, pp. 67-80, June 2013. DOI: 10.1007/978-3-642-38750-0_6“
LAWN 294: Aasen's Symmetric Indenite Linear Solvers in LAPACK,” LAPACK Working Note, no. LAWN 294, ICL-UT-17-13: University of Tennessee, December 2017.“
Large Event Traces in Parallel Performance Analysis,” 8th Workshop 'Parallel Systems and Algorithms' (PASA), Lecture Notes in Informatics, no. ICL-UT-06-08, Frankfurt/Main, Germany, Gesellschaft für Informatik, March 2006.“
LAPACK Users' Guide, 3rd ed.,” Philadelphia: Society for Industrial and Applied Mathematics, January 1999.“
LAPACK for Clusters Project: An Example of Self Adapting Numerical Software,” Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS 04'), vol. 9, Big Island, Hawaii, pp. 90282, January 2004.“
LAPACK 2005 Prospectus: Reliable and Scalable Software for Linear Algebra Computations on High End Computers : LAPACK Working Note 164, January 2005.
LAPACK,” Handbook of Linear Algebra, Second, Boca Raton, FL, CRC Press, 2013.“
L2 Cache Modeling for Scientific Applications on Chip Multi-Processors,” Proceedings of the 2007 International Conference on Parallel Processing, Xi'an, China, IEEE Computer Society, January 2007.“
KOJAK - A Tool Set for Automatic Performance Analysis of Parallel Applications,” Proc. of the European Conference on Parallel Computing (EuroPar), vol. 2790, Klagenfurt, Austria, Springer-Verlag, pp. 1301-1304, August 2003.“
Kernel-assisted and topology-aware MPI collective communications on multi-core/many-core platforms,” Journal of Parallel and Distributed Computing, vol. 73, issue 7, pp. 1000-1010, July 2013. DOI: 10.1016/j.jpdc.2013.01.015“
Kernel Assisted Collective Intra-node MPI Communication Among Multi-core and Many-core CPUs,” Int'l Conference on Parallel Processing (ICPP '11), Taipei, Taiwan, September 2011.“
Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs,” University of Tennessee Computer Science Technical Report, UT-CS-10-663, November 2010.“
Keeneland: Computational Science Using Heterogeneous GPU Computing,” Contemporary High Performance Computing: From Petascale Toward Exascale, Boca Raton, FL, Taylor and Francis, 2013.“
JLAPACK - Compiling LAPACK Fortran to Java,” Scientific Programming, vol. 7, no. 2, pp. 111-138, October 2002.“
Iterative Sparse Triangular Solves for Preconditioning,” EuroPar 2015, Vienna, Austria, Springer Berlin, August 2015. DOI: 10.1007/978-3-662-48096-0_50“
Iterative Solver Benchmark (LAPACK Working Note 152),” Scientific Programming, vol. 9, no. 4, pp. 223-231, 00 2001.“
An Iterative Solver Benchmark,” Scientific Programming (to appear), 00 2002.“
I/O Performance Analysis for the Petascale Simulation Code FLASH,” ISC'09, Hamburg, Germany, June 2009.“
Investigating Power Capping toward Energy-Efficient Scientific Applications,” Concurrency Computation: Practice and Experience, vol. 2018, issue e4485, pp. 1-14, April 2018. DOI: 10.1002/cpe.4485“
Investigating Half Precision Arithmetic to Accelerate Dense Linear System Solvers,” ScalA17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, ACM.“
An Introduction to the MAGMA project - Acceleration of Dense Linear Algebra : NVIDIA Webinar, June 2010.
Introduction to the HPCChallenge Benchmark Suite,” ICL Technical Report, no. ICL-UT-05-01, (Also appears as CS Dept. Tech Report UT-CS-05-544), January 2005.“
Introduction to the HPC Challenge Benchmark Suite , March 2005.
Interoperable Convergence of Storage, Networking, and Computation,” FICC 2019, San Francisco, CA, Springer, March 14 15, 2019.“
Interoperable Convergence of Storage, Networking, and Computation,” Future of Information and Communication Conference (FICC), San Francisco, Science and Information (SAI), March 2019.“