Publications
“LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU,” 16th IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.
“LU Factorization with Partial Pivoting for a Multicore System with Accelerators,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, issue 8, pp. 1613-1621, August 2013.
“LU, QR, and Cholesky Factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi,” IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, MA, IEEE, September 2016.
MAGMA: A Breakthrough in Solvers for Eigenvalue Problems, San Jose, CA, GPU Technology Conference (GTC12), Presentation, May 2012.
MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.
“MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs,” Innovative Computing Laboratory Technical Report, no. ICL-UT-16-02: University of Tennessee, August 2016.
“MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing,” 2015 IEEE High Performance Extreme Computing Conference (HPEC ’15), (Best Paper Award), Waltham, MA, IEEE, September 2015.
MAGMA: Evolution and Revolution, Knoxville, TN, ICL Lunch Talk Seminar, July 2021.
MAGMA - LAPACK for GPUs, Atlanta, GA, Keeneland GPU Tutorial, April 2011.
MAGMA - LAPACK for HPC on Heterogeneous Architectures, Oak Ridge, TN, Titan Summit at Oak Ridge National Laboratory, Presentation, August 2011.
MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.
MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi, Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.
“MAGMA Templates for Scalable Linear Algebra on Emerging Architectures,” The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645-658, November 2020.
MAGMA Tensors and Batched Computing for Accelerating Applications on GPUs, San Jose, CA, GPU Technology Conference (GTC17), Presentation in Session S7728, May 2017.
MAGMA Tutorial, Atlanta, GA, Keeneland Workshop, February 2012.
MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs, University of Tennessee, January 2019.
“MagmaDNN: Accelerated Deep Learning Using MAGMA,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.
MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs, Knoxville, TN, 2017 Summer Research Experiences for Undergraduates (REU), Presentation, December 2017.
“MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.
“MAGMA-sparse Interface Design Whitepaper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.
“Making Performance Analysis and Tuning Part of the Software Development Cycle,” Proceedings of DoD HPCMP UGC 2009, San Diego, CA, IEEE, June 2009.
“MaPHyS or the Development of a Parallel Algebraic Domain Decomposition Solver in the Course of the Solstice Project,” Sparse Days 2010 Meeting at CERFACS, Toulouse, France, June 2010.
“The Marketplace for High-Performance Computers,” Parallel Computing, vol. 25, no. 13-14, pp. 1517-1545, October 2002.
“Massively Parallel Automated Software Tuning,” 48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan, ACM Press, August 2019.
MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines, Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Research Poster, November 2018.
MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines, Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020.
“Materials fingerprinting classification,” Computer Physics Communications, pp. 108019, May Jan.
“Matrices Over Runtime Systems at Exascale,” Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.
Matrix Algebra on GPU and Multicore Architectures, Basel, Switzerland, Workshop on GPU-enabled Numerical Libraries, Presentation, May 2011.
“Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions,” Journal of Parallel and Distributed Computing, vol. 145, pp. 188-201, November 2020.
“Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation,” International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
“Matrix Product on Heterogeneous Master Worker Platforms,” 2008 PPoPP Conference, Salt Lake City, Utah, January 2008.
MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR), Washington, DC, NSF PI Meeting, Poster, April 2018.
“Max-Stretch Minimization on an Edge-Cloud Platform,” IPDPS'2021, the 34th IEEE International Parallel and Distributed Processing Symposium: IEEE Computer Society Press, 2021.
“Measuring Computer Performance: A Practitioner's Guide,” SIAM Review (book review), vol. 43, no. 2, pp. 383-384, 2001.
“Measuring Energy and Power with PAPI,” International Workshop on Power-Aware Systems and Architectures, Pittsburgh, PA, September 2012.
“Memory Bandwidth and the Performance of Scientific Applications: A Study of the AMD Opteron Processor,” 2005 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (submitted), January 2004.
“Memory Leak Detection in Fortran Applications using TAU,” Proc. DoD HPCMP Users Group Conference (HPCMP-UGC'07), Pittsburgh, PA, IEEE Computer Society, January 2007.
Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements, St. Petersburg, FL, 28th HIPS Workshop, May 2023.
“Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements,” 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, Florida, IEEE, August 2023.
“Message Passing Software Systems,” Encyclopedia of Electrical and Electronics Engineering, Supplement 1: John Wiley & Sons, Inc., 2000.
“Metacomputing: An Evaluation of Emerging Systems,” University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-445, July 2000.
“Metacomputing Support for the SARA3D Structural Acoustics Application,” Department of Defense Users' Group Conference (to appear), Biloxi, Mississippi, June 2001.
“A Metascheduler For The Grid,” Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC 2002), Edinburgh, Scotland, IEEE Computer Society, pp. 343-351, July 2002.
“MIAMI: A Framework for Application Performance Diagnosis,” ISPASS-2014, Monterey, CA, IEEE, March 2014.
“Middleware for the Use of Storage in Communication,” Parallel Computing, vol. 28, no. 12, pp. 1773-1788, August 2002.
“Mixed Precision Algebraic Multigrid on GPUs,” Parallel Processing and Applied Mathematics (PPAM 2022), vol. 13826, Cham, Springer International Publishing, April 2023.
“Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression,” ICL Technical Report, no. ICL-UT-22-04, May 2022.
“Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems,” International Journal of High Performance Computing Applications (to appear), August 2007.
“Mixed Precision LU Factorization on GPU Tensor Cores: Reducing Data Movement and Memory Footprint,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-13: University of Tennessee, September 2020.