PAPI 5: Measuring Power, Energy, and the Cloud , Austin, TX, 2013 IEEE International Symposium on Performance Analysis of Systems and Software, April 2013.
Power-aware Computing on GPGPUs , Gatlinburg, TN, Fall Creek Falls Conference, Poster, September 2011.
Scheduling Cholesky Factorization on Multicore Architectures with GPU Accelerators , Knoxville, TN, 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Poster, July 2010.
A Standard for Batched BLAS Routines , Paris, France, 17th SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP16), April 2016.
Tensor Contractions using Optimized Batch GEMM Routines , San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
Towards a High-Performance Tensor Algebra Package for Accelerators , Gatlinburg, TN, moky Mountains Computational Sciences and Engineering Conference (SMC15), September 2015.
Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption , Frankfurt, Germany, ISC High Performance (ISC18), Best Poster Award, June 2018.
Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers : 2010 Symposium on Application Accelerators in. High-Performance Computing (SAAHPC'10), Tutorial, July 2010.
Accelerating Linear Algebra with MAGMA , Knoxville, TN, ECP Annual Meeting 2018, Tutorial, February 2018.
Accelerating Tensor Contractions in High-Order FEM with MAGMA Batched , Atlanta, GA, SIAM Conference on Computer Science and Engineering (SIAM CSE17), Presentation, March 2017.
Autotuning Dense Linear Algebra Libraries on GPUs , Basel, Switzerland, Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), June 2010.
Comparing performance of s-step and pipelined GMRES on distributed-memory multicore CPUs , Pittsburgh, Pennsylvania, SIAM Annual Meeting, July 2017.
Dense Linear Algebra Solvers for Multicore with GPU Accelerators , Atlanta, GA, International Parallel and Distributed Processing Symposium (IPDPS 2010), April 2010.
On the Design, Autotuning, and Optimization of GPU Kernels for Kinetic Network Simulations Using Fast Explicit Integration and GPU Batched Computation , Oak Ridge, TN, Joint Institute for Computational Sciences Seminar Series, Presentation, September 2015.
Does your tool support PAPI SDEs yet? , Tahoe City, CA, 13th Scalable Tools Workshop, July 2019.
Flexible Batched Sparse Matrix Vector Product on GPUs , Denver, Colorado, ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017.
The Future of Computing: Software Libraries , Savannah, GA, DOD CREATE Developers' Review, Keynote Presentation, February 2012.
An Introduction to the MAGMA project - Acceleration of Dense Linear Algebra : NVIDIA Webinar, June 2010.
Linear Algebra Software for High-Performance Computing (Part 2: Software for Hardware Accelerators and Coprocessors) , Frankfurt, Germany, ISC High Performance (ISC18), Tutorial Presentation, June 2015.
MAGMA: A Breakthrough in Solvers for Eigenvalue Problems , San Jose, CA, GPU Technology Conference (GTC12), Presentation, May 2012.
MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures , Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.
MAGMA - LAPACK for GPUs , Atlanta, GA, Keeneland GPU Tutorial, April 2011.
MAGMA - LAPACK for HPC on Heterogeneous Architectures , Oak Ridge, TN, Titan Summit at Oak Ridge National Laboratory, Presentation, August 2011.
MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors , Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.
MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi , Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.
MAGMA Tensors and Batched Computing for Accelerating Applications on GPUs , San Jose, CA, GPU Technology Conference (GTC17), Presentation in Session S7728, May 2017.
MAGMA Tutorial , Atlanta, GA, Keeneland Workshop, February 2012.
MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs : University of Tennessee, January 2019. DOI: 10.13140/RG.2.2.14906.64961
MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs , Knoxville, TN, 2017 Summer Research Experiences for Undergraduate (REU), Presentation, December 2017.
Matrix Algebra on GPU and Multicore Architectures , Basel, Switzerland, Workshop on GPU-enabled Numerical Libraries, Presentation, May 2011.
PAPI: Counting outside the Box , Barcelona, Spain, 8th JLESC Meeting, April 2018.
PAPI's new Software-Defined Events for in-depth Performance Analysis , Dresden, Germany, 13th Parallel Tools Workshop, September 2019.
PAPI's New Software-Defined Events for In-Depth Performance Analysis , Lyon, France, CCDSC 2018: Workshop on Clusters, Clouds, and Data for Scientific Computing, September 2018.
Power-Aware HPC on Intel Xeon Phi KNL Processors , Frankfurt, Germany, ISC High Performance (ISC17), Intel Booth Presentation, June 2017.
Production Implementations of Pipelined & Communication-Avoiding Iterative Linear Solvers , Tokyo, Japan, SIAM Conference on Parallel Processing for Scientific Computing, March 2018.
Software-Defined Events through PAPI for In-Depth Analysis of Application Performance , Basel, Switzerland, 5th Platform for Advanced Scientific Computing Conference (PASC18), July 2018.
Understanding Native Event Semantics , Knoxville, TN, 9th JLESC Workshop, April 2019.
What it Takes to keep PAPI Instrumental for the HPC Community , Collegeville, MN, The 2019 Collegeville Workshop on Sustainable Scientific Software (CW3S19), July 2019.
Is your scheduling good? How would you know? , Bordeaux, France, 14th Scheduling for Large Scale Systems Workshop, June 2019.
2016 Dense Linear Algebra Software Packages Survey,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-744 / LAWN 290: University of Tennessee, September 2016.“
Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.“
Accelerating the Reduction to Upper Hessenberg Form through Hybrid GPU-Based Computing,” University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 2009.“
Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization,” University of Tennessee Computer Science Technical Report (also as a LAWN), no. ICL-UT-11-08, September 2011.“
Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources,” University of Tennessee Computer Science Department Technical Report, vol. –05-561, November 2005.“
Algorithm-based Fault Tolerance for Dense Matrix Factorizations,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.“
Algorithmic Based Fault Tolerance Applied to High Performance Computing,” University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), January 2008.“
On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,” University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013, 2012.“
Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.“
Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,” University of Tennessee Computer Science Technical Report, UT-CS-11-666, (also Lawn 243), 00 2011.“