“High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs,” 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA): IEEE, November 2020.
How to Build Your Own Deep Neural Network: PEARC20, July 2020.
Integrating Deep Learning in Domain Science at Exascale (MagmaDNN), virtual, DOD HPCMP seminar, December 2020.
“Integrating Deep Learning in Domain Sciences at Exascale,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-10: University of Tennessee, August 2020.
“Integrating Deep Learning in Domain Sciences at Exascale,” 2020 Smoky Mountains Computational Sciences and Engineering Conference (SMC 2020), August 2020.
“Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices using GPUs,” International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, Springer, Cham, June 2020. DOI: 10.1007/978-3-030-50417-5_18
“Load-Balancing Sparse Matrix Vector Product Kernels on GPUs,” ACM Transactions on Parallel Computing, vol. 7, issue 1, March 2020. DOI: 10.1145/3380930
“MAGMA Templates for Scalable Linear Algebra on Emerging Architectures,” The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645-658, November 2020. DOI: 10.1177/1094342020938421
MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines, Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020.
“Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions,” Journal of Parallel and Distributed Computing, vol. 145, pp. 188-201, November 2020. DOI: 10.1016/j.jpdc.2020.07.001
“Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems,” Proceedings of the Royal Society A, vol. 476, issue 2243, November 2020. DOI: 10.1098/rspa.2020.0110
“Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-05: University of Tennessee, May 2020.
“Project-Based Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning,” The Journal of Computational Science Education, vol. 11, issue 1, pp. 36-44, January 2020. DOI: 10.22369/issn.2153-4136/11/1/7
“Reducing the Amount of out-of-core Data Access for GPU-Accelerated Randomized SVD,” Concurrency and Computation: Practice and Experience, April 2020. DOI: 10.1002/cpe.5754
“A Set of Batched Basic Linear Algebra Subprograms,” ACM Transactions on Mathematical Software, October 2020.
“A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic,” SLATE Working Notes, no. 15, ICL-UT-20-08: University of Tennessee, July 2020.
“Translational Process: Mathematical Software Perspective,” Journal of Computational Science, September 2020. DOI: 10.1016/j.jocs.2020.101216
“Translational Process: Mathematical Software Perspective,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-11, August 2020.
Accelerating FFT towards Exascale Computing: NVIDIA GPU Technology Conference (GTC2021), 2021.
“Efficient exascale discretizations: High-order finite element methods,” The International Journal of High Performance Computing Applications, pp. 10943420211020803, 2021. DOI: 10.1177/10943420211020803
“Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems,” IEEE Access, 2021. DOI: 10.1109/ACCESS.2021.3106054
“Interim Report on Benchmarking FFT Libraries on High Performance Systems,” Innovative Computing Laboratory Technical Report, no. ICL-UT-21-03: University of Tennessee, July 2021.
“libCEED: Fast algebra for high-order element-based discretizations,” Journal of Open Source Software, vol. 6, no. 63, pp. 2945, 2021. DOI: 10.21105/joss.02945
“A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms,” ICL Technical Report, no. ICL-UT-21-04: University of Tennessee, August 2021.
“Scalability Issues in FFT Computation,” International Conference on Parallel Computing Technologies: Springer, pp. 279–287, 2021. DOI: 10.1007/978-3-030-86359-3_21
“A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines,” ACM Transactions on Mathematical Software (TOMS), vol. 47, no. 3, pp. 1–23, 2021. DOI: 10.1145/3431921
“Translational process: Mathematical software perspective,” Journal of Computational Science, vol. 52, pp. 101216, 2021. DOI: 10.1016/j.jocs.2020.101216
“Analysis of the Communication and Computation Cost of FFT Libraries towards Exascale,” ICL Technical Report, no. ICL-UT-22-07: Innovative Computing Laboratory, July 2022.
“FFT Benchmark Performance Experiments on Systems Targeting Exascale,” ICL Technical Report, no. ICL-UT-22-02, March 2022.
“Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression,” ICL Technical Report, no. ICL-UT-22-04, May 2022.
“PAQR: Pivoting Avoiding QR factorization,” ICL Technical Report, no. ICL-UT-22-06, June 2022.