“Autotuning Techniques for Performance-Portable Point Set Registration in 3D,” Supercomputing Frontiers and Innovations, vol. 5, no. 4, December 2018. DOI: 10.14529/jsfi180404
“Least Squares Performance Report,” SLATE Working Notes, no. 9, ICL-UT-18-10: Innovative Computing Laboratory, University of Tennessee, December 2018.
“Linear Systems Performance Report,” SLATE Working Notes, no. 8, ICL-UT-18-08: Innovative Computing Laboratory, University of Tennessee, September 2018.
MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines, Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Research Poster, November 2018.
MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR), Washington, DC, NSF PI Meeting, Poster, April 2018. DOI: 10.6084/m9.figshare.6174143.v3
“Parallel BLAS Performance Report,” SLATE Working Notes, no. 5, ICL-UT-18-01: University of Tennessee, April 2018.
“Parallel Norms Performance Report,” SLATE Working Notes, no. 6, ICL-UT-18-06: Innovative Computing Laboratory, University of Tennessee, June 2018.
Production Implementations of Pipelined & Communication-Avoiding Iterative Linear Solvers, Tokyo, Japan, SIAM Conference on Parallel Processing for Scientific Computing, March 2018.
“The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale,” SIAM Review, vol. 60, issue 4, pp. 808–865, November 2018. DOI: 10.1137/17M1117732
“Software-Defined Events (SDEs) in MAGMA-Sparse,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-12: University of Tennessee, December 2018.
“Solver Interface & Performance on Cori,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-05: University of Tennessee, June 2018.
“Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architectures,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 8, pp. 1879–1892, August 2018. DOI: 10.1109/TPDS.2018.2808964
“Distributed-Memory Lattice H-Matrix Factorization,” The International Journal of High Performance Computing Applications, vol. 33, issue 5, pp. 1046–1063, August 2019. DOI: 10.1177/1094342019861139
“Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization,” PAW-ATM Workshop at SC19, Denver, CO, ACM, November 2019.
“Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators,” IEEE High Performance Extreme Computing Conference (HPEC 2019), Best Paper Finalist, Waltham, MA, IEEE, September 2019.
“Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation,” International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
“Performance of Asynchronous Optimized Schwarz with One-sided Communication,” Parallel Computing, vol. 86, pp. 66–81, August 2019. DOI: 10.1016/j.parco.2019.05.004
“PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP,” ACM Transactions on Mathematical Software, vol. 45, issue 2, June 2019. DOI: 10.1145/3264491
“Reducing the Amount of out-of-core Data Access for GPU-Accelerated Randomized SVD,” Concurrency and Computation: Practice and Experience, April 2020. DOI: 10.1002/cpe.5754
SLATE Tutorial, Houston, TX, 2020 ECP Annual Meeting, February 2020.