Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 14), New Orleans, LA, IEEE, November 2014.“
Improving the performance of CA-GMRES on multicores with multiple GPUs,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.“
Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives,” Proceedings of The 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017), Best Paper Award, Orlando, FL, June 2017. DOI: 10.1109/IPDPSW.2017.65“
Comparing performance of s-step and pipelined GMRES on distributed-memory multicore CPUs , Pittsburgh, Pennsylvania, SIAM Annual Meeting, July 2017.
Production Implementations of Pipelined & Communication-Avoiding Iterative Linear Solvers , Tokyo, Japan, SIAM Conference on Parallel Processing for Scientific Computing, March 2018.
MAGMA-sparse Interface Design Whitepaper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.“
P1673R3: A Free Function Linear algebra Interface Based on the BLAS,” ISO JTC1 SC22 WG22, no. P1673R3: ISO, April 2021.“
Software-Defined Events (SDEs) in MAGMA-Sparse,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-12: University of Tennessee, December 2018.“
Solver Interface & Performance on Cori,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-05: University of Tennessee, June 2018.“