Unveiling the Performance-energy Trade-off in Iterative Linear System Solvers for Multithreaded Processors,” Concurrency and Computation: Practice and Experience, vol. 27, issue 4, pp. 885-904, September 2014.“
Updating Incomplete Factorization Preconditioners for Model Order Reduction,” Numerical Algorithms, vol. 73, issue 3, no. 3, pp. 611–630, February 2016.“
Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning,” Journal of Parallel and Distributed Computing, vol. 119, pp. 219–230, November 2018.“
Variable-Size Batched Gauss–Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,” Parallel Computing, January 2018.“
Weighted Block-Asynchronous Relaxation for GPU-Accelerated Systems,” SIAM Journal on Computing (submitted), March 2012.“
Flexible Batched Sparse Matrix Vector Product on GPUs , Denver, Colorado, ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017.
MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi , Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.
Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.“
On block-asynchronous execution on GPUs,” LAPACK Working Note, no. 291, November 2016.“
A Block-Asynchronous Relaxation Method for Graphics Processing Units,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-687 / LAWN 258, November 2011.“
GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,” University of Tennessee Computer Science Technical Report UT-CS-11-690 (also Lawn 260), December 2011.“
Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-σ formats on NVIDIA GPUs,” University of Tennessee Computer Science Technical Report, no. UT-EECS-14-727: University of Tennessee, April 2014.“
MAGMA-sparse Interface Design Whitepaper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.“
Roadmap for the Development of a Linear Algebra Library for Exascale Computing: SLATE: Software for Linear Algebra Targeting Exascale,” SLATE Working Notes, no. 1, ICL-UT-17-02: Innovative Computing Laboratory, University of Tennessee, June 2017.“
Software-Defined Events (SDEs) in MAGMA-Sparse,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-12: University of Tennessee, December 2018.“
Solver Interface & Performance on Cori,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-05: University of Tennessee, June 2018.“