Publications
“MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi,”
Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.

“Random-Order Alternating Schwarz for Sparse Triangular Solves,”
2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.

“Tuning Stationary Iterative Solvers for Fault Resilience,”
6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA15), Austin, TX, ACM, November 2015.

“Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.

“Hybrid Multi-Elimination ILU Preconditioners on GPUs,”
International Heterogeneity in Computing Workshop (HCW), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

“Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-σ formats on NVIDIA GPUs,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-14-727: University of Tennessee, April 2014.

“Improving the Energy Efficiency of Sparse Linear System Solvers on Multicore and Manycore Systems,”
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 372, issue 2018, July 2014.
DOI: 10.1098/rsta.2013.0279

“Improving the performance of CA-GMRES on multicores with multiple GPUs,”
IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

“Optimizing Krylov Subspace Solvers on Graphics Processing Units,”
Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

“Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures,”
VECPAR 2014, Eugene, OR, June 2014.

“Unveiling the Performance-energy Trade-off in Iterative Linear System Solvers for Multithreaded Processors,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 4, pp. 885–904, September 2014.
DOI: 10.1002/cpe.3341

“A Block-Asynchronous Relaxation Method for Graphics Processing Units,”
Journal of Parallel and Distributed Computing, vol. 73, issue 12, pp. 1613–1626, December 2013.
DOI: 10.1016/j.jpdc.2013.05.008

“Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,”
ICCS 2012, Omaha, NE, June 2012.

“GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,”
Euro-Par 2012 (also LAWN 260), Rhodes Island, Greece, August 2012.

“Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems,”
Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Best Paper), Rhodes Island, Greece, August 2012.

“Weighted Block-Asynchronous Relaxation for GPU-Accelerated Systems,”
SIAM Journal on Computing (submitted), March 2012.

“Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,”
no. UT-CS-11-689, December 2011.

“A Block-Asynchronous Relaxation Method for Graphics Processing Units,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-687 / LAWN 258, November 2011.

“GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-690 (also LAWN 260), December 2011.
“