Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs,” IEEE Transactions on Parallel and Distributed Systems, no. 1045-9219, November 2015.“
MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi , Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.
Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems,” Supercomputing Frontiers and Innovations, vol. 2, no. 4, October 2015. DOI: 10.14529/jsfi1504“
A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination,” Concurrency and Computation: Practice and Experience, vol. 27, issue 5, pp. 1292-1309, April 2015. DOI: 10.1002/cpe.3306“
Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem,” VECPAR 2014, Eugene, OR, June 2014.“
Accelerating Numerical Dense Linear Algebra Calculations with GPUs,” Numerical Computations with GPUs: Springer International Publishing, pp. 3-28, 2014. DOI: 10.1007/978-3-319-06548-9_1“
clMAGMA: High Performance Dense Linear Algebra with OpenCL ,” International Workshop on OpenCL, Bristol University, England, May 2014.“
A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,” International Journal of High Performance Computing Applications, vol. 28, issue 2, pp. 196-209, May 2014. DOI: 10.1177/1094342013502097“
Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014. DOI: 10.1109/ScalA.2014.8“
clMAGMA: High Performance Dense Linear Algebra with OpenCL,” University of Tennessee Technical Report (Lawn 275), no. UT-CS-13-706: University of Tennessee, March 2013.“
Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” PPAM 2013, Warsaw, Poland, September 2013.“
Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013. DOI: 10.1145/2464996.2465438“
On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,” University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013, 2012.“
Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,” ICCS 2012, Omaha, NE, June 2012.“
MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures , Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.
MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors , Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.
MAGMA Tutorial , Atlanta, GA, Keeneland Workshop, February 2012.
Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems , no. UT-CS-11-689, December 2011.