Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014. DOI: 10.1109/ScalA.2014.8“
clMAGMA: High Performance Dense Linear Algebra with OpenCL,” University of Tennessee Technical Report (Lawn 275), no. UT-CS-13-706: University of Tennessee, March 2013.“
Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” PPAM 2013, Warsaw, Poland, September 2013.“
Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013. DOI: 10.1145/2464996.2465438“
On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,” University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013, 2012.“
Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,” ICCS 2012, Omaha, NE, June 2012.“
MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures , Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.
MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors , Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.
MAGMA Tutorial , Atlanta, GA, Keeneland Workshop, February 2012.
Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems , no. UT-CS-11-689, December 2011.