Heterogeneous Streaming,” The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016.
“Linear Algebra Software for Large-Scale Accelerated Multicore Computing,” Acta Numerica, vol. 25, pp. 1-160, May 2016. DOI: 10.1017/S0962492916000015
“Search Space Generation and Pruning System for Autotuners,” 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.
“Accelerating Collaborative Filtering for Implicit Feedback Datasets using GPUs,” 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, IEEE, November 2015.
“Comparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.
“HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” Scientific Programming, vol. 23, issue 1, January 2015. DOI: 10.3233/SPR-140404
“Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs,” IEEE Transactions on Parallel and Distributed Systems, no. 1045-9219, November 2015.
“MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi,” Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.
“Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems,” Supercomputing Frontiers and Innovations, vol. 2, no. 4, October 2015. DOI: 10.14529/jsfi1504
“A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination,” Concurrency and Computation: Practice and Experience, vol. 27, issue 5, pp. 1292-1309, April 2015. DOI: 10.1002/cpe.3306
“Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem,” VECPAR 2014, Eugene, OR, June 2014.
“Accelerating Numerical Dense Linear Algebra Calculations with GPUs,” Numerical Computations with GPUs: Springer International Publishing, pp. 3-28, 2014. DOI: 10.1007/978-3-319-06548-9_1
“clMAGMA: High Performance Dense Linear Algebra with OpenCL,” International Workshop on OpenCL, Bristol University, England, May 2014.
“A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,” International Journal of High Performance Computing Applications, vol. 28, issue 2, pp. 196-209, May 2014. DOI: 10.1177/1094342013502097
“Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014. DOI: 10.1109/ScalA.2014.8
“clMAGMA: High Performance Dense Linear Algebra with OpenCL,” University of Tennessee Technical Report (LAWN 275), no. UT-CS-13-706: University of Tennessee, March 2013.
“Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” PPAM 2013, Warsaw, Poland, September 2013.
“Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013. DOI: 10.1145/2464996.2465438
“On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,” University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013, 2012.
“Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,” ICCS 2012, Omaha, NE, June 2012.
“MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures,” Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.
“MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors,” Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.
“MAGMA Tutorial,” Atlanta, GA, Keeneland Workshop, February 2012.
“Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,” no. UT-CS-11-689, December 2011.