“Implementation of the C++ API for Batch BLAS,” SLATE Working Notes, no. 07, ICL-UT-18-04: Innovative Computing Laboratory, University of Tennessee, June 2018.
“Linear Systems Performance Report,” SLATE Working Notes, no. 08, ICL-UT-18-08: Innovative Computing Laboratory, University of Tennessee, September 2018.
“Parallel BLAS Performance Report,” SLATE Working Notes, no. 05, ICL-UT-18-01: University of Tennessee, April 2018.
“Parallel Norms Performance Report,” SLATE Working Notes, no. 06, ICL-UT-18-06: Innovative Computing Laboratory, University of Tennessee, June 2018.
“The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale,” SIAM Review, vol. 60, issue 4, pp. 808–865, November 2018. DOI: 10.1137/17M1117732
“Task Based Cholesky Decomposition on Xeon Phi Architectures using OpenMP,” International Journal of Computational Science and Engineering (IJCSE), vol. 17, no. 3, October 2018. DOI: http://dx.doi.org/10.1504/IJCSE.2018.095851
“A Collection of White Papers from the BDEC2 Workshop in Poznan, Poland,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-10: University of Tennessee, Knoxville, May 2019.
“Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators,” IEEE High Performance Extreme Computing Conference (HPEC 2019), Best Paper Finalist, Waltham, MA, IEEE, September 2019.
“PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP,” ACM Transactions on Mathematical Software, vol. 45, issue 2, June 2019. DOI: 10.1145/3264491
Clover: Computational Libraries Optimized via Exascale Research, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
“Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime,” 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), New Orleans, LA, IEEE, May 2020. DOI: 10.1109/IPDPSW50202.2020.00127
“Docker Container based PaaS Cloud Computing Comprehensive Benchmarks using LAPACK,” Computer Modeling and Intelligent Systems CMIS-2020, Zaporizhzhia, March 2020.
“Interoperable Convergence of Storage, Networking, and Computation,” Advances in Information and Communication: Proceedings of the 2019 Future of Information and Communication Conference (FICC), no. 2: Springer International Publishing, pp. 667–690, 2020.
The PLASMA Library on CORAL Systems and Beyond (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
“Prospectus for the Next LAPACK and ScaLAPACK Libraries: Basic ALgebra LIbraries for Sustainable Technology with Interdisciplinary Collaboration (BALLISTIC),” LAPACK Working Notes, no. 297, ICL-UT-20-07: University of Tennessee.
“Replacing Pivoting in Distributed Gaussian Elimination with Randomized Techniques,” 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), Atlanta, GA, IEEE, November 2020.
“Scalable Data Generation for Evaluating Mixed-Precision Solvers,” 2020 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, IEEE, September 2020. DOI: 10.1109/HPEC43674.2020.9286145
“A Set of Batched Basic Linear Algebra Subprograms,” ACM Transactions on Mathematical Software, October 2020.
“A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic,” SLATE Working Notes, no. 15, ICL-UT-20-08: University of Tennessee, July 2020.
“Translational Process: Mathematical Software Perspective,” Journal of Computational Science, September 2020. DOI: 10.1016/j.jocs.2020.101216
“Translational Process: Mathematical Software Perspective,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-11, August 2020.
Using Quantized Integer in LU Factorization with Partial Pivoting (Poster), Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020.
“P1673R3: A Free Function Linear Algebra Interface Based on the BLAS,” ISO JTC1 SC22 WG21, no. P1673R3: ISO, April 2021.