The PLASMA Library on CORAL Systems and Beyond (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
“Project-Based Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning,” The Journal of Computational Science Education, vol. 11, issue 1, pp. 36–44, January 2020. DOI: 10.22369/issn.2153-4136/11/1/7
“Prospectus for the Next LAPACK and ScaLAPACK Libraries: Basic ALgebra LIbraries for Sustainable Technology with Interdisciplinary Collaboration (BALLISTIC),” LAPACK Working Notes, no. 297, ICL-UT-20-07: University of Tennessee.
“Reducing the Amount of out-of-core Data Access for GPU-Accelerated Randomized SVD,” Concurrency and Computation: Practice and Experience, April 2020. DOI: 10.1002/cpe.5754
“Replacing Pivoting in Distributed Gaussian Elimination with Randomized Techniques,” 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), Atlanta, GA, IEEE, November 2020.
“Report on the Fujitsu Fugaku System,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-06: University of Tennessee, June 2020.
“Scalable Data Generation for Evaluating Mixed-Precision Solvers,” 2020 IEEE High Performance Extreme Computing Conference (HPEC): IEEE, September 2020.
“A Set of Batched Basic Linear Algebra Subprograms,” ACM Transactions on Mathematical Software, October 2020.
“SLATE Performance Report: Updates to Cholesky and LU Factorizations,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-14: University of Tennessee, October 2020.
SLATE: Software for Linear Algebra Targeting Exascale (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
SLATE Tutorial, Houston, TX, 2020 ECP Annual Meeting, February 2020.
“SLATE Users' Guide,” SLATE Working Notes, no. 10, ICL-UT-19-01: Innovative Computing Laboratory, University of Tennessee, July 2020.
“A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic,” SLATE Working Notes, no. 15, ICL-UT-20-08: University of Tennessee, July 2020.
“Translational Process: Mathematical Software Perspective,” Journal of Computational Science, September 2020. DOI: 10.1016/j.jocs.2020.101216
“Translational Process: Mathematical Software Perspective,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-11, August 2020.
“Twenty Years of Computational Science,” International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020.
“Using Advanced Vector Extensions AVX-512 for MPI Reduction,” EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020. DOI: 10.1145/3416315.3416316
Using Advanced Vector Extensions AVX-512 for MPI Reduction (Poster), Austin, TX, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, September 2020.
“Using Arm Scalable Vector Extension to Optimize Open MPI,” 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), Melbourne, Australia, IEEE/ACM, May 2020. DOI: 10.1109/CCGrid49817.2020.00-71
Using Quantized Integer in LU Factorization with Partial Pivoting (Poster), Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020.
“Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers,” Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, March 2019. DOI: 10.1002/cpe.4460
“Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,” Parallel Computing, vol. 81, pp. 1–21, January 2019. DOI: 10.1016/j.parco.2018.10.003
CEED ECP Milestone Report: Performance Tuning of CEED Software and 1st and 2nd Wave Apps: Zenodo, October 2019. DOI: 10.5281/zenodo.3477618
“Characterization of Power Usage and Performance in Data-Intensive Applications using MapReduce over MPI,” 2019 International Conference on Parallel Computing (ParCo2019), Prague, Czech Republic, September 2019.
“Checkpointing Strategies for Shared High-Performance Computing Platforms,” International Journal of Networking and Computing, vol. 9, no. 1, pp. 28–52, 2019.
“Comparing the Performance of Rigid, Moldable, and Grid-Shaped Applications on Failure-Prone HPC Platforms,” Parallel Computing, vol. 85, pp. 1–12, July 2019. DOI: 10.1016/j.parco.2019.02.002
“Counter Inspection Toolkit: Making Sense out of Hardware Performance Events,” 11th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, Cham, Switzerland: Springer, February 2019. DOI: 10.1007/978-3-030-11987-4_2
“Design and Implementation for FFT-ECP on Distributed Accelerated Systems,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-05: University of Tennessee, April 2019.
“Distributed-Memory Lattice H-Matrix Factorization,” The International Journal of High Performance Computing Applications, vol. 33, issue 5, pp. 1046–1063, August 2019. DOI: 10.1177/1094342019861139
Does your tool support PAPI SDEs yet?, Tahoe City, CA, 13th Scalable Tools Workshop, July 2019.
“An Empirical View of SLATE Algorithms on Scalable Hybrid System,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-08: University of Tennessee, Knoxville, September 2019.
“Evaluation of Directive-Based Performance Portable Programming Models,” International Journal of High Performance Computing and Networking, vol. 14, issue 2, pp. 165–182. DOI: 10.1504/IJHPCN.2017.10009064
“Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization,” PAW-ATM Workshop at SC19, Denver, CO, ACM, November 2019.
“Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs,” 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
FFT-ECP Fast Fourier Transform, Houston, TX, 2019 ECP Annual Meeting (Research Poster), January 2019.
“FFT-ECP Implementation Optimizations and Features Phase,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-12: University of Tennessee, October 2019.
“Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC,” ScalA'19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019.
“GPUDirect MPI Communications and Optimizations to Accelerate FFTs on Exascale Systems,” EuroMPI'19 Posters, Zurich, Switzerland, no. ICL-UT-19-06: ICL, September 2019.
“Hands-on Research and Training in High-Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.
“Impacts of Multi-GPU MPI Collective Communications on Large FFT Computation,” Workshop on Exascale MPI (ExaMPI) at SC19, Denver, CO, November 2019.
“Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators,” IEEE High Performance Extreme Computing Conference (HPEC 2019), Best Paper Finalist, Waltham, MA, IEEE, September 2019.
“Least Squares Solvers for Distributed-Memory Machines with GPU Accelerators,” ACM International Conference on Supercomputing (ICS '19), Phoenix, Arizona, ACM, pp. 117–126, June 2019. DOI: 10.1145/3330345.3330356
“Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators,” Euro-Par 2019: Parallel Processing, vol. 11725: Springer, pp. 495–506, August 2019. DOI: 10.1007/978-3-030-29400-7_35
MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs: University of Tennessee, January 2019. DOI: 10.13140/RG.2.2.14906.64961
“MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019. DOI: 10.1007/978-3-030-34356-9_37
“Massively Parallel Automated Software Tuning,” 48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan, ACM Press, August 2019. DOI: 10.1145/3337821.3337908
“Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation,” International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
Optimizing Batch HGEMM on Small Sizes Using Tensor Cores, San Jose, CA, GPU Technology Conference (GTC), March 2019.
“PAPI Software-Defined Events for in-Depth Performance Analysis,” The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1113–1127, November 2019.
PAPI's new Software-Defined Events for in-depth Performance Analysis, Dresden, Germany, 13th Parallel Tools Workshop, September 2019.