Cray X1 Evaluation Status Report,” Oak Ridge National Laboratory Report, vol. /-2004/13, January 2004.“
The Design and Implementation of the Parallel Out of Core ScaLAPACK LU, QR, and Cholesky Factorization Routines,” Concurrency: Practice and Experience, vol. 12, no. 15, pp. 1481-1493, January 2000.“
Accelerating 2D FFT: Exploit GPU Tensor Cores through Mixed-Precision , Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), ACM Student Research Poster, November 2018.