Publications
“Report on the TianHe-2A System,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-04: University of Tennessee, September 2017.
“Numerical Libraries and Tools for Scalable Parallel Cluster Computing,”
International Journal of High Performance Computing Applications, vol. 15, no. 2, pp. 175-180, January 2001.
“How Elegant Code Evolves With Hardware: The Case Of Gaussian Elimination,”
in Beautiful Code: Leading Programmers Explain How They Think: O'Reilly Media, Inc., June 2007.
“Translational Process: Mathematical Software Perspective,”
Journal of Computational Science, September 2020.
“Twenty-Plus Years of Netlib and NA-Net,”
University of Tennessee Computer Science Department Technical Report, UT-CS-04-526, 2006.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Technical Report, UT-CS-89-85, 2010.
“The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems,”
International Conference on Computational Science (ICCS 2017), Zürich, Switzerland, Elsevier, June 2017.
“Empirical Performance Tuning of Dense Linear Algebra Software,”
in Performance Tuning of Scientific Applications (to appear), 2010.
“Performance of Various Computers Using Standard Linear Equations Software,”
University of Tennessee Computer Science Technical Report, no. CS-89-85, February 2013.
“Numerical Linear Algebra Algorithms and Software,”
Journal of Computational and Applied Mathematics, vol. 123, no. 1-2, pp. 489-514, October 1999.
“Hierarchical QR Factorization Algorithms for Multi-Core Cluster Systems,”
IPDPS 2012, the 26th IEEE International Parallel and Distributed Processing Symposium, Shanghai, China, IEEE Computer Society Press, May 2012.
“Performance Application Programming Interface for Extreme-Scale Environments (PAPI-EX) (Poster),”
2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, Seattle, WA, 2020.
“Report on the Oak Ridge National Laboratory's Frontier System,”
ICL Technical Report, no. ICL-UT-22-05, May 2022.
“Model-Driven One-Sided Factorizations on Multicore, Accelerated Systems,”
Supercomputing Frontiers and Innovations, vol. 1, issue 1, 2014.
“An Iterative Solver Benchmark,”
Scientific Programming (to appear), 2002.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Department Technical Report, UT-CS-04-526, vol. –89-95, January 2006.
“The HPL Benchmark: Past, Present & Future,”
ISC High Performance, Frankfurt, Germany, July 2016.
“Recent Advances in Parallel Virtual Machine and Message Passing Interface,”
Lecture Notes in Computer Science, vol. 2840: Springer-Verlag, Berlin, January 2003.
“High-Performance Computing,”
The Princeton Companion to Applied Mathematics, Princeton, New Jersey, Princeton University Press, pp. 839-842, 2015.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Technical Report, no. CS-89-85, 2011.
“Race to Exascale,”
Computing in Science and Engineering, vol. 21, issue 1, pp. 4-5, March 2019.
“Accelerating Numerical Dense Linear Algebra Calculations with GPUs,”
Numerical Computations with GPUs: Springer International Publishing, pp. 3-28, 2014.
“Self Adapting Numerical Algorithm for Next Generation Applications,”
International Journal of High Performance Computing Applications, vol. 17, no. 2, pp. 125-132, January 2003.
“Performance and Reliability Trade-offs for the Double Checkpointing Algorithm,”
International Journal of Networking and Computing, vol. 4, no. 1, pp. 32-41.
“Revisiting the Double Checkpointing Algorithm,”
15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium, Boston, MA, May 2013.
“An Asynchronous Algorithm on NetSolve Global Computing System,”
Future Generation Computer Systems, vol. 22, issue 3, pp. 279-290, February 2006.
“Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators,”
Proceedings of the IEEE, vol. 106, issue 11, pp. 2040–2055, November 2018.
“Recursive approach in sparse matrix LU factorization,”
Proceedings of 1st SGI Users Conference, Cracow, Poland (ACC Cyfronet UMM, 2000), pp. 409-418, January 2000.
“DARPA's HPCS Program: History, Models, Tools, Languages,”
in Advances in Computers, vol. 72: Elsevier, January 2008.
“Trends in High Performance Computing,”
The Computer Journal, vol. 47, no. 4: The British Computer Society, pp. 399-403, 2004.
“Batched BLAS (Basic Linear Algebra Subprograms) 2018 Specification,”
July 2018.
“EZTrace: a generic framework for performance analysis,”
ICL Technical Report, no. ICL-UT-11-01, December 2010.
“Performance Instrumentation and Measurement for Terascale Systems,”
ICCS 2003 Terascale Workshop, Melbourne, Australia, Springer, Berlin, Heidelberg, June 2003.
“High Performance Matrix Inversion Based on LU Factorization for Multicore Architectures,”
Proceedings of MTAGS11, Seattle, WA, November 2011.
“Revisiting Matrix Product on Master-Worker Platforms,”
International Journal of Foundations of Computer Science (IJFCS), vol. 19, no. 6, pp. 1317-1336, December 2008.
“Translational process: Mathematical software perspective,”
Journal of Computational Science, vol. 52, pp. 101216, 2021.
“The evolution of mathematical software,”
Communications of the ACM, vol. 65, issue 12, pp. 66-72, December 2022.
“Algorithmic Based Fault Tolerance Applied to High Performance Computing,”
Journal of Parallel and Distributed Computing, vol. 69, pp. 410-416, 2009.
“Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization,”
University of Tennessee Computer Science Technical Report (also as a LAWN), no. ICL-UT-11-08, September 2011.
“Recent Trends in High Performance Computing,”
in Birth of Numerical Analysis (to appear), 2009.
“Biannual Top-500 Computer Lists Track Changing Environments for Scientific Computing,”
SIAM News, vol. 34, no. 9, October 2002.
“A New Recursive Implementation of Sparse Cholesky Factorization,”
Proceedings of 16th IMACS World Congress 2000 on Scientific Computing, Applied Mathematics and Simulation, Lausanne, Switzerland, August 2000.
“Matrix Product on Heterogeneous Master Worker Platforms,”
2008 PPoPP Conference, Salt Lake City, Utah, January 2008.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Technical Report, no. CS-89-85: University of Tennessee, June 2014.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Dept. Technical Report CS-89-85, 2007.
“LINPACK on Future Manycore and GPU Based Systems,”
PARA 2010, Reykjavik, Iceland, June 2010.
“Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),”
University of Tennessee Computer Science Technical Report, no. CS-89-85, January 2001.
“Netlib and NA-Net: Building a Scientific Computing Community,”
IEEE Annals of the History of Computing, vol. 30, no. 2, pp. 30-41, January 2008.
“Revisiting Matrix Product on Master-Worker Platforms,”
International Journal of Foundations of Computer Science (IJFCS) (accepted), 2007.
“Message Passing Software Systems,”
Encyclopedia of Electrical and Electronics Engineering, Supplement 1: John Wiley & Sons, Inc., 2000.
“