Publications
“Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources,”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-05-561, November 2005.
“Algorithm-based Fault Tolerance for Dense Matrix Factorizations,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.
“Algorithmic Based Fault Tolerance Applied to High Performance Computing,”
University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), January 2008.
“On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,”
University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013.
“Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.
“Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-11-666 (also LAWN 243), March 2011.
“Analysis of the Communication and Computation Cost of FFT Libraries towards Exascale,”
ICL Technical Report, no. ICL-UT-22-07: Innovative Computing Laboratory, July 2022.
“Analysis of Various Scalar, Vector, and Parallel Implementations of RandomAccess,”
Innovative Computing Laboratory (ICL) Technical Report, no. ICL-UT-10-03, June 2010.
“Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms,”
University of Tennessee Computer Science Technical Report, UT-CS-08-626, January 2008.
“ASCR@40: Four Decades of Department of Energy Leadership in Advanced Scientific Computing Research,”
Advanced Scientific Computing Advisory Committee (ASCAC), US Department of Energy, August 2020.
“ASCR@40: Highlights and Impacts of ASCR’s Programs,”
US Department of Energy’s Office of Advanced Scientific Computing Research, June 2020.
“Assessing the impact of ABFT and Checkpoint composite strategies,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.
“An Asynchronous Algorithm on NetSolve Global Computing System,”
PRiSM - Laboratoire de recherche en informatique, Université de Versailles St-Quentin Technical Report, March 2004.
“Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020.
“ATLAS on the BlueGene/L – Preliminary Results,”
ICL Technical Report, no. ICL-UT-06-10, January 2006.
“Automated Empirical Optimizations of Software and the ATLAS Project (LAPACK Working Note 147),”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-448, September 2000.
“Automated Empirical Tuning of a Multiresolution Analysis Kernel,”
ICL Technical Report, no. ICL-UT-07-01, pp. 10, January 2007.
“Autotuning GEMMs for Fermi,”
University of Tennessee Computer Science Technical Report, UT-CS-11-671 (also LAWN 245), April 2011.
“On block-asynchronous execution on GPUs,”
LAPACK Working Note, no. 291, November 2016.
“A Block-Asynchronous Relaxation Method for Graphics Processing Units,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-687 / LAWN 258, November 2011.
“C++ API for Batch BLAS,”
SLATE Working Notes, no. 04, ICL-UT-17-12: University of Tennessee, December 2017.
“C++ API for BLAS and LAPACK,”
SLATE Working Notes, no. 02, ICL-UT-17-03: Innovative Computing Laboratory, University of Tennessee, June 2017.
“The Case for Directive Programming for Accelerator Autotuner Optimization,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-07: University of Tennessee, October 2017.
“CEED ECP Milestone Report: Improve Performance and Capabilities of CEED-Enabled ECP Applications on Summit/Sierra,”
ECP Milestone Reports: Zenodo, May 2020.
“CEED ECP Milestone Report: Performance Tuning of CEED Software and 1st and 2nd Wave Apps,”
Zenodo, October 2019.
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
University of Tennessee Computer Science Technical Report, no. UT-CS-07-600 (also LAPACK Working Note 191), January 2007.
“clMAGMA: High Performance Dense Linear Algebra with OpenCL,”
University of Tennessee Technical Report (LAWN 275), no. UT-CS-13-706: University of Tennessee, March 2013.
“Communication Avoiding LU with Tournament Pivoting in SLATE,”
SLATE Working Notes, no. 18, ICL-UT-22-01, January 2022.
“A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow Banded Linear Systems II (LAPACK Working Note 143),”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-415, January 1999.
“A Comparison of Parallel Solvers for General Narrow Banded Linear Systems (LAPACK Working Note 142),”
University of Tennessee Computer Science Technical Report, no. UT-CS-99-414, January 1999.
“Computing the Conditioning of the Components of a Linear Least Squares Solution,”
University of Tennessee Computer Science Technical Report, no. UT-CS-07-604 (also LAPACK Working Note 193), January 2007.
“Condition Numbers of Gaussian Random Matrices,”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-04-539, 2005.
“Constructing resilient communication infrastructure for runtime environments,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-09-02, July 2009.
“Context Identifier Allocation in Open MPI,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-16-01: Innovative Computing Laboratory, University of Tennessee, January 2016.
“DAGuE: A generic distributed DAG engine for high performance computing,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.
“Data Movement Interfaces to Support Dataflow Runtimes,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-03: University of Tennessee, May 2018.
“Design and Implementation for FFT-ECP on Distributed Accelerated Systems,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-05: University of Tennessee, April 2019.
“Design and Implementation of NetSolve using DCOM as the Remoting Layer,”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-440, May 2000.
“Design for a Soft Error Resilient Dynamic Task-based Runtime,”
ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, November 2014.
“Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-12: University of Tennessee, August 2020.
“Designing LU-QR hybrid solvers for performance and stability,”
University of Tennessee Computer Science Technical Report (also LAWN 282), no. UT-EECS-13-719: University of Tennessee, October 2013.
“Designing SLATE: Software for Linear Algebra Targeting Exascale,”
SLATE Working Notes, no. 03, ICL-UT-17-06: Innovative Computing Laboratory, University of Tennessee, October 2017.
“Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,”
University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.
“Distributed Termination Detection for HPC Task-Based Environments,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-14: University of Tennessee, June 2018.
“Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 2010.
“Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,”
University of Tennessee Computer Science Technical Report, no. UT-CS-13-713, July 2013.
“An Effective Empirical Search Method for Automatic Software Tuning,”
ICL Technical Report, no. ICL-UT-05-02, January 2005.
“An efficient distributed randomized solver with application to large dense linear systems,”
ICL Technical Report, no. ICL-UT-12-02, July 2012.
“Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures,”
University of Tennessee Computer Science Technical Report, UT-CS-11-668 (also LAWN 250), June 2011.
“Empirical Tuning of a Multiresolution Analysis Kernel using a Specialized Code Generator,”
ICL Technical Report, no. ICL-UT-07-02, January 2007.