Publications
Export 62 results:
Filters: Author is Anzt, Hartwig [Clear All Filters]
Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning,”
International Conference on Computational Science (ICCS 2017), vol. 108, Zurich, Switzerland, Procedia Computer Science, pp. 1783-1792, June 2017.
(512.57 KB)
“Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems
, no. UT-CS-11-689, December 2011.
(608.95 KB)
Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,”
Parallel Computing, vol. 81, pp. 131-146, January 2019.
(1.9 MB)
“Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.
(1.83 MB)
“GPU-accelerated Co-design of Induced Dimension Reduction: Algorithmic Fusion and Kernel Overlap,”
2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, Austin, TX, ACM, November 2015.
(1.46 MB)
“ParILUT – A Parallel Threshold ILU for GPUs,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
(505.95 KB)
“
“AI Benchmarking for Science: Efforts from the MLCommons Science Working Group,”
Lecture Notes in Computer Science, vol. 13387: Springer International Publishing, pp. 47 - 64, January 2023.
“Flexible Batched Sparse Matrix Vector Product on GPUs
, Denver, Colorado, ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017.
(16.8 MB)
Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems,”
Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Best Paper), Rhodes Island, Greece, August 2012.
(764.02 KB)
“Toward a Modular Precision Ecosystem for High-Performance Computing,”
The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1069-1078, November 2019.
(1.93 MB)
“Experiences in autotuning matrix multiplication for energy minimization on GPUs,”
Concurrency in Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, December 2015.
(1.98 MB)
“