Publications
Filters: Author is Hartwig Anzt
“ParILUT - A New Parallel Threshold ILU,”
SIAM Journal on Scientific Computing, vol. 40, issue 4: SIAM, pp. C503–C519, July 2018.
DOI: 10.1137/16M1079506
“GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,”
EuroPar 2012 (also LAWN 260), Rhodes Island, Greece, August 2012.
“Are we Doing the Right Thing? – A Critical Analysis of the Academic HPC Community,”
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019.
DOI: 10.1109/IPDPSW.2019.00122
“Random-Order Alternating Schwarz for Sparse Triangular Solves,”
2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.
“On the performance and energy efficiency of sparse linear algebra on GPUs,”
International Journal of High Performance Computing Applications, October 2016.
DOI: 10.1177/1094342016672081
“MAGMA-sparse Interface Design Whitepaper,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.
“GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,”
University of Tennessee Computer Science Technical Report UT-CS-11-690 (also LAWN 260), December 2011.
“Variable-Size Batched Condition Number Calculation on GPUs,”
SBAC-PAD, Lyon, France, September 2018.
“Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-σ formats on NVIDIA GPUs,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-14-727: University of Tennessee, April 2014.
“Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,”
Spring Simulation Multi-Conference 2015 (SpringSim'15), Alexandria, VA, SCS, April 2015.
“Ginkgo: A High Performance Numerical Linear Algebra Library,”
Journal of Open Source Software, vol. 5, issue 52, August 2020.
DOI: 10.21105/joss.02260
“Updating Incomplete Factorization Preconditioners for Model Order Reduction,”
Numerical Algorithms, vol. 73, issue 3, pp. 611–630, February 2016.
DOI: 10.1007/s11075-016-0110-2
“Bringing High Performance Computing to Big Data Algorithms,”
Handbook of Big Data Technologies: Springer, 2017.
DOI: 10.1007/978-3-319-49340-4
“A Block-Asynchronous Relaxation Method for Graphics Processing Units,”
Journal of Parallel and Distributed Computing, vol. 73, issue 12, pp. 1613–1626, December 2013.
DOI: 10.1016/j.jpdc.2013.05.008
“MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi,”
ISC High Performance (ISC15), Intel Booth Presentation, Frankfurt, Germany, June 2015.
“Iterative Sparse Triangular Solves for Preconditioning,”
EuroPar 2015, Vienna, Austria, Springer Berlin, August 2015.
DOI: 10.1007/978-3-662-48096-0_50
“Fine-grained Bit-Flip Protection for Relaxation Methods,”
Journal of Computational Science, November 2016.
DOI: 10.1016/j.jocs.2016.11.013
“Ginkgo: A Sparse Linear Algebra Library for HPC,”
2021 ECP Annual Meeting, April 2021.
“Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning,”
International Conference on Computational Science (ICCS 2017), vol. 108, Zurich, Switzerland, Procedia Computer Science, pp. 1783–1792, June 2017.
DOI: 10.1016/j.procs.2017.05.186
“Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,”
no. UT-CS-11-689, December 2011.
“Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,”
Parallel Computing, vol. 81, pp. 131–146, January 2019.
DOI: 10.1016/j.parco.2017.12.006
“Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.
“Flexible Batched Sparse Matrix-Vector Product on GPUs,”
8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '17), Denver, CO, ACM Press, November 2017.
DOI: 10.1145/3148226.3148230
“GPU-accelerated Co-design of Induced Dimension Reduction: Algorithmic Fusion and Kernel Overlap,”
2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, Austin, TX, ACM, November 2015.
“Flexible Batched Sparse Matrix Vector Product on GPUs,”
ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, Colorado, November 2017.
“Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems,”
Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Best Paper), Rhodes Island, Greece, August 2012.
“Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers,”
Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, March 2019.
DOI: 10.1002/cpe.4460
“Experiences in autotuning matrix multiplication for energy minimization on GPUs,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096–5113, December 2015.
DOI: 10.1002/cpe.3516
“On block-asynchronous execution on GPUs,”
LAPACK Working Note, no. 291, November 2016.
“Ginkgo: A Node-Level Sparse Linear Algebra Library for HPC (Poster),”
2020 Exascale Computing Project Annual Meeting, Houston, TX, February 2020.
“Preconditioned Krylov Solvers on GPUs,”
Parallel Computing, June 2017.
DOI: 10.1016/j.parco.2017.05.006
“A Block-Asynchronous Relaxation Method for Graphics Processing Units,”
University of Tennessee Computer Science Technical Report, no. UT-CS-11-687 / LAWN 258, November 2011.
“Optimization and Performance Evaluation of the IDR Iterative Krylov Solver on GPUs,”
The International Journal of High Performance Computing Applications, vol. 32, no. 2, pp. 220–230, March 2018.
DOI: 10.1177/1094342016646844
“Improving the Energy Efficiency of Sparse Linear System Solvers on Multicore and Manycore Systems,”
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 372, issue 2018, July 2014.
DOI: 10.1098/rsta.2013.0279
“Towards a New Peer Review Concept for Scientific Computing ensuring Technical Quality, Software Sustainability, and Result Reproducibility,”
Proceedings in Applied Mathematics and Mechanics, vol. 19, issue 1, November 2019.
DOI: 10.1002/pamm.201900490
“Adaptive Precision Solvers for Sparse Linear Systems,”
3rd International Workshop on Energy Efficient Supercomputing (E2SC '15), Austin, TX, ACM, November 2015.
“Efficiency of General Krylov Methods on GPUs – An Experimental Study,”
2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 683–691, May 2016.
DOI: 10.1109/IPDPSW.2016.45
“Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning,”
46th International Conference on Parallel Processing (ICPP), Bristol, United Kingdom, IEEE, August 2017.
DOI: 10.1109/ICPP.2017.18
“Weighted Block-Asynchronous Relaxation for GPU-Accelerated Systems,”
SIAM Journal on Computing (submitted), March 2012.
“Towards Continuous Benchmarking,”
Platform for Advanced Scientific Computing Conference (PASC 2019), Zurich, Switzerland, ACM Press, June 2019.
DOI: 10.1145/3324989.3325719
“Acceleration of GPU-based Krylov solvers via Data Transfer Reduction,”
International Journal of High Performance Computing Applications, 2015.
“Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096–5113, October 12, 2015.
DOI: 10.1002/cpe.3516
“Evaluating the Performance of NVIDIA’s A100 Ampere GPU for Sparse and Batched Computations,”
2020 IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS): IEEE, November 2020.
“Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs,”
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, New York, NY, USA, ACM, pp. 1–10, February 2017.
DOI: 10.1145/3026937.3026940
“Solver Interface & Performance on Cori,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-05: University of Tennessee, June 2018.
“Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures,”
VECPAR 2014, Eugene, OR, June 2014.
“ParILUT – A Parallel Threshold ILU for GPUs,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
DOI: 10.1109/IPDPS.2019.00033
“Tuning Stationary Iterative Solvers for Fault Resilience,”
6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA15), Austin, TX, ACM, November 2015.
“Batched Generation of Incomplete Sparse Approximate Inverses on GPUs,”
Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 49–56, November 2016.
DOI: 10.1109/ScalA.2016.11