Publications
Export 8 results:
Filters: First Letter Of Title is H and Author is Ahmad Abdelfattah [Clear All Filters]
Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers and Achieve 74 Gflops/Watt on Nvidia V100
, San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
(2.96 MB)
High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs,”
2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA): IEEE, November 2020.
(1.3 MB)
“High-performance Cholesky Factorization for GPU-only Execution,”
Proceedings of the General Purpose GPUs (GPGPU-10), Austin, TX, ACM, February 2017.
(872.18 KB)
“High-performance Matrix-matrix Multiplications of Very Small Matrices,”
22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16), Grenoble, France, Springer International Publishing, August 2016.
“High-Performance Tensor Contractions for GPUs,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-16-738: University of Tennessee, January 2016.
(2.36 MB)
“High-Performance Tensor Contractions for GPUs,”
International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.
(2.36 MB)
“hipMAGMA v1.0
: Zenodo, March 2020.
hipMAGMA v2.0
: Zenodo, July 2020.