Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers and Achieve 74 Gflops/Watt on Nvidia V100

Title: Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers and Achieve 74 Gflops/Watt on Nvidia V100
Publication Type: Poster
Year of Publication: 2018
Authors: Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra
Date Published: 03-2018
Event: GPU Technology Conference (GTC), Poster
Event Location: San Jose, CA