Publications
Export 14 results:
Filters: First Letter Of Title is F and Author is Stanimire Tomov [Clear All Filters]
FFT Benchmark Performance Experiments on Systems Targeting Exascale,”
ICL Technical Report, no. ICL-UT-22-02, March 2022.
(5.87 MB)
“FFT-ECP API and High-Performance Library Prototype for 2-D and 3-D FFTs on Large-Scale Heterogeneous Systems with GPUs,”
ECP Milestone Report, no. FFT-ECP STML13-27: Innovative Computing Laboratory, University of Tennessee, January 2020.
(9.71 MB)
“Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs,”
33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
(675.5 KB)
“FFT-ECP Fast Fourier Transform
, Houston, TX, 2019 ECP Annual Meeting (Research Poster), January 2019.
(1.51 MB)
FFT-ECP Implementation Optimizations and Features Phase,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-12: University of Tennessee, October 2019.
(4.14 MB)
“Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures,”
Procedia Computer Science, vol. 108, pp. 606–615, June 2017.
(643.44 KB)
“Fast Cholesky Factorization on GPUs for Batch and Native Modes in MAGMA,”
Journal of Computational Science, vol. 20, pp. 85–93, May 2017.
(3.6 MB)
“A Framework for Out of Memory SVD Algorithms,”
ISC High Performance 2017, pp. 158–178, June 2017.
(393.22 KB)
“Flexible Linear Algebra Development and Scheduling with Cholesky Factorization,”
17th IEEE International Conference on High Performance Computing and Communications, Newark, NJ, August 2015.
(494.31 KB)
“Framework for Batched and GPU-resident Factorization Algorithms to Block Householder Transformations,”
ISC High Performance, Frankfurt, Germany, Springer, July 2015.
(778.26 KB)
“A Fast Batched Cholesky Factorization on a GPU,”
International Conference on Parallel Processing (ICPP-2014), Minneapolis, MN, September 2014.
(1.37 MB)
“From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming,”
Parallel Computing, vol. 38, no. 8, pp. 391-407, August 2012.
(1.64 MB)
“The Future of Computing: Software Libraries
, Savannah, GA, DOD CREATE Developers' Review, Keynote Presentation, February 2012.
(6.76 MB)
Faster, Cheaper, Better - A Hybridization Methodology to Develop Linear Algebra Software for GPUs,”
LAPACK Working Note, no. 230, 00 2010.
(334.48 KB)
“