Publications

Show only items where

Author

Type

Term

Year

Keyword

Export 292 results:

Filters: Author is Stan Tomov [Clear All Filters]

Conference Paper

Lukarski, D., H. Anzt, S. Tomov, and J. Dongarra, “Hybrid Multi-Elimination ILU Preconditioners on GPUs,” International Heterogeneity in Computing Workshop (HCW), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

(1.67 MB)

Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., “High-Performance Tensor Contractions for GPUs,” International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.

(2.36 MB)

Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, J. Falcou, and J. Dongarra, “High-performance Matrix-matrix Multiplications of Very Small Matrices,” 22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16), Grenoble, France, Springer International Publishing, August 2016.

Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, “High-performance Cholesky Factorization for GPU-only Execution,” Proceedings of the General Purpose GPUs (GPGPU-10), Austin, TX, ACM, February 2017.

(872.18 KB)

Beams, N., A. Abdelfattah, S. Tomov, J. Dongarra, T. Kolev, and Y. Dudouit, “High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs,” 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA): IEEE, November 2020.

(1.3 MB)

Newburn, C. J., G. Bansal, M. Wood, L. Crivelli, J. Planas, A. Duran, P. Souza, L. Borges, P. Luszczek, S. Tomov, et al., “Heterogeneous Streaming,” The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016.

(2.73 MB)

Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Heterogeneous Acceleration for Linear Algebra in Mulit-Coprocessor Environments,” VECPAR 2014, Eugene, OR, June 2014.

(276.52 KB)

Ayala, A., S. Tomov, A. Haidar, and J. Dongarra, “heFFTe: Highly Efficient FFT for Exascale,” International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020.

(2.62 MB)

Haidar, A., S. Tomov, J. Dongarra, and N. J. Higham, “Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers,” The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX, IEEE, November 2018.

(642.51 KB)

Wong, K., S. Tomov, and J. Dongarra, “Hands-on Research and Training in High-Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.

(1016.52 KB)

Haidar, A., T. Dong, S. Tomov, P. Luszczek, and J. Dongarra, “Framework for Batched and GPU-resident Factorization Algorithms to Block Householder Transformations,” ISC High Performance, Frankfurt, Germany, Springer, July 2015.

(778.26 KB)

Haidar, A., A. YarKhan, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Flexible Linear Algebra Development and Scheduling with Cholesky Factorization,” 17th IEEE International Conference on High Performance Computing and Communications, Newark, NJ, August 2015.

(494.31 KB)

Abdelfattah, A., S. Tomov, and J. Dongarra, “Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs,” 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.

(675.5 KB)

Dong, T., A. Haidar, S. Tomov, and J. Dongarra, “A Fast Batched Cholesky Factorization on a GPU,” International Conference on Parallel Processing (ICPP-2014), Minneapolis, MN, September 2014.

(1.37 MB)

Fortenberry, A., and S. Tomov, “Extending MAGMA Portability with OneAPI,” The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), Ninth Workshop on Accelerator Programming Using Directives (WACCPD 2022), Dallas, TX, November 2022.

(999.19 KB)

Anzt, H., S. Tomov, and J. Dongarra, “Energy Efficiency and Performance Frontiers for Sparse Computations on GPU Supercomputers,” Sixth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM '15), San Francisco, CA, ACM, February 2015.

(2.29 MB)

Solcà, R., A. Kozhevnikov, A. Haidar, S. Tomov, T. C. Schulthess, and J. Dongarra, “Efficient Implementation Of Quantum Materials Simulations On Distributed CPU-GPU Systems,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

(1.09 MB)

Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Efficient Eigensolver Algorithms on Accelerator Based Architectures,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.

(6.98 MB)

Donfack, S., S. Tomov, and J. Dongarra, “Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,” Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, May 2014.

(490.08 KB)

Yamazaki, I., S. Rajamanickam, E. G. Boman, M. Hoemmen, M. A. Heroux, and S. Tomov, “Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 14), New Orleans, LA, IEEE, November 2014.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures,” The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016), IPDPS 2016, Chicago, IL, IEEE, May 2016.

(708.62 KB)

Brown, C., A. Abdelfattah, S. Tomov, and J. Dongarra, “Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs,” 2020 IEEE High Performance Extreme Computing Virtual Conference: IEEE, September 2020.

(476.36 KB)

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors,” ISC High Performance 2015, Frankfurt, Germany, July 2015.

(1.49 MB)

Yamazaki, I., S. Tomov, and J. Dongarra, “Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, New Orleans, LA, November 2014.

(465.52 KB)

Gates, M., S. Tomov, and A. Haidar, “Comparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.

(4.7 MB)

Cao, C., J. Dongarra, P. Du, M. Gates, P. Luszczek, and S. Tomov, “clMAGMA: High Performance Dense Linear Algebra with OpenCL ,” International Workshop on OpenCL, Bristol University, England, May 2014.

(460.91 KB)

YarKhan, A., A. Haidar, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Cholesky Across Accelerators,” 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Elizabeth, NJ, IEEE, August 2015.

Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, “Batched Matrix Computations on Hardware Accelerators Based on GPUs,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.

(9.36 MB)

Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Batched Matrix Computations on Hardware Accelerators,” EuroMPI/Asia 2015 Workshop, Bordeaux, France, September 2015.

(589.05 KB)

Lopez, F., E. Chow, S. Tomov, and J. Dongarra, “Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,” Workshop on Scalable Deep Learning over Parallel And Distributed Infrastructures (ScaDL 2020), May 2020.

(188.51 KB)

Yamazaki, I., A. Abdelfattah, A. Ida, S. Ohshima, S. Tomov, R. Yokota, and J. Dongarra, “Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU Clusters,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, BC, Canada, IEEE, May 2018.

(1.37 MB)

Yamazaki, I., T. Mary, J. Kurzak, S. Tomov, and J. Dongarra, “Access-averse Framework for Computing Low-rank Matrix Approximations,” First International Workshop on High Performance Big Graph Data Management, Analysis, and Mining, Washington, DC, October 2014.

Anzt, H., S. Tomov, and J. Dongarra, “Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” Spring Simulation Multi-Conference 2015 (SpringSim'15), Alexandria, VA, SCS, April 2015.

(1.46 MB)

Book Chapter

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, “Scalable Dense Linear Algebra on Heterogeneous Hardware,” HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, 2013.

(760.32 KB)

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” High Performance Computing: 31st International Conference, ISC High Performance 2016, Frankfurt, Germany, June 19-23, 2016, Proceedings, no. 9697: Springer International Publishing, pp. 21–38, 2016.

(1.98 MB)

Vetter, J., R. Glassbrook, K. Schwan, S. Yalamanchili, M. Horton, A. Gavrilovska, M. Slawinska, J. Dongarra, J. Meredith, P. Roth, et al., “Keeneland: Computational Science Using Heterogeneous GPU Computing,” Contemporary High Performance Computing: From Petascale Toward Exascale, Boca Raton, FL, Taylor and Francis, 2013.

(2.7 MB)

Baboulin, M., J. Dongarra, A. Remy, S. Tomov, and I. Yamazaki, “Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures,” Lecture Notes in Computer Science, vol. 9573: Springer International Publishing, pp. 86-95, September 2015, 2016.

(327.14 KB)

Tomov, S., and J. Dongarra, “Dense Linear Algebra for Hybrid GPU-based Systems,” Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.

Anzt, H., J. Dongarra, M. Gates, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki, “Bringing High Performance Computing to Big Data Algorithms,” Handbook of Big Data Technologies: Springer, 2017.

(1.22 MB)

Nath, R., S. Tomov, and J. Dongarra, “Blas for GPUs,” Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.

(1.05 MB)

Abdelfattah, A., S. Tomov, and J. Dongarra, “Batch QR Factorization on GPUs: Design, Optimization, and Tuning,” Lecture Notes in Computer Science, vol. 13350, Cham, Springer International Publishing, June 2022.

Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki, “Accelerating Numerical Dense Linear Algebra Calculations with GPUs,” Numerical Computations with GPUs: Springer International Publishing, pp. 3-28, 2014.

(1.06 MB)

Main menu

Publications

Pages