Publications

Conference Paper
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization,” IEEE High Performance Extreme Computing Conference (HPEC’18), Waltham, MA, IEEE, September 2018.
Tomov, S., P. Luszczek, I. Yamazaki, J. Dongarra, H. Anzt, and W. Sawyer, “Optimizing Krylov Subspace Solvers on Graphics Processing Units,” Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
Dong, T., A. Haidar, S. Tomov, and J. Dongarra, “Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices,” International Conference on Computational Science (ICCS 2017), Zurich, Switzerland, Procedia Computer Science, June 2017.
Haidar, A., K. Kabir, D. Fayad, S. Tomov, and J. Dongarra, “Out of Memory SVD Solver for Big Data,” 2017 IEEE High Performance Extreme Computing Conference (HPEC'17), Waltham, MA, IEEE, September 2017.
Malony, A. D., S. Biersdorff, S. Shende, H. Jagode, S. Tomov, G. Juckeland, R. Dietrich, D. Poole, and C. Lamb, “Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs,” International Conference on Parallel Processing (ICPP'11), Taipei, Taiwan, ACM, September 2011. DOI: 10.1109/ICPP.2011.71
Haidar, A., B. Brock, S. Tomov, M. Guidry, J. Jay Billings, D. Shyles, and J. Dongarra, “Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations,” 2016 IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, MA, IEEE, September 2016.
Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures,” The Spring Simulation Multi-Conference 2015 (SpringSim'15), Best Paper Award, Alexandria, VA, April 2015.
Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Optimization of Two-Sided Factorization Algorithms for Heterogeneous Platform,” International Conference on Computational Science (ICCS 2015), Reykjavík, Iceland, June 2015.
Haidar, A., C. Cao, I. Yamazaki, J. Dongarra, M. Gates, P. Luszczek, and S. Tomov, “Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014. DOI: 10.1109/ScalA.2014.8
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” The International Supercomputing Conference (ISC High Performance 2016), Frankfurt, Germany, June 2016.
Mary, T., I. Yamazaki, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, “Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs,” International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.
Dongarra, J., M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov, “Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” PPAM 2013, Warsaw, Poland, September 2013.
Haidar, A., H. Jagode, A. YarKhan, P. Vaccaro, S. Tomov, and J. Dongarra, “Power-aware Computing: Measurement, Control, and Performance Analysis for Intel Xeon Phi,” 2017 IEEE High Performance Extreme Computing Conference (HPEC'17), Best Paper Finalist, Waltham, MA, IEEE, September 2017.
Abdelfattah, A., S. Tomov, and J. Dongarra, “Progressive Optimization of Batched LU Factorization on GPUs,” IEEE High Performance Extreme Computing Conference (HPEC’19), Waltham, MA, IEEE, September 2019.
Yamazaki, I., S. Tomov, and J. Dongarra, “Sampling Algorithms to Update Truncated SVD,” IEEE International Conference on Big Data, December 2017.
Anzt, H., D. Lukarski, S. Tomov, and J. Dongarra, “Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures,” VECPAR 2014, Eugene, OR, June 2014.
Dong, T., V. Dobrev, T. Kolev, R. Rieben, S. Tomov, and J. Dongarra, “A Step towards Energy Efficient Computing: Redesigning A Hydrodynamic Application on CPU-GPU,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
Haidar, A., M. Gates, S. Tomov, and J. Dongarra, “Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013. DOI: 10.1145/2464996.2465438
Lopez, M. G., V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, “Towards Achieving Performance Portability Using Directives for Accelerators,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016.
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Towards Batched Linear Solvers on Accelerated Hardware Platforms,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8) co-located with PPOPP 2015, San Francisco, CA, ACM, February 2015.
Yamazaki, I., T. Dong, S. Tomov, and J. Dongarra, “Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster,” The Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 2013.
Haidar, A., C. Cao, J. Dongarra, P. Luszczek, and S. Tomov, “Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
Haidar, A., S. Tomov, A. Abdelfattah, M. Zounon, and J. Dongarra, “Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption,” ISC High Performance (ISC'18), Best Poster, Frankfurt, Germany, June 2018.
Conference Proceedings
Baboulin, M., S. Donfack, J. Dongarra, L. Grigori, A. Remi, and S. Tomov, “A Class of Communication-Avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines,” Proc. of the International Conference on Computational Science (ICCS), vol. 9, pp. 17-26, June 2012.
Horton, M., S. Tomov, and J. Dongarra, “A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures,” Symposium for Application Accelerators in High Performance Computing (SAAHPC'11), Knoxville, TN, July 2011.
Tomov, S., J. Langou, A. Canning, L-W. Wang, and J. Dongarra, “Comparison of Nonlinear Conjugate-Gradient methods for computing the Electronic Properties of Nanostructure Architectures,” Proceedings of 5th International Conference on Computational Science (ICCS), Atlanta, GA, USA, Springer's Lecture Notes in Computer Science, pp. 317-325, January 2005.
Tomov, S., R. Nath, H. Ltaief, and J. Dongarra, “Dense Linear Algebra Solvers for Multicore with GPU Accelerators,” Parallel Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010 IEEE International Symposium on, Atlanta, GA, pp. 1-8, 2010. DOI: 10.1109/IPDPSW.2010.5470941
Haidar, A., A. Abdelfattah, M. Zounon, P. Wu, S. Pranesh, S. Tomov, and J. Dongarra, “The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques,” International Conference on Computational Science (ICCS 2018), vol. 10860, Wuxi, China, Springer, pp. 586–600, June 2018. DOI: 10.1007/978-3-319-93698-7_45
Song, F., S. Tomov, and J. Dongarra, “Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems,” 26th ACM International Conference on Supercomputing (ICS 2012), San Servolo Island, Venice, Italy, ACM, June 2012.
Dongarra, J., S. Moore, G. D. Peterson, S. Tomov, J. Allred, V. Natoli, and D. Richie, “Exploring New Architectures in Accelerating CFD for Air Force Applications,” Proceedings of the DoD HPCMP User Group Conference, Seattle, Washington, January 2008.
Canning, A., J. Dongarra, J. Langou, O. Marques, S. Tomov, C. Voemel, and L-W. Wang, “Interior State Computation of Nano Structures,” PARA 2008, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing, Trondheim, Norway, May 2008.
Haidar, A., S. Tomov, J. Dongarra, R. Solcà, and T. C. Schulthess, “Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations,” International Supercomputing Conference (ISC), Lecture Notes in Computer Science, vol. 7905, Leipzig, Germany, Springer Berlin Heidelberg, pp. 67-80, June 2013. DOI: 10.1007/978-3-642-38750-0_6
Du, P., P. Luszczek, S. Tomov, and J. Dongarra, “Mixed-Tool Performance Analysis on Hybrid Multicore Architectures,” First International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI 2010), San Diego, CA, September 2010.
Li, Y., J. Dongarra, and S. Tomov, “A Note on Auto-tuning GEMM for GPUs,” 9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009. DOI: 10.1007/978-3-642-01970-8_89
Agullo, E., J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek, and S. Tomov, “Numerical Linear Algebra on Emerging Architectures: The PLASMA and MAGMA Projects,” Journal of Physics: Conference Series, vol. 180, 2009.
Yamazaki, I., S. Tomov, and J. Dongarra, “One-Sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators,” The International Conference on Computational Science (ICCS), June 2012.
Nath, R., S. Tomov, T. Dong, and J. Dongarra, “Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs,” ACM/IEEE Conference on Supercomputing (SC’11), Seattle, WA, November 2011.
Tomov, S., W. Lu, J. Bernholc, S. Moore, and J. Dongarra, “Performance Evaluation for Petascale Quantum Simulation Tools,” Proceedings of the Cray Users' Group Meeting, Atlanta, GA, May 2010.
Tomov, S., W. Lu, J. Bernholc, S. Moore, and J. Dongarra, “Performance evaluation for petascale quantum simulation tools,” Proceedings of CUG09, Atlanta, GA, May 2009.
Canning, A., J. Dongarra, J. Langou, O. Marques, S. Tomov, C. Voemel, and L-W. Wang, “Performance evaluation of eigensolvers in nano-structure computations,” IEEE/ACM Proceedings of HPCNano SC06 (to appear), January 2006.
Agullo, E., C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief, S. Thibault, and S. Tomov, “QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators,” Proceedings of IPDPS 2011, no. ICL-UT-10-04, Anchorage, AK, October 2010.
Baboulin, M., S. Tomov, and J. Dongarra, “Some Issues in Dense Linear Algebra for Multicore and Special Purpose Architectures,” PARA 2008, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing, Trondheim, Norway, May 2008.
Canning, A., J. Dongarra, J. Langou, O. Marques, S. Tomov, C. Voemel, and L-W. Wang, “Towards bulk based preconditioning for quantum dot computations,” IEEE/ACM Proceedings of HPCNano SC06 (to appear), January 2006.
Bosilca, G., A. Bouteiller, T. Herault, P. Lemarinier, N. Ohm Saengpatsa, S. Tomov, and J. Dongarra, “A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
Anzt, H., S. Tomov, J. Dongarra, and V. Heuveline, “Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems,” Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Best Paper), Rhodes Island, Greece, August 2012.
Haidar, A., Y. Jia, P. Luszczek, S. Tomov, A. YarKhan, and J. Dongarra, “Weighted Dynamic Scheduling with Many Parallelism Grains for Offloading of Numerical Workloads to Multiple Varied Accelerators,” Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA'15), no. 5, Austin, TX, ACM, November 2015.
Journal Article
Nath, R., S. Tomov, and J. Dongarra, “Accelerating GPU Kernels for Dense Linear Algebra,” Proc. of VECPAR'10, Berkeley, CA, June 2010.
Baboulin, M., J. Dongarra, J. Herrmann, and S. Tomov, “Accelerating Linear System Solutions Using Randomization Techniques,” INRIA RR-7616 / LAWN #246 (presented at International AMMCS’11), Waterloo, Ontario, Canada, July 2011.
Baboulin, M., J. Dongarra, J. Herrmann, and S. Tomov, “Accelerating Linear System Solutions Using Randomization Techniques,” ACM Transactions on Mathematical Software (also LAWN 246), vol. 39, issue 2, February 2013. DOI: 10.1145/2427023.2427025
