Publications

Turchenko, V., G. Bosilca, A. Bouteiller, and J. Dongarra, “Efficient Parallelization of Batch Pattern Training Algorithm on Many-core and Cluster Architectures,” 7th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems, Berlin, Germany, September 2013.

Barry, D., A. Danalis, and H. Jagode, “Effortless Monitoring of Arithmetic Intensity with PAPI's Counter Analysis Toolkit,” 13th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, Springer International Publishing, September 2020.

Li, J., G. Bosilca, A. Bouteiller, and B. Nicolae, “Elastic deep learning through resilient collective operations,” SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.

London, K., J. Dongarra, S. Moore, P. Mucci, K. Seymour, and T. Spencer, “End-user Tools for Application Performance Analysis, Using Hardware Counters,” International Conference on Parallel and Distributed Computing Systems, Dallas, TX, August 2001.

Anzt, H., S. Tomov, and J. Dongarra, “Energy Efficiency and Performance Frontiers for Sparse Computations on GPU Supercomputers,” Sixth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM '15), San Francisco, CA, ACM, February 2015.

Han, L., Y. Gao, J. Liu, Y. Robert, and F. Vivien, “Energy-Aware Strategies for Reliability-Oriented Real-Time Task Allocation on Heterogeneous Platforms,” 49th International Conference on Parallel Processing (ICPP 2020), Edmonton, AB, Canada, ACM Press, 2020.

Anzt, H., Y. M. Tsai, A. Abdelfattah, T. Cojean, and J. Dongarra, “Evaluating the Performance of NVIDIA’s A100 Ampere GPU for Sparse and Batched Computations,” 2020 IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), IEEE, November 2020.

Pei, Y., G. Bosilca, I. Yamazaki, A. Ida, and J. Dongarra, “Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization,” PAW-ATM Workshop at SC19, Denver, CO, ACM, November 2019.

Dongarra, J., K. London, S. Moore, P. Mucci, D. Terpstra, H. You, and M. Zhou, “Experiences and Lessons Learned with a Portable Interface to Hardware Performance Counters,” PADTAD Workshop, IPDPS 2003, Nice, France, IEEE, April 2003.

Fortenberry, A., and S. Tomov, “Extending MAGMA Portability with OneAPI,” The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), Ninth Workshop on Accelerator Programming Using Directives (WACCPD 2022), Dallas, TX, November 2022.

Cao, Q., Y. Pei, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, “Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications,” Platform for Advanced Scientific Computing Conference (PASC20), Geneva, Switzerland, ACM, June 2020.

Dong, T., A. Haidar, S. Tomov, and J. Dongarra, “A Fast Batched Cholesky Factorization on a GPU,” International Conference on Parallel Processing (ICPP-2014), Minneapolis, MN, September 2014.

Abdelfattah, A., S. Tomov, and J. Dongarra, “Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs,” 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.

Wang, L., W. Wu, J. Zhang, H. Liu, G. Bosilca, M. Herlihy, and R. Fonseca, “FFT-Based Gradient Sparsification for the Distributed Training of Deep Neural Networks,” 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 20), Stockholm, Sweden, ACM, June 2020.

Anzt, H., G. Collins, J. Dongarra, G. Flegar, and E. S. Quintana-Orti, “Flexible Batched Sparse Matrix-Vector Product on GPUs,” 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '17), Denver, CO, ACM Press, November 2017.

Cao, Q., G. Bosilca, W. Wu, D. Zhong, A. Bouteiller, and J. Dongarra, “Flexible Data Redistribution in a Task-Based Runtime System,” IEEE International Conference on Cluster Computing (Cluster 2020), Kobe, Japan, IEEE, September 2020.

Haidar, A., A. YarKhan, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Flexible Linear Algebra Development and Scheduling with Cholesky Factorization,” 17th IEEE International Conference on High Performance Computing and Communications, Newark, NJ, August 2015.

Haidar, A., T. Dong, S. Tomov, P. Luszczek, and J. Dongarra, “Framework for Batched and GPU-resident Factorization Algorithms to Block Householder Transformations,” ISC High Performance, Frankfurt, Germany, Springer, July 2015.

Cao, Q., R. Alomairy, Y. Pei, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, “A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 2022.

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, and J. Dongarra, “From Serial Loops to Parallel Execution on Distributed Systems,” International European Conference on Parallel and Distributed Computing (Euro-Par '12), Rhodes, Greece, August 2012.

Schuchart, J., P. Nookala, M. Mahdi Javanmard, T. Herault, E. F. Valeev, G. Bosilca, and R. J. Harrison, “Generalized Flow-Graph Programming Using Template Task-Graphs: Initial Implementation and Assessment,” 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, IEEE, July 2022.

Han, L., V. Le Fèvre, L-C. Canon, Y. Robert, and F. Vivien, “A Generic Approach to Scheduling and Checkpointing Workflows,” The 47th International Conference on Parallel Processing (ICPP 2018), Eugene, OR, IEEE Computer Society Press, August 2018.

Herault, T., Y. Robert, G. Bosilca, and J. Dongarra, “Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC,” ScalA'19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019.

Patinyasakdikul, T., D. Eberius, G. Bosilca, and N. Hjelm, “Give MPI Threading a Fair Chance: A Study of Multithreaded MPI Designs,” IEEE Cluster, Albuquerque, NM, IEEE, September 2019.

Anzt, H., E. Ponce, G. D. Peterson, and J. Dongarra, “GPU-accelerated Co-design of Induced Dimension Reduction: Algorithmic Fusion and Kernel Overlap,” 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, Austin, TX, ACM, November 2015.

Wu, W., G. Bosilca, R. vandeVaart, S. Jeaugey, and J. Dongarra, “GPU-Aware Non-contiguous Data Movement In Open MPI,” 25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Kyoto, Japan, ACM, June 2016.

Abdelfattah, A., S. Tomov, P. Luszczek, H. Anzt, and J. Dongarra, “GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure,” SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.

Luo, X., W. Wu, G. Bosilca, Y. Pei, Q. Cao, T. Patinyasakdikul, D. Zhong, and J. Dongarra, “HAN: A Hierarchical AutotuNed Collective Communication Framework,” IEEE Cluster Conference, Kobe, Japan, Best Paper Award, IEEE Computer Society Press, September 2020.

Wong, K., S. Tomov, and J. Dongarra, “Hands-on Research and Training in High-Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.

Haidar, A., S. Tomov, J. Dongarra, and N. J. Higham, “Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers,” The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX, IEEE, November 2018.

Ayala, A., S. Tomov, A. Haidar, and J. Dongarra, “heFFTe: Highly Efficient FFT for Exascale,” International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020.

Jia, Y., P. Luszczek, and J. Dongarra, “Hessenberg Reduction with Transient Error Resilience on GPU-Based Hybrid Architectures,” 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.

Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Heterogeneous Acceleration for Linear Algebra in Multi-Coprocessor Environments,” VECPAR 2014, Eugene, OR, June 2014.

Newburn, C. J., G. Bansal, M. Wood, L. Crivelli, J. Planas, A. Duran, P. Souza, L. Borges, P. Luszczek, S. Tomov, et al., “Heterogeneous Streaming,” The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016.

Wu, W., A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, “Hierarchical DAG scheduling for Hybrid Distributed Systems,” 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.

Beams, N., A. Abdelfattah, S. Tomov, J. Dongarra, T. Kolev, and Y. Dudouit, “High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs,” 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), IEEE, November 2020.

Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, “High-performance Cholesky Factorization for GPU-only Execution,” Proceedings of the General Purpose GPUs (GPGPU-10), Austin, TX, ACM, February 2017.

Anzt, H., T. Gruetzmacher, E. S. Quintana-Orti, and F. Scheidegger, “High-Performance GPU Implementation of PageRank with Reduced Precision based on Mantissa Segmentation,” 8th Workshop on Irregular Applications: Architectures and Algorithms, 2018.

Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, J. Falcou, and J. Dongarra, “High-performance Matrix-matrix Multiplications of Very Small Matrices,” 22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16), Grenoble, France, Springer International Publishing, August 2016.

Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., “High-Performance Tensor Contractions for GPUs,” International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.

Lukarski, D., H. Anzt, S. Tomov, and J. Dongarra, “Hybrid Multi-Elimination ILU Preconditioners on GPUs,” International Heterogeneity in Computing Workshop (HCW), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

Benoit, A., F. Cappello, A. Cavelan, Y. Robert, and H. Sun, “Identifying the Right Replication Level to Detect and Correct Silent Errors at Scale,” 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, ACM, June 2017.

Ayala, A., S. Tomov, X. Luo, H. Shaiek, A. Haidar, G. Bosilca, and J. Dongarra, “Impacts of Multi-GPU MPI Collective Communications on Large FFT Computation,” Workshop on Exascale MPI (ExaMPI) at SC19, Denver, CO, November 2019.

Bouteiller, A., and G. Bosilca, “Implicit Actions and Non-blocking Failure Recovery with MPI,” 2022 IEEE/ACM 12th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), Dallas, TX, USA, IEEE, January 2023, 2022.

Han, L., L-C. Canon, J. Liu, Y. Robert, and F. Vivien, “Improved Energy-Aware Strategies for Periodic Real-Time Tasks under Reliability Constraints,” 40th IEEE Real-Time Systems Symposium (RTSS 2019), York, UK, IEEE Press, February 2020.

Haidar, A., P. Luszczek, J. Kurzak, and J. Dongarra, “An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,” Supercomputing 2013, Denver, CO, November 2013.

Yamazaki, I., H. Anzt, S. Tomov, M. Hoemmen, and J. Dongarra, “Improving the performance of CA-GMRES on multicores with multiple GPUs,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

Lindquist, N., P. Luszczek, and J. Dongarra, “Improving the Performance of the GMRES Method using Mixed-Precision Techniques,” Smoky Mountains Computational Sciences & Engineering Conference (SMC2020), August 2020.

Mor, O., G. Bosilca, and M. Snir, “Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine,” 52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, September 2023.

Luszczek, P., I. Yamazaki, and J. Dongarra, “Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators,” IEEE High Performance Extreme Computing Conference (HPEC 2019), Best Paper Finalist, Waltham, MA, IEEE, September 2019.
