Publications

Export 1276 results:
Conference Paper
Gainaru, A., B. Goglin, V. Honoré, P. Raghavan, G. Pallez, P. Raghavan, Y. Robert, and H. Sun, Reservation and Checkpointing Strategies for Stochastic Jobs,” 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2020), New Orleans, LA, IEEE Computer Society Press, May 2020.  (692.4 KB)
Benoit, A., T. Herault, V. Le Fèvre, and Y. Robert, Replication is More Efficient Than You Think,” The IEEE/ACM Conference on High Performance Computing Networking, Storage and Analysis (SC19), Denver, CO, ACM Press, November 2019.  (975.69 KB)
Lindquist, N., P. Luszczek, and J. Dongarra, Replacing Pivoting in Distributed Gaussian Elimination with Randomized Techniques,” 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), Atlanta, GA, IEEE, November 2020.  (184.6 KB)
Cao, Q., S. Abdulah, H. Ltaief, M. G. Genton, D. Keyes, and G. Bosilca, Reducing Data Motion and Energy Consumption of Geospatial Modeling Applications Using Automated Precision Conversion,” 2023 IEEE International Conference on Cluster Computing (CLUSTER), Santa Fe, NM, USA, IEEE, November 2023.
Bouteiller, A., T. Ropars, G. Bosilca, C. Morin, and J. Dongarra, Reasons for a Pessimistic or Optimistic Message Logging Protocol in MPI Uncoordinated Failure Recovery,” CLUSTER '09, New Orleans, IEEE, August 2009.  (191.36 KB)
Anzt, H., E. Chow, D. Szyld, and J. Dongarra, Random-Order Alternating Schwarz for Sparse Triangular Solves,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.  (1.53 MB)
Yamazaki, I., J. Kurzak, P. Luszczek, and J. Dongarra, Randomized Algorithms to Update Partial Singular Value Decomposition on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
Schuchart, J., C. Niethammer, J. Gracia, and G. Bosilca, Quo Vadis MPI RMA? Towards a More Efficient Use of MPI One-Sided Communication,” EuroMPI'21, Garching, Munich Germany, 2021.  (835.27 KB)
Nance, D., S. Tomov, and K. Wong, A Python Library for Matrix Algebra on GPU and Multicore Architectures,” 2022 IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems (MASS), Denver, CO, IEEE, December 2022.  (414.36 KB)
Schuchart, J., P. Nookala, T. Herault, E. F. Valeev, and G. Bosilca, Pushing the Boundaries of Small Tasks: Scalable Low-Overhead Data-Flow Programming in TTG,” 2022 IEEE International Conference on Cluster Computing (CLUSTER), Heidelberg, Germany, IEEE, September 2022.
Danalis, A., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, PTG: An Abstraction for Unhindered Parallelism,” International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014.  (480.05 KB)
Abdelfattah, A., S. Tomov, and J. Dongarra, Progressive Optimization of Batched LU Factorization on GPUs,” IEEE High Performance Extreme Computing Conference (HPEC’19), Waltham, MA, IEEE, September 2019.  (299.38 KB)
Hunold, S., A. Bhatele, G. Bosilca, and P. Knees, Predicting MPI Collective Communication Performance Using Machine Learning,” 2020 IEEE International Conference on Cluster Computing (CLUSTER), Kobe, Japan, IEEE, September 2020.  (619.68 KB)
Aggarwal, I., P. Nayak, A. Kashi, and H. Anzt, Preconditioners for Batched Iterative Linear Solvers on GPUs,” Smoky Mountains Computational Sciences and Engineering Conference, vol. 169075: Springer Nature Switzerland, pp. 38 - 53, January 2023.
Herault, T., A. Bouteiller, G. Bosilca, M. Gamell, K. Teranishi, M. Parashar, and J. Dongarra, Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.  (550.96 KB)
Haidar, A., H. Jagode, A. YarKhan, P. Vaccaro, S. Tomov, and J. Dongarra, Power-aware Computing: Measurement, Control, and Performance Analysis for Intel Xeon Phi,” 2017 IEEE High Performance Extreme Computing Conference (HPEC'17), Best Paper Finalist, Waltham, MA, IEEE, September 2017.  (908.84 KB)
McCraw, H., J. Ralph, A. Danalis, and J. Dongarra, Power Monitoring with PAPI for Extreme Scale Architectures and Dataflow-based Programming Models,” 2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-04, Madrid, Spain, IEEE, September 2014.  (3.45 MB)
Dongarra, J., M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov, Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” PPAM 2013, Warsaw, Poland, September 2013.  (284.97 KB)
Bouteiller, A., G. Bosilca, and J. Dongarra, Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery,” 22nd European MPI Users' Group Meeting, Bordeaux, France, ACM, September 2015.  (543.32 KB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs,” International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.  (626.21 KB)
Moore, S., D. Cronk, F. Wolf, A. Purkayastha, P. J. Teller, R. Araiza, G. Aguilera, and J. Nava, Performance Profiling and Analysis of DoD Applications using PAPI and TAU,” Proceedings of DoD HPCMP UGC 2005, Nashville, TN, IEEE, June 2005.  (322.56 KB)
Mary, T., I. Yamazaki, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
Benoit, A., S. Perarnau, L. Pottier, and Y. Robert, A Performance Model to Execute Workflows on High-Bandwidth Memory Architectures,” The 47th International Conference on Parallel Processing (ICPP 2018), Eugene, OR, IEEE Computer Society Press, August 2018.  (868.44 KB)
Dongarra, J., A. D. Malony, S. Moore, P. Mucci, and S. Shende, Performance Instrumentation and Measurement for Terascale Systems,” ICCS 2003 Terascale Workshop, Melbourne, Australia, Springer, Berlin, Heidelberg, June 2003.  (5.36 MB)
Mishler, D., J. Ciesko, S. Olivier, and G. Bosilca, Performance Insights into Device-initiated RMA Using Kokkos Remote Spaces,” 2023 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), Santa Fe, NM, USA, IEEE, November 2023.
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, Performance, Design, and Autotuning of Batched GEMM for GPUs,” The International Supercomputing Conference (ISC High Performance 2016), Frankfurt, Germany, June 2016.  (1.27 MB)
Haidar, A., C. Cao, I. Yamazaki, J. Dongarra, M. Gates, P. Luszczek, and S. Tomov, Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014.  (407.5 KB)
Cao, Q., Y. Pei, T. Herault, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools,” Workshop on Programming and Performance Visualization Tools (ProTools 19) at SC19, Denver, CO, ACM, November 2019.  (429.55 KB)
Ayala, A., S. Tomov, M. Stoyanov, A. Haidar, and J. Dongarra, Performance Analysis of Parallel FFT on Large Multi-GPU Systems,” 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Lyon, France, IEEE, August 2022.
Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, Performance Analysis and Optimization of Two-Sided Factorization Algorithms for Heterogeneous Platform,” International Conference on Computational Science (ICCS 2015), Reykjavík, Iceland, June 2015.  (1.12 MB)
Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures,” The Spring Simulation Multi-Conference 2015 (SpringSim'15), Best Paper Award, Alexandria, VA, April 2015.  (608.44 KB)
Haidar, A., B. Brock, S. Tomov, M. Guidry, J. Jay Billings, D. Shyles, and J. Dongarra, Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations,” 2016 IEEE High Performance Extreme Computing Conference (HPEC ‘16), Waltham, MA, IEEE, September 2016.  (480.29 KB)
Mucci, P., D. Ahlin, J. Danielsson, P. Ekman, and L. Malinowski, PerfMiner: Cluster-Wide Collection, Storage and Presentation of Application Level Hardware Performance Data,” European Conference on Parallel Processing (Euro-Par 2005), Monte de Caparica, Portugal, Springer, September 2005.  (205.45 KB)
Danalis, A., H. Jagode, G. Bosilca, and J. Dongarra, PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution,” 2015 IEEE International Conference on Cluster Computing, Chicago, IL, IEEE, September 2015.  (1.77 MB)
Anzt, H., T. Ribizel, G. Flegar, E. Chow, and J. Dongarra, ParILUT – A Parallel Threshold ILU for GPUs,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.  (505.95 KB)
Ribizel, T., and H. Anzt, Parallel Symbolic Cholesky Factorization,” SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
Wang, Y., M. Baboulin, J. Falcou, Y. Fraigneau, and O. Le Maître, A Parallel Solver for Incompressible Fluid Flows,” International Conference on Computational Science (ICCS 2013), Barcelona, Spain, Elsevier B.V., June 2013.  (588.79 KB)
Jia, Y., G. Bosilca, P. Luszczek, and J. Dongarra, Parallel Reduction to Hessenberg Form with Algorithm-Based Fault Tolerance,” International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE-SC 2013, Denver, CO, November 2013.  (147.09 KB)
Malony, A. D., S. Biersdorff, S. Shende, H. Jagode, S. Tomov, G. Juckeland, R. Dietrich, D. Poole, and C. Lamb, Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs,” International Conference on Parallel Processing (ICPP'11), Taipei, Taiwan, ACM, September 2011.  (1.41 MB)
Sid-Lakhdar, W., S. Cayrols, D. Bielich, A. Abdelfattah, P. Luszczek, M. Gates, S. Tomov, H. Johansen, D. Williams-Young, T. Davis, et al., PAQR: Pivoting Avoiding QR factorization,” 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), St. Petersburg, FL, USA, IEEE, 2023.
London, K., S. Moore, P. Mucci, K. Seymour, and R. Luczak, The PAPI Cross-Platform Interface to Hardware Performance Counters,” Department of Defense Users' Group Conference Proceedings, Biloxi, Mississippi, June 2001.  (328.56 KB)
Haidar, A., K. Kabir, D. Fayad, S. Tomov, and J. Dongarra, Out of Memory SVD Solver for Big Data,” 2017 IEEE High Performance Extreme Computing Conference (HPEC'17), Waltham, MA, IEEE, September 2017.  (1.33 MB)
Dong, T., A. Haidar, S. Tomov, and J. Dongarra, Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices,” International Conference on Computational Science (ICCS 2017), Zurich, Switzerland, Procedia Computer Science, June 2017.  (364.95 KB)
Tomov, S., P. Luszczek, I. Yamazaki, J. Dongarra, H. Anzt, and W. Sawyer, Optimizing Krylov Subspace Solvers on Graphics Processing Units,” Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.  (536.32 KB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization,” IEEE High Performance Extreme Computing Conference (HPEC’18), Waltham, MA, IEEE, September 2018.  (729.87 KB)
Dongarra, J., S. Hammarling, N. J. Higham, S. Relton, and M. Zounon, Optimized Batched Linear Algebra for Modern Architectures,” Euro-Par 2017, Santiago de Compostela, Spain, Springer, August 2017.  (618.33 KB)
Haidar, A., T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, Optimization for Performance and Energy for Batched Matrix Computations on GPUs,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8), San Francisco, CA, ACM, February 2015.  (699.5 KB)
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, Optimal Resilience Patterns to Cope with Fail-stop and Silent Errors,” 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.  (603.58 KB)
Herault, T., Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms,” 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Best Paper Award, Vancouver, BC, Canada, IEEE, May 2018.  (899.3 KB)
Benoit, A., A. Cavelan, V. Le Fèvre, and Y. Robert, Optimal Checkpointing Period with replicated execution on heterogeneous platforms,” 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, IEEE Computer Society Press, June 2017.  (1.02 MB)

Pages