Publications

Yamazaki, I., S. Tomov, and J. Dongarra, “Computing Low-rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and its Application to Solving a Hierarchically Semiseparable Linear System of Equations,” Scientific Programming, 2015.

Kaya, O., and Y. Robert, “Computing Dense Tensor Decompositions with Optimal Dimension Trees,” Algorithmica, vol. 81, issue 5, pp. 2092–2121, May 2019.

“Computational Science – ICCS 2009, Proceedings of the 9th International Conference,” Lecture Notes in Computer Science: Theoretical Computer Science and General Issues, vol. 5544-5545, Baton Rouge, LA, May 2009.

Sloot, P. M., D. Abramson, A. V. Bogdanov, J. Dongarra, A. Zomaya, and Y. Gorbachev, “Computational Science — ICCS 2003,” Lecture Notes in Computer Science, vol. 2657-2660, ICCS 2003, International Conference. Melbourne, Australia, Springer-Verlag, Berlin, June 2003.

Kovalchuk, S. V., V. V. Krzhizhanovskaya, M. Paszyński, D. Kranzlmüller, J. Dongarra, and P. M. A. Sloot, “Computational science for a better future,” Journal of Computational Science, vol. 62, pp. 101745, July 2022.

Sun, J., J. Fu, J. Drake, Q. Zhu, A. Haidar, M. Gates, S. Tomov, and J. Dongarra, “Computational Benefit of GPU Optimization for Atmospheric Chemistry Modeling,” Journal of Advances in Modeling Earth Systems, vol. 10, issue 8, pp. 1952–1969, August 2018.

Aliaga, J. I., H. Anzt, T. Grützmacher, E. S. Quintana-Ortí, and A. E. Thomas, “Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units,” Concurrency and Computation: Practice and Experience, vol. 34, issue 14, June 2022.

Aliaga, J. I., H. Anzt, T. Grützmacher, E. S. Quintana-Ortí, and A. E. Thomas, “Compressed basis GMRES on high-performance graphics processing units,” The International Journal of High Performance Computing Applications, May 2022.

Haidar, A., H. Ltaief, P. Luszczek, and J. Dongarra, “A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction,” IPDPS 2012, Shanghai, China, May 2012.

Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Composing Resilience Techniques: ABFT, Periodic, and Incremental Checkpointing,” International Journal of Networking and Computing, vol. 5, no. 1, pp. 2-15, January 2015.

Eijkhout, V., E. Fuentes, T. Eidson, and J. Dongarra, “The Component Structure of a Self-Adapting Numerical Software System,” International Journal of Parallel Programming, vol. 33, no. 2, June 2005.

Arbenz, P., A. Cleary, J. Dongarra, and M. Hegland, “A Comparison of Parallel Solvers for General Narrow Banded Linear Systems,” Parallel and Distributed Computing Practices, vol. 2, pp. 385-400, October 2002.

Graham, R. L., G. Bosilca, and J. Pjesivac-Grbovic, “A Comparison of Application Performance Using Open MPI and Cray MPI,” Cray User Group, CUG 2007, May 2007.

Le Fèvre, V., T. Herault, Y. Robert, A. Bouteiller, A. Hori, G. Bosilca, and J. Dongarra, “Comparing the Performance of Rigid, Moldable, and Grid-Shaped Applications on Failure-Prone HPC Platforms,” Parallel Computing, vol. 85, pp. 1–12, July 2019.

Bosilca, G., A. Bouteiller, T. Herault, V. Le Fèvre, Y. Robert, and J. Dongarra, “Comparing Distributed Termination Detection Algorithms for Modern HPC Platforms,” International Journal of Networking and Computing, vol. 12, issue 1, pp. 26–46, January 2022.

Ballard, G., D. Becker, J. Demmel, J. Dongarra, A. Druinsky, I. Peled, O. Schwartz, S. Toledo, and I. Yamazaki, “Communication-Avoiding Symmetric-Indefinite Factorization,” SIAM Journal on Matrix Analysis and Applications, vol. 35, issue 4, pp. 1364-1406, July 2014.

Luszczek, P., W. M. Sid-Lakhdar, and J. Dongarra, “Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering,” The International Journal of High Performance Computing Applications, March 2023.

Benoit, A., A. Cavelan, F. M. Ciorba, V. Le Fèvre, and Y. Robert, “Combining Checkpointing and Replication for Reliable Execution of Linear Workflows with Fail-Stop and Silent Errors,” International Journal of Networking and Computing, vol. 9, no. 1, pp. 2-27.

Terpstra, D., H. Jagode, H. You, and J. Dongarra, “Collecting Performance Data with PAPI-C,” Tools for High Performance Computing 2009, 3rd Parallel Tools Workshop, Dresden, Germany, Springer Berlin / Heidelberg, pp. 157-173, May 2010.

Buttari, A., J. Langou, J. Kurzak, and J. Dongarra, “A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,” Parallel Computing (to appear), 2010.

Buttari, A., J. Langou, J. Kurzak, and J. Dongarra, “A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,” Parallel Computing, vol. 35, pp. 38-53, 2009.

Han, L., L-C. Canon, H. Casanova, Y. Robert, and F. Vivien, “Checkpointing Workflows for Fail-Stop Errors,” IEEE Transactions on Computers, vol. 67, issue 8, pp. 1105–1120, August 2018.

Herault, T., Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, “Checkpointing Strategies for Shared High-Performance Computing Platforms,” International Journal of Networking and Computing, vol. 9, no. 1, pp. 28–52, 2019.

Luszczek, P., J. Kurzak, and J. Dongarra, “Changes in Dense Linear Algebra Kernels - Decades Long Perspective,” in Solving the Schrödinger Equation: Has everything been tried? (to appear): Imperial College Press, 2011.

Fürlinger, K., and S. Moore, “Capturing and Analyzing the Execution Control Flow of OpenMP Applications,” International Journal of Parallel Programming, vol. 37, no. 3, pp. 266-276, 2009.

Schuchart, J., P. Samfass, C. Niethammer, J. Gracia, and G. Bosilca, “Callback-based completion notification using MPI Continuations,” Parallel Computing, pp. 102793.

Deshmukh, S., R. Yokota, and G. Bosilca, “Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors,” ACM Transactions on Mathematical Software, vol. 49, issue 3, pp. 1–29, September 2023.

Fagg, G., and J. Dongarra, “Building and using a Fault Tolerant MPI implementation,” International Journal of High Performance Applications and Supercomputing (to appear), 2004.

Caron, E., Y. Caniou, A. K. W. Chang, and Y. Robert, “Budget-aware scheduling algorithms for scientific workflows with stochastic task weights on IaaS Cloud platforms,” Concurrency and Computation: Practice and Experience, vol. 33, no. 17, pp. e6065, 2021.

Anzt, H., S. Tomov, J. Dongarra, and V. Heuveline, “A Block-Asynchronous Relaxation Method for Graphics Processing Units,” Journal of Parallel and Distributed Computing, vol. 73, issue 12, pp. 1613–1626, December 2013.

Anzt, H., S. Tomov, M. Gates, J. Dongarra, and V. Heuveline, “Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,” ICCS 2012, Omaha, NE, June 2012.

Anzt, H., S. Tomov, M. Gates, J. Dongarra, and V. Heuveline, Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems, no. UT-CS-11-689, December 2011.

Danalis, A., P. Luszczek, G. Marin, J. Vetter, and J. Dongarra, “BlackjackBench: Portable Hardware Characterization with Automated Results Analysis,” The Computer Journal, March 2013.

YarKhan, A., and J. Dongarra, “Biological Sequence Alignment on the Computational Grid Using the GrADS Framework,” Future Generation Computing Systems, vol. 21, no. 6: Elsevier, pp. 980-986, June 2005.

Asch, M., T. Moore, R. M. Badia, M. Beck, P. Beckman, T. Bidot, F. Bodin, F. Cappello, A. Choudhary, B. R. de Supinski, et al., “Big Data and Extreme-Scale Computing: Pathways to Convergence - Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 435–479, July 2018.

Dongarra, J., H. Meuer, H. D. Simon, and E. Strohmaier, “Biannual Top-500 Computer Lists Track Changing Environments for Scientific Computing,” SIAM News, vol. 34, no. 9, October 2002.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Batched One-Sided Factorizations of Tiny Matrices Using GPUs: Challenges and Countermeasures,” Journal of Computational Science, vol. 26, pp. 226–236, May 2018.

Haidar, A., T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, “Batched matrix computations on hardware accelerators based on GPUs,” International Journal of High Performance Computing Applications, February 2015.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard, January 2001.

“Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard,” International Journal of High Performance Computing Applications: Special Issue - Part I & II, vol. 16, no. 1-2, pp. 1-199, January 2002.

Blackford, S., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, et al., “Basic Linear Algebra Subprograms (BLAS),” (an update), submitted to ACM TOMS, February 2001.

Luszczek, P., J. Kurzak, I. Yamazaki, D. Keffer, V. Maroulas, and J. Dongarra, “Autotuning Techniques for Performance-Portable Point Set Registration in 3D,” Supercomputing Frontiers and Innovations, vol. 5, no. 4, December 2018.

Dongarra, J., M. Gates, J. Kurzak, P. Luszczek, and Y. Tsai, “Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2040–2055, November 2018.

Balaprakash, P., J. Dongarra, T. Gamblin, M. Hall, J. Hollingsworth, B. Norris, and R. Vuduc, “Autotuning in High-Performance Computing Applications,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2068–2083, November 2018.

Kurzak, J., S. Tomov, and J. Dongarra, “Autotuning GEMM Kernels for the Fermi GPU,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, November 2012.

You, H., B. Rekapalli, Q. Liu, and S. Moore, “Autotuned Parallel I/O for Highly Scalable Biosequence Analysis,” TeraGrid'11, Salt Lake City, Utah, July 2011.

Seymour, K., and J. Dongarra, “Automatic Translation of Fortran to JVM Bytecode,” Concurrency and Computation: Practice and Experience, vol. 15, no. 3-5, pp. 202-207, 2003.

Wolf, F., and B. Mohr, “Automatic performance analysis of hybrid MPI/OpenMP applications,” Journal of Systems Architecture, Special Issue 'Evolutions in parallel distributed and network-based processing', vol. 49(10-11): Elsevier, pp. 421-439, November 2003.

Cuenca, J., D. Giménez, J. González, J. Dongarra, and K. Roche, “Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load,” EuroPar 2002, Paderborn, Germany, August 2002.

Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic analysis of inefficiency patterns in parallel applications,” Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted), 2005.
