Publications

Hoemmen, M., D. Hollman, C. Trott, D. Sunderland, N. Liber, L-T. Lo, D. Lebrun-Grandie, G. Lopez, P. Caday, S. Knepper, et al., “P1673R3: A Free Function Linear Algebra Interface Based on the BLAS,” ISO JTC1 SC22 WG21, no. P1673R3: ISO, April 2021.

(858.89 KB)

Weaver, V., D. Terpstra, H. McCraw, M. Johnson, K. Kasichayanula, J. Ralph, J. Nelson, P. Mucci, T. Mohan, and S. Moore, PAPI 5: Measuring Power, Energy, and the Cloud, Austin, TX, 2013 IEEE International Symposium on Performance Analysis of Systems and Software, April 2013.

(78.39 KB)

Browne, S., C. Deane, G. Ho, and P. Mucci, “PAPI: A Portable Interface to Hardware Performance Counters,” Proceedings of Department of Defense HPCMP Users Group Conference, June 1999.

(57.77 KB)

Danalis, A., H. Jagode, and J. Dongarra, PAPI: Counting outside the Box, Barcelona, Spain, 8th JLESC Meeting, April 2018.

London, K., S. Moore, P. Mucci, K. Seymour, and R. Luczak, “The PAPI Cross-Platform Interface to Hardware Performance Counters,” Department of Defense Users' Group Conference Proceedings, Biloxi, Mississippi, June 2001.

(328.56 KB)

Jagode, H., A. Danalis, H. Anzt, and J. Dongarra, “PAPI Software-Defined Events for in-Depth Performance Analysis,” The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1113-1127, November 2019.

(442.39 KB)

Danalis, A., H. Jagode, and J. Dongarra, PAPI's new Software-Defined Events for in-depth Performance Analysis, Dresden, Germany, 13th Parallel Tools Workshop, September 2019.

(3.14 MB)

Jagode, H., A. Danalis, and J. Dongarra, PAPI's New Software-Defined Events for In-Depth Performance Analysis, Lyon, France, CCDSC 2018: Workshop on Clusters, Clouds, and Data for Scientific Computing, September 2018.

Johnson, M., H. McCraw, S. Moore, P. Mucci, J. Nelson, D. Terpstra, V. M. Weaver, and T. Mohan, “PAPI-V: Performance Monitoring for Virtual Machines,” CloudTech-HPC 2012, Pittsburgh, PA, September 2012.

(2.69 MB)

Sid-Lakhdar, W. M., S. Cayrols, D. Bielich, A. Abdelfattah, P. Luszczek, M. Gates, S. Tomov, H. Johansen, D. Williams-Young, T. A. Davis, et al., “PAQR: Pivoting Avoiding QR factorization,” ICL Technical Report, no. ICL-UT-22-06, June 2022.

(364.85 KB)

Sid-Lakhdar, W., S. Cayrols, D. Bielich, A. Abdelfattah, P. Luszczek, M. Gates, S. Tomov, H. Johansen, D. Williams-Young, T. Davis, et al., “PAQR: Pivoting Avoiding QR factorization,” 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), St. Petersburg, FL, USA, IEEE, 2023.

Agullo, E., L. Giraud, A. Guermouche, A. Haidar, and J. Roman, “Parallel algebraic domain decomposition solver for the solution of augmented systems,” Parallel, Distributed, Grid and Cloud Computing for Engineering, Ajaccio, Corsica, France, 12-15 April 2011.

Petitet, A., H. Casanova, J. Dongarra, Y. Robert, and C. Whaley, “Parallel and Distributed Scientific Computing: A Numerical Linear Algebra Problem Solving Environment Designer's Perspective,” Handbook on Parallel and Distributed Processing, January 1999.

(323.01 KB)

Ltaief, H., J. Kurzak, and J. Dongarra, “Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,” IEEE Transactions on Parallel and Distributed Systems (to appear), May 2009.

(208.16 KB)

Ltaief, H., J. Kurzak, and J. Dongarra, “Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,” IEEE Transactions on Parallel and Distributed Systems, pp. 417-423, April 2010.

(208.16 KB)

Kurzak, J., M. Gates, A. YarKhan, I. Yamazaki, P. Wu, P. Luszczek, J. Finney, and J. Dongarra, “Parallel BLAS Performance Report,” SLATE Working Notes, no. 05, ICL-UT-18-01: University of Tennessee, April 2018.

(4.39 MB)

Ltaief, H., J. Kurzak, and J. Dongarra, “Parallel Block Hessenberg Reduction using Algorithms-By-Tiles for Multicore Architectures Revisited,” University of Tennessee Computer Science Technical Report, UT-CS-08-624 (also LAPACK Working Note 208), August 2008.

(420.31 KB)

Buttari, A., J. Dongarra, J. Kurzak, and J. Langou, “Parallel Dense Linear Algebra Software in the Multicore Era,” in Cyberinfrastructure Technologies and Applications: Nova Science Publishers, Inc., pp. 9-24, 2009.

Henry, G., D. Watkins, and J. Dongarra, “A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures,” SIAM Journal on Scientific Computing, vol. 16, no. 2, pp. 284-311, October 2002.

(224.7 KB)

Henry, G., D. Watkins, and J. Dongarra, “A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures,” SIAM Journal on Scientific Computing, vol. 24, no. 1, pp. 284-311, January 2003.

(224.7 KB)

Cronk, D., G. Fagg, and S. Moore, “Parallel I/O for EQM Applications,” Department of Defense Users' Group Conference Proceedings (to appear), Biloxi, Mississippi, June 2001.

(81.41 KB)

Fagg, G., E. Gabriel, and M. Resch, “Parallel IO Support for Meta-Computing Applications: MPI_Connect IO Applied to PACX-MPI,” 8th European PVM/MPI User's Group Meeting, Lecture Notes in Computer Science, vol. 2131, Greece, Springer Verlag, Berlin, September 2001.

(129.3 KB)

Kurzak, J., M. Gates, A. YarKhan, I. Yamazaki, P. Luszczek, J. Finney, and J. Dongarra, “Parallel Norms Performance Report,” SLATE Working Notes, no. 06, ICL-UT-18-06: Innovative Computing Laboratory, University of Tennessee, June 2018.

(1.13 MB)

Malony, A. D., S. Biersdorff, S. Shende, H. Jagode, S. Tomov, G. Juckeland, R. Dietrich, D. Poole, and C. Lamb, “Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs,” International Conference on Parallel Processing (ICPP'11), Taipei, Taiwan, ACM, September 2011.

(1.41 MB)

Wyrzykowski, R., E. Deelman, J. Dongarra, and K. Karczewski, “Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part II,” Lecture Notes in Computer Science, no. 12044: Springer International Publishing, pp. 503, March 2020.

Wyrzykowski, R., E. Deelman, J. Dongarra, and K. Karczewski, “Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I,” Lecture Notes in Computer Science, no. 12043: Springer International Publishing, pp. 581, March 2020.

“Parallel Processing and Applied Mathematics, 9th International Conference, PPAM 2011,” Lecture Notes in Computer Science, vol. 7203, Torun, Poland, 2012.

Luszczek, P., “Parallel Programming in MATLAB,” The International Journal of High Performance Computing Applications, vol. 23, no. 3, pp. 277-283, July 2009.

(215.71 KB)

Abalenkovs, M., A. Abdelfattah, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, and A. YarKhan, “Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems,” Supercomputing Frontiers and Innovations, vol. 2, no. 4, October 2015.

(3.68 MB)

Haidar, A., H. Ltaief, and J. Dongarra, “Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,” University of Tennessee Computer Science Technical Report, UT-CS-11-677 (also LAPACK Working Note 254), August 2011.

(636.01 KB)

Haidar, A., H. Ltaief, and J. Dongarra, “Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,” Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), Seattle, WA, November 2011.

(636.01 KB)

Jia, Y., G. Bosilca, P. Luszczek, and J. Dongarra, “Parallel Reduction to Hessenberg Form with Algorithm-Based Fault Tolerance,” International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE-SC 2013, Denver, CO, November 2013.

(147.09 KB)

Ribizel, T., and H. Anzt, “Parallel Selection on GPUs,” Parallel Computing, vol. 91, March 2020, 2019.

(1.43 MB)

Giraud, L., J. Langou, and G. Sylvand, “On the Parallel Solution of Large Industrial Wave Propagation Problems,” Journal of Computational Acoustics (to appear), January 2005.

(1.08 MB)

Wang, Y., M. Baboulin, J. Falcou, Y. Fraigneau, and O. Le Maître, “A Parallel Solver for Incompressible Fluid Flows,” International Conference on Computational Science (ICCS 2013), Barcelona, Spain, Elsevier B.V., June 2013.

(588.79 KB)

Ribizel, T., and H. Anzt, “Parallel Symbolic Cholesky Factorization,” SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.

Buttari, A., J. Langou, J. Kurzak, and J. Dongarra, “Parallel Tiled QR Factorization for Multicore Architectures,” Concurrency and Computation: Practice and Experience, vol. 20, pp. 1573-1590, January 2008.

(277.92 KB)

Buttari, A., J. Langou, J. Kurzak, and J. Dongarra, “Parallel Tiled QR Factorization for Multicore Architectures,” University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-598 (also LAPACK Working Note 190), 2007.

(277.92 KB)

Baboulin, M., D. Becker, and J. Dongarra, “A parallel tiled solver for dense symmetric indefinite systems on multicore architectures,” University of Tennessee Computer Science Technical Report, no. ICL-UT-11-07, October 2011.

(544.2 KB)

Baboulin, M., D. Becker, and J. Dongarra, “A Parallel Tiled Solver for Symmetric Indefinite Systems On Multicore Architectures,” IPDPS 2012, Shanghai, China, May 2012.

(544.09 KB)

Tisseur, F., and J. Dongarra, “Parallelizing the Divide and Conquer Algorithm for the Symmetric Tridiagonal Eigenvalue Problem on Distributed Memory Architectures,” SIAM Journal on Scientific Computing, vol. 20, no. 6, pp. 2223-2236, October 2002.

(321.36 KB)

Youseff, L., K. Seymour, H. You, D. Zagorodnov, J. Dongarra, and R. Wolski, “Paravirtualization Effect on Single- and Multi-threaded Memory-Intensive Linear Algebra Software,” Cluster Computing Journal: Special Issue on High Performance Distributed Computing, vol. 12, no. 2: Springer Netherlands, pp. 101-122, 2009.

(451.07 KB)

Anzt, H., E. Chow, and J. Dongarra, “ParILUT - A New Parallel Threshold ILU,” SIAM Journal on Scientific Computing, vol. 40, issue 4: SIAM, pp. C503–C519, July 2018.

(19.26 MB)

Anzt, H., T. Ribizel, G. Flegar, E. Chow, and J. Dongarra, “ParILUT – A Parallel Threshold ILU for GPUs,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.

(505.95 KB)

Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, T. Herault, and J. Dongarra, “PaRSEC: Exploiting Heterogeneity to Enhance Scalability,” IEEE Computing in Science and Engineering, vol. 15, issue 6, pp. 36-45, November 2013.

(2.16 MB)

Danalis, A., H. Jagode, G. Bosilca, and J. Dongarra, “PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution,” 2015 IEEE International Conference on Cluster Computing, Chicago, IL, IEEE, September 2015.

(1.77 MB)

Bhatia, N., S. Moore, F. Wolf, J. Dongarra, and B. Mohr, “A Pattern-Based Approach to Automated Application Performance Analysis,” Workshop on Patterns in High Performance Computing, University of Illinois at Urbana-Champaign, May 2005.

(3.47 MB)

Mucci, P., D. Ahlin, J. Danielsson, P. Ekman, and L. Malinowski, “PerfMiner: Cluster-Wide Collection, Storage and Presentation of Application Level Hardware Performance Data,” European Conference on Parallel Processing (Euro-Par 2005), Monte de Caparica, Portugal, Springer, September 2005.

(205.45 KB)

Haidar, A., B. Brock, S. Tomov, M. Guidry, J. Jay Billings, D. Shyles, and J. Dongarra, “Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations,” 2016 IEEE High Performance Extreme Computing Conference (HPEC ’16), Waltham, MA, IEEE, September 2016.

(480.29 KB)
