Publications

Buttari, A., J. Dongarra, P. Husbands, J. Kurzak, and K. Yelick, “Multithreading for synchronization tolerance in matrix factorization,” Journal of Physics: Conference Series, SciDAC 2007, vol. 78, no. 2007, January 2007.

(577.73 KB)

Bouteiller, A., T. Herault, and G. Bosilca, “A Multithreaded Communication Substrate for OpenSHMEM,” 8th International Conference on Partitioned Global Address Space Programming Models (PGAS), Eugene, OR, October 2014.

(261.66 KB)

Goebel, F., H. Anzt, T. Cojean, G. Flegar, and E. S. Quintana-Orti, “Multiprecision Block-Jacobi for Iterative Triangular Solves,” European Conference on Parallel Processing (Euro-Par 2020): Springer, August 2020.

Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Multi-Level Checkpointing and Silent Error Detection for Linear Workflows,” Journal of Computational Science, vol. 28, pp. 398–415, September 2018.

Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, “Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization,” Euro-Par 2013, Aachen, Germany, Springer, August 2013.

(431.84 KB)

Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, “Multi-criteria checkpointing strategies: optimizing response-time versus resource utilization,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-01, February 2013.

(497.64 KB)

Danalis, A., L. Pollock, M. Swany, and J. Cavazos, “MPI-aware Compiler Optimizations for Improving Communication-Computation Overlap,” Proceedings of the 23rd annual International Conference on Supercomputing (ICS '09), Yorktown Heights, NY, USA, ACM, pp. 316-325, June 2009.

(308.92 KB)

Snir, M., S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra, MPI - The Complete Reference, Volume 1: The MPI Core , Second, Cambridge, MA, USA, MIT Press, pp. 426, August 1998.

Schuchart, J., and G. Bosilca, “MPI Continuations And How To Invoke Them,” Sustained Simulation Performance 2021, Cham, Springer International Publishing, pp. 67 - 83, February 2023.

Pjesivac–Grbovic, J., G. Bosilca, G. Fagg, T. Angskun, and J. Dongarra, “MPI Collective Algorithm Selection and Quadtree Encoding,” Parallel Computing (Special Edition: EuroPVM/MPI 2006): Elsevier, 00 2007.

(308.39 KB)

Pjesivac–Grbovic, J., G. Fagg, T. Angskun, G. Bosilca, and J. Dongarra, “MPI Collective Algorithm Selection and Quadtree Encoding,” Lecture Notes in Computer Science, vol. 4192, no. ICL-UT-06-13: Springer Berlin / Heidelberg, pp. 40-48, September 2006.

(308.39 KB)

Pjesivac–Grbovic, J., G. Fagg, T. Angskun, G. Bosilca, and J. Dongarra, “MPI Collective Algorithm Selection and Quadtree Encoding,” ICL Technical Report, no. ICL-UT-06-11, 00 2006.

(308.39 KB)

Sharp, D., M. Stoyanov, S. Tomov, and J. Dongarra, “A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms,” ICL Technical Report, no. ICL-UT-21-04: University of Tennessee, August 2021.

(493.17 KB)

Supinski, B. R. de, S. Alam, D. Bailey, L. Carrington, C. Daley, A. Dubey, T. Gamblin, D. Gunter, P. D. Hovland, H. Jagode, et al., “Modeling the Office of Science Ten Year Facilities Plan: The PERI Architecture Tiger Team,” SciDAC 2009, Journal of Physics: Conference Series, vol. 180(2009)012039, San Diego, California, IOP Publishing, July 2009.

(906.39 KB)

Song, F., S. Moore, and J. Dongarra, “Modeling of L2 Cache Behavior for Thread-Parallel Scientific Programs on Chip Multi-Processors,” University of Tennessee Computer Science Technical Report, no. UT-CS-06-583, January 2006.

(652.93 KB)

Dongarra, J., A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and A. YarKhan, “Model-Driven One-Sided Factorizations on Multicore, Accelerated Systems,” Supercomputing Frontiers and Innovations, vol. 1, issue 1, 2014.

(1.86 MB)

Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, “Mixing LU-QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers,” Journal of Parallel and Distributed Computing, vol. 85, pp. 32-46, November 2015.

(5.06 MB)

Du, P., P. Luszczek, S. Tomov, and J. Dongarra, “Mixed-Tool Performance Analysis on Hybrid Multicore Architectures,” First International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI 2010), San Diego, CA, September 2010.

(1.24 MB)

Haidar, A., H. Bayraktar, S. Tomov, J. Dongarra, and N. J. Higham, “Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-05: University of Tennessee, May 2020.

(1.03 MB)

Yamazaki, I., S. Tomov, T. Dong, and J. Dongarra, “Mixed-precision orthogonalization scheme and adaptive step size for CA-GMRES on GPUs,” VECPAR 2014 (Best Paper), Eugene, OR, June 2014.

(438.54 KB)

Yamazaki, I., J. Barlow, S. Tomov, J. Kurzak, and J. Dongarra, “Mixed-precision orthogonalization process Performance on multicore CPUs with GPUs,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.

(301.01 KB)

Haidar, A., H. Bayraktar, S. Tomov, J. Dongarra, and N. J. Higham, “Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems,” Proceedings of the Royal Society A, vol. 476, issue 2243, November 2020.

(2.24 MB)

Yamazaki, I., S. Tomov, and J. Dongarra, “Mixed-Precision Cholesky QR Factorization and its Case Studies on Multicore CPU with Multiple GPUs,” SIAM Journal on Scientific Computing, vol. 37, no. 3, pp. C203-C330, May 2015.

(374.8 KB)

Yamazaki, I., S. Tomov, J. Kurzak, J. Dongarra, and J. Barlow, “Mixed-precision Block Gram Schmidt Orthogonalization,” 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Austin, TX, ACM, November 2015.

(235.69 KB)

Tsai, Y. M., P. Luszczek, and J. Dongarra, “Mixed-Precision Algorithm for Finding Selected Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices,” ICL Technical Report, no. ICL-UT-21-05, August 2021.

(3.93 MB)

Lopez, F., and T. Mary, “Mixed Precision LU Factorization on GPU Tensor Cores: Reducing Data Movement and Memory Footprint,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-13: University of Tennessee, September 2020.

(409 KB)

Buttari, A., J. Dongarra, J. Langou, J. Langou, P. Luszczek, and J. Kurzak, “Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems,” International Journal of High Performance Computer Applications (to appear), August 2007.

(157.4 KB)

Cayrols, S., J. Li, G. Bosilca, S. Tomov, A. Ayala, and J. Dongarra, “Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression,” ICL Technical Report, no. ICL-UT-22-04, May 2022.

(706.14 KB)

Tsai, Y-H. Mike, N. Beams, and H. Anzt, “Mixed Precision Algebraic Multigrid on GPUs,” Parallel Processing and Applied Mathematics (PPAM 2022), vol. 13826, Cham, Springer International Publishing, April 2023.

Beck, M., D. Arnold, A. Bassi, F. Berman, H. Casanova, J. Dongarra, T. Moore, G. Obertelli, J. Plank, M. Swany, et al., “Middleware for the Use of Storage in Communication,” Parallel Computing, vol. 28, no. 12, pp. 1773-1788, August 2002.

(87.97 KB)

Marin, G., J. Dongarra, and D. Terpstra, “MIAMI: A Framework for Application Performance Diagnosis ,” IPASS-2014, Monterey, CA, IEEE, March 2014.

(1010.75 KB)

Vadhiyar, S., and J. Dongarra, “A Metascheduler For The Grid,” Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC 2002), Edinburgh, Scotland, IEEE Computer Society, pp. 343-351, July 2002.

(99.53 KB)

Moore, S., D. Arnold, and D. Cronk, “Metacomputing Support for the SARA3D Structural Acoustics Application,” Department of Defense Users' Group Conference (to appear), Biloxi, Mississippi, June 2001.

(64.58 KB)

Cronk, D., B. Ellis, and G. Fagg, “Metacomputing: An Evaluation of Emerging Systems,” University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-445, July 2000.

(280.21 KB)

Dongarra, J., G. Fagg, R. Hempel, and D. W. Walker, “Message Passing Software Systems,” Encyclopedia of Electrical and Engineering, Supplement 1: John Wiley & Sons, Inc., 00 2000.

(289.38 KB)

Barry, D., H. Jagode, A. Danalis, and J. Dongarra, Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements , St. Petersburg, FL, 28th HIPS Workshop, May 2023.

(3.99 MB)

Barry, D., H. Jagode, A. Danalis, and J. Dongarra, “Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements,” 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, Florida, IEEE, August 2023.

(1.81 MB)

Shende, S., A. D. Malony, S. Moore, and D. Cronk, “Memory Leak Detection in Fortran Applications using TAU,” Proc. DoD HPCMP Users Group Conference (HPCMP-UGC'07), Pittsburgh, PA, IEEE Computer Society, January 2007.

Mucci, P., “Memory Bandwidth and the Performance of Scientific Applications: A Study of the AMD Opteron Processor,” 2005 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (submitted), January 2004.

(210.29 KB)

Weaver, V. M., M. Johnson, K. Kasichayanula, J. Ralph, P. Luszczek, D. Terpstra, and S. Moore, “Measuring Energy and Power with PAPI,” International Workshop on Power-Aware Systems and Architectures, Pittsburgh, PA, September 2012.

(146.79 KB)

Dongarra, J., “Measuring Computer Performance: A Practioner's Guide,” SIAM Review (book review), vol. 43, no. 2, pp. 383-384, 00 2001.

(558.9 KB)

Benoit, A., R. Elghazi, and Y. Robert, “Max-Stretch Minimization on an Edge-Cloud Platform,” IPDPS'2021, the 34th IEEE International Parallel and Distributed Processing Symposium: IEEE Computer Society Press, 2021.

(4.94 MB)

Haidar, A., S. Tomov, A. Abdelfattah, I. Yamazaki, and J. Dongarra, MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR) , Washington, DC, NSF PI Meeting, Poster, April 2018.

(2.4 MB)

Dongarra, J., J-F. Pineau, Y. Robert, and F. Vivien, “Matrix Product on Heterogeneous Master Worker Platforms,” 2008 PPoPP Conference, Salt Lake City, Utah, January 2008.

Bai, Z., J. Dongarra, D. Lu, and I. Yamazaki, “Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation,” International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.

(480.73 KB)

Abdelfattah, A., S. Tomov, and J. Dongarra, “Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions,” Journal of Parallel and Distributed Computing, vol. 145, pp. 188-201, November 2020.

(1.3 MB)

Tomov, S., Matrix Algebra on GPU and Multicore Architectures , Basel, Switzerland, Workshop on GPU-enabled Numerical Libraries, Presentation, May 2011.

(49.27 MB)

Agullo, E., G. Bosilca, C. Castagnède, J. Dongarra, H. Ltaeif, and S. Tomov, “Matrices Over Runtime Systems at Exascale,” Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.

Spannaus, A., K. J. H. Law, P. Luszczek, F. Nasrin, C. Putman Micucci, P. K. Liaw, L. J. Santodonato, D. J. Keffer, and V. Maroulas, “Materials fingerprinting classification,” Computer Physics Communications, pp. 108019, May Jan.

(3.8 MB)

Abdelfattah, A., J. Dongarra, A. Haidar, S. Tomov, and I. Yamazaki, MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines , Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Research Poster, November 2018.

(2.55 MB)

Main menu

Pages