Publications

Show only items where

Author

Type

Term

Year

Keyword

Export 1273 results:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Losada, N., G. Bosilca, A. Bouteiller, P. González, and M. J. Martín, “Local Rollback for Resilient MPI Applications with Application-Level Checkpointing and Message Logging,” Future Generation Computer Systems, vol. 91, pp. 450-464, February 2019.

(1.16 MB)

Lopez, F., and T. Mary, “Mixed Precision LU Factorization on GPU Tensor Cores: Reducing Data Movement and Memory Footprint,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-13: University of Tennessee, September 2020.

(409 KB)

Lopez, F., E. Chow, S. Tomov, and J. Dongarra, “Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020.

(188.51 KB)

Lopez, M. G., W. Joubert, V. Larrea, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, “Evaluation of Directive-Based Performance Portable Programming Models,” International Journal of High Performance Computing and Networking, vol. 14, issue 2, pp. 165-182.

(1.12 MB)

Lopez, M. G., V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, “Towards Achieving Performance Portability Using Directives for Accelerators,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016.

(567.02 KB)

Lopez, F., E. Chow, S. Tomov, and J. Dongarra, “Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,” Workshop on Scalable Deep Learning over Parallel And Distributed Infrastructures (ScaDL 2020), May 2020.

(188.51 KB)

London, K., S. Moore, P. Mucci, K. Seymour, and R. Luczak, “The PAPI Cross-Platform Interface to Hardware Performance Counters,” Department of Defense Users' Group Conference Proceedings, Biloxi, Mississippi, June 2001.

(328.56 KB)

London, K., J. Dongarra, S. Moore, P. Mucci, K. Seymour, and T.. Spencer, “End-user Tools for Application Performance Analysis, Using Hardware Counters,” International Conference on Parallel and Distributed Computing Systems, Dallas, TX, August 2001.

(306.54 KB)

Lively, C., X. Wu, V. Taylor, S. Moore, H-C. Chang, C-Y. Su, and K. Cameron, “Power-Aware Prediction Models of Hybrid (MPI/OpenMP) Scientific Applications,” International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.

(479.49 KB)

Lively, C., X. Wu, V. Taylor, S. Moore, H-C. Chang, and K. Cameron, “Energy and performance characteristics of different parallel implementations of scientific applications on multicore systems,” International Journal of High Performance Computing Applications, vol. 25, no. 3, pp. 342-350, 00 2011.

(467.18 KB)

Lindquist, N., P. Luszczek, and J. Dongarra, “Accelerating Restarted GMRES with Mixed Precision Arithmetic,” IEEE Transactions on Parallel and Distributed Systems, June 2021.

(572.4 KB)

Lindquist, N., M. Gates, P. Luszczek, and J. Dongarra, “Threshold Pivoting for Dense LU Factorization,” ScalAH22: 13th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems , Dallas, Texas, IEEE, November 2022.

(721.77 KB)

Lindquist, N., P. Luszczek, and J. Dongarra, “Replacing Pivoting in Distributed Gaussian Elimination with Randomized Techniques,” 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), Atlanta, GA, IEEE, November 2020.

(184.6 KB)

Lindquist, N., P. Luszczek, and J. Dongarra, Generalizing Random Butterfly Transforms to Arbitrary Matrix Sizes : arXiv, December 2023.

Lindquist, N., P. Luszczek, and J. Dongarra, “Improving the Performance of the GMRES Method using Mixed-Precision Techniques,” Smoky Mountains Computational Sciences & Engineering Conference (SMC2020), August 2020.

(600.33 KB)

Lindquist, N., P. Luszczek, and J. Dongarra, “Using Additive Modifications in LU Factorization Instead of Pivoting,” 37th ACM International Conference on Supercomputing (ICS'23), Orlando, FL, ACM, June 2023.

(624.18 KB)

Li, Y., A. YarKhan, J. Dongarra, K. Seymour, and A. Hurault, “Enabling Workflows in GridSolve: Request Sequencing and Service Trading,” Journal of Supercomputing, vol. 64, issue 3, pp. 1133-1152, June 2013.

(821.29 KB)

Li, J., B. Nicolae, J. M. Wozniak, and G. Bosilca, “Understanding Scalability and Fine-Grain Parallelism of Synchronous Data Parallel Training,” 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), Denver, CO, IEEE, November 2019.

(696.89 KB)

Li, Y., J. Dongarra, K. Seymour, and A. YarKhan, “Request Sequencing: Enabling Workflow for Efficient Problem Solving in GridSolve,” International Conference on Grid and Cooperative Computing (GCC 2008) (submitted), Shenzhen, China, October 2008.

(1.64 MB)

Li, Y., J. Dongarra, and S. Tomov, “A Note on Auto-tuning GEMM for GPUs,” 9th International Conference on Computational Science (ICCS 2009), no. 5544-5545, Baton Rouge, LA, pp. 884-892, May 2009.

(236.02 KB)

Li, Y., and J. Dongarra, “Request Sequencing: Enabling Workflow for Efficient Parallel Problem Solving in GridSolve,” ICL Technical Report, no. ICL-UT-08-01, April 2008.

(1.64 MB)

Li, J., G. Bosilca, A. Bouteiller, and B. Nicolae, “Elastic deep learning through resilient collective operations,” SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.

Funk, Y., M. Götz, and H. Anzt, “Prediction of Optimal Solvers for Sparse Linear Systems Using Deep Learning,” 2022 SIAM Conference on Parallel Processing for Scientific Computing (PP), Philadelphia, PA, Society for Industrial and Applied Mathematics, pp. 14 - 24.

Lemariner, P., G. Bosilca, C. Coti, T. Herault, and J. Dongarra, “Constructing Resilient Communication Infrastructure for Runtime Environments,” ParCo 2009, Lyon France, September 2009.

Lee, DW., and J. Dongarra, “VisPerf: Monitoring Tool for Grid Computing,” Lecture Notes in Computer Science, vol. 2659: Springer Verlag, Heidelberg, pp. 233-243, 00 2003.

(835.09 KB)

Le Fèvre, V., T. Herault, Y. Robert, A. Bouteiller, A. Hori, G. Bosilca, and J. Dongarra, “Comparing the Performance of Rigid, Moldable, and Grid-Shaped Applications on Failure-Prone HPC Platforms,” Parallel Computing, vol. 85, pp. 1–12, July 2019.

(865.18 KB)

Le Fèvre, V., G. Bosilca, A. Bouteiller, T. Herault, A. Hori, Y. Robert, and J. Dongarra, “Do moldable applications perform better on failure-prone HPC platforms?,” 11th Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids, Turin, Italy, Springer Verlag, August 2018.

(360.72 KB)

“,” 15th European PVM/MPI Users' Group Meeting, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, vol. 5205, Dublin Ireland, Springer Berlin, January 2008.

Langou, J., J. Langou, P. Luszczek, J. Kurzak, A. Buttari, and J. Dongarra, “Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy,” University of Tennessee Computer Science Tech Report, no. UT-CS-06-574, LAPACK Working Note #175, April 2006.

(221.39 KB)

Langou, J., Z. Chen, G. Bosilca, and J. Dongarra, “Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,” SIAM SISC (to appear), May 2007.

(241.36 KB)

Langou, J., and J. Dongarra, “The Problem with the Linpack Benchmark Matrix Generator,” International Journal of High Performance Computing Applications, vol. 23, no. 1, pp. 5-14, 00 2009.

(136.41 KB)

Langou, J., B. Hoffman, and B. King, “How LAPACK library enables Microsoft Visual Studio support with CMake and LAPACKE,” University of Tennessee Computer Science Technical Report (also LAWN 270), no. UT-CS-12-698, July 2012.

(501.53 KB)

Lacoste, X., M. Faverge, P. Ramet, S. Thibault, and G. Bosilca, “Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes,” 23rd International Heterogeneity in Computing Workshop, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

(807.33 KB)

Kurzak, J., M. Gates, A. YarKhan, I. Yamazaki, P. Wu, P. Luszczek, J. Finney, and J. Dongarra, “Parallel BLAS Performance Report,” SLATE Working Notes, no. 05, ICL-UT-18-01: University of Tennessee, April 2018.

(4.39 MB)

Kurzak, J., and J. Dongarra, “QR Factorization for the CELL Processor,” Scientific Programming, vol. 17, no. 1-2, pp. 31-42, 00 2010.

(194.95 KB)

Kurzak, J., A. Buttari, P. Luszczek, and J. Dongarra, “The PlayStation 3 for High Performance Scientific Computing,” University of Tennessee Computer Science Technical Report, no. UT-CS-08-608, January 2008.

(2.45 MB)

Kurzak, J., Y. Tsai, M. Gates, A. Abdelfattah, and J. Dongarra, “Massively Parallel Automated Software Tuning,” 48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan, ACM Press, August 2019.

(911.88 KB)

Kurzak, J., H. Ltaeif, J. Dongarra, and R. M. Badia, “Scheduling Linear Algebra Operations on Multicore Processors,” Concurrency Practice and Experience (to appear), 00 2009.

(716.18 KB)

Kurzak, J., P. Luszczek, S. Tomov, and J. Dongarra, “Preliminary Results of Autotuning GEMM Kernels for the NVIDIA Kepler Architecture,” LAWN 267, 00 2012.

(1.14 MB)

Kurzak, J., and J. Dongarra, “Implementing Linear Algebra Routines on Multi-Core Processors with Pipelining and a Look Ahead,” University of Tennessee Computer Science Tech Report, UT-CS-06-581, LAPACK Working Note #178, January 2006.

(304.4 KB)

Kurzak, J., H. Ltaeif, J. Dongarra, and R. M. Badia, “Scheduling Linear Algebra Operations on Multicore Processors,” University of Tennessee Computer Science Department Technical Report, UT-CS-09-636 (Also LAPACK Working Note 213), 00 2009.

(716.18 KB)

Kurzak, J., P. Luszczek, A. YarKhan, M. Faverge, J. Langou, H. Bouwmeester, and J. Dongarra, “Multithreading in the PLASMA Library,” Multi and Many-Core Processing: Architecture, Programming, Algorithms, & Applications: Taylor & Francis, 00 2013.

(536.28 KB)

Kurzak, J., M. Gates, I. Yamazaki, A. Charara, A. YarKhan, J. Finney, G. Ragghianti, P. Luszczek, and J. Dongarra, “Linear Systems Performance Report,” SLATE Working Notes, no. 08, ICL-UT-18-08: Innovative Computing Laboratory, University of Tennessee, September 2018.

(1.64 MB)

Kurzak, J., and J. Dongarra, “QR Factorization for the CELL Processor,” University of Tennessee Computer Science Technical Report, UT-CS-08-616 (also LAPACK Working Note 201), May 2008.

(194.95 KB)

Kurzak, J., R. Nath, P. Du, and J. Dongarra, “An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs,” Applied Parallel and Scientific Computing, vol. 7133, pp. 248-257, 00 2012.

(623.5 KB)

Kurzak, J., H. Ltaeif, J. Dongarra, and R. M. Badia, “Dependency-Driven Scheduling of Dense Matrix Factorizations on Shared-Memory Systems,” PPAM 2009, Poland, September 2009.

Kurzak, J., P. Wu, M. Gates, I. Yamazaki, P. Luszczek, G. Ragghianti, and J. Dongarra, “Designing SLATE: Software for Linear Algebra Targeting Exascale,” SLATE Working Notes, no. 03, ICL-UT-17-06: Innovative Computing Laboratory, University of Tennessee, October 2017.

(2.8 MB)

Kurzak, J., S. Tomov, and J. Dongarra, “Autotuning GEMM Kernels for the Fermi GPU,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, November 2012.

(742.5 KB)

Kurzak, J., and J. Dongarra, “Fully Dynamic Scheduler for Numerical Computing on Multicore Processors,” University of Tennessee Computer Science Department Technical Report, UT-CS-09-643 (Also LAPACK Working Note 220), 00 2009.

(488.24 KB)

Kurzak, J., M. Gates, A. Charara, A. YarKhan, and J. Dongarra, “SLATE Working Note 12: Implementing Matrix Inversions,” SLATE Working Notes, no. 12, ICL-UT-19-04: Innovative Computing Laboratory, University of Tennessee, June 2019.

(1.95 MB)

Main menu

Publications

Pages