Publications

Export 1016 results:
2018
Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki, The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale,” SIAM Review, vol. 60, issue 4, pp. 808–865, November 2018. DOI: 10.1137/17M1117732
Jagode, H., A. Danalis, H. Anzt, I. Yamazaki, M. Hoemmen, E. Boman, S. Tomov, and J. Dongarra, Software-Defined Events (SDEs) in MAGMA-Sparse,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-12: University of Tennessee, December 2018.  (481.69 KB)
Danalis, A., H. Jagode, and J. Dongarra, Software-Defined Events through PAPI for In-Depth Analysis of Application Performance , Basel, Switzerland, 5th Platform for Advanced Scientific Computing Conference (PASC18), July 2018.
Anzt, H., I. Yamazaki, M. Hoemmen, E. Boman, and J. Dongarra, Solver Interface & Performance on Cori,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-05: University of Tennessee, June 2018.  (188.05 KB)
Zaitsev, D., S. Tomov, and J. Dongarra, Solving Linear Diophantine Systems on Parallel Architectures,” IEEE Transactions on Parallel and Distributed Systems, October 2018. DOI: 10.1109/TPDS.2018.2873354
Bernholdt, D. E., S. Boehm, G. Bosilca, M. Grentla Venkata, R. E. Grant, T. Naughton, H. P. Pritchard, M. Schulz, and G. R. Vallee, A Survey of MPI Usage in the US Exascale Computing Project,” Concurrency Computation: Practice and Experience, September 2018. DOI: 10.1002/cpe.4851  (359.54 KB)
Yamazaki, I., J. Kurzak, P. Wu, M. Zounon, and J. Dongarra, Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architectures,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 8, pp. 1879–1892, August 2018. DOI: 10.1109/TPDS.2018.2808964  (2.88 MB)
Bosilca, G., D. Genet, R. Harrison, T. Herault, M. Mahdi Javanmard, C. Peng, and E. Valeev, Tensor Contraction on Distributed Hybrid Architectures using a Task-Based Runtime System,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-13: University of Tennessee, December 2018.  (326.11 KB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, Tensor Contractions using Optimized Batch GEMM Routines , San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.  (1.64 MB)
Haidar, A., S. Tomov, A. Abdelfattah, M. Zounon, and J. Dongarra, Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption , Frankfurt, Germany, ISC High Performance (ISC18), Best Poster Award, June 2018.  (3.01 MB)
Haidar, A., S. Tomov, A. Abdelfattah, M. Zounon, and J. Dongarra, Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption,” ISC High Performance (ISC'18), Best Poster, Frankfurt, Germany, June 2018.  (3.01 MB)
Chow, E., H. Anzt, J. Scott, and J. Dongarra, Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning,” Journal of Parallel and Distributed Computing, vol. 119, pp. 219–230, November 2018. DOI: 10.1016/j.jpdc.2018.04.017  (273.53 KB)
Anzt, H., J. Dongarra, G. Flegar, and T. Gruetzmacher, Variable-Size Batched Condition Number Calculation on GPUs,” SBAC-PAD, Lyon, France, September 2018.  (509.3 KB)
Anzt, H., J. Dongarra, G. Flegar, and E. S. Quintana-Ortí, Variable-Size Batched Gauss–Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,” Parallel Computing, January 2018. DOI: 10.1016/j.parco.2017.12.006  (1.9 MB)
2019
Anzt, H., J. Dongarra, G. Flegar, N. J. Higham, and E. S. Quintana-Orti, Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers,” Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, 2019. DOI: 10.1002/cpe.4460  (341.54 KB)
Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J. Dongarra, Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,” Parallel Computing, vol. 81, pp. 1–21, January 2019. DOI: 10.1016/j.parco.2018.10.003  (3.27 MB)
Ribizel, T., and H. Anzt, Approximate and Exact Selection on GPUs,” 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, 2019. DOI: 10.1109/IPDPSW.2019.00088  (440.71 KB)
Anzt, H., and G. Flegar, Are we Doing the Right Thing? — A Critical Analysis of the Academic HPC Community,” 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, 2019. DOI: 10.1109/IPDPSW.2019.00122  (622.32 KB)
Herault, T., Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, Checkpointing Strategies for Shared High-Performance Computing Platforms,” International Journal of Networking and Computing, vol. 9, no. 1, pp. 28–52, 2019.
Benoit, A., A. Cavelan, F. M. Ciorba, V. Le Fèvre, and Y. Robert, Combining checkpointing and replication for reliable execution of linear workflows with fail-stop and silent errors,” International Journal of Networking and Computing, vol. 9, no. 1, pp. 2-27, 2019.  (754.6 KB)
Le Fèvre, V., T. Herault, Y. Robert, A. Bouteiller, A. Hori, G. Bosilca, and J. Dongarra, Comparing the Performance of Rigid, Moldable, and Grid-Shaped Applications on Failure-Prone HPC Platforms,” Parallel Computing, vol. 85, pp. 1–12, July 2019. DOI: 10.1016/j.parco.2019.02.002  (865.18 KB)
Kaya, O., and Y. Robert, Computing dense tensor decompositions with optimal dimension trees,” Algorithmica, to appear, 2019.  (638.4 KB)
Aupy, G., A. Benoit, B. Goglin, L. Pottier, and Y. Robert, Co-scheduling HPC workloads on cache-partitioned CMP platforms,” Int. Journal of High Performance Computing Applications, To appear, 2019.  (930.28 KB)
Danalis, A., H. Jagode, H. Hanumantharayappa, S. Ragate, and J. Dongarra, Counter Inspection Toolkit: Making Sense out of Hardware Performance Event,” 11th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, Cham, Switzerland: Springer, September 2019.  (216.39 KB)
Gruetzmacher, T., T. Cojean, G. Flegar, F. Göbel, and H. Anzt, A Customized Precision Format Based on Mantissa Segmentation for Accelerating Sparse Linear Algebra,” Concurrency and Computation: Practice and Experience, vol. 40319, issue 262, January 2019. DOI: 10.1002/cpe.5418
Tomov, S., A. Haidar, A. Ayala, D. Schultz, and J. Dongarra, Design and Implementation for FFT-ECP on Distributed Accelerated Systems,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-05: University of Tennessee, April 2019.  (3.19 MB)
Yamazaki, I., A. Ida, R. Yokota, and J. Dongarra, Distributed-Memory Lattice H-Matrix Factorization,” The International Journal of High Performance Computing Applications, August 2019. DOI: 10.1177/1094342019861139  (1.14 MB)
Danalis, A., H. Jagode, and J. Dongarra, Does your tool support PAPI SDEs yet? , Tahoe City, CA, 13th Scalable Tools Workshop, July 2019.  (3.09 MB)
M. Lopez, G., W. Joubert, V. Larrea, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, Evaluation of Directive-Based Performance Portable Programming Models,” International Journal of High Performance Computing and Networking (to appear), 2019. DOI: 10.1504/IJHPCN.2017.10009064
Abdelfattah, A., S. Tomov, and J. Dongarra, Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs,” 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.  (675.5 KB)
Tomov, S., A. Haidar, A. Ayala, D. Schultz, and J. Dongarra, FFT-ECP Fast Fourier Transform , Houston, TX, 2019 ECP Annual Meeting (Research Poster), January 2019.  (1.51 MB)
Han, L., V. Le Fèvre, L-C. Canon, Y. Robert, and F. Vivien, A Generic Approach to Scheduling and Checkpointing Workflows,” Int. Journal of High Performance Computing Applications, To appear, 2019.  (555.01 KB)
Shaiek, H., S. Tomov, A. Ayala, A. Haidar, and J. Dongarra, GPUDirect MPI Communications and Optimizations to Accelerate FFTs on Exascale Systems,” EuroMPI'19 Posters, Zurich, Switzerland, no. icl-ut-19-06: ICL, September 2019.  (2.25 MB)
Wong, K., S. Tomov, and J. Dongarra, Hands-on Research and Training in High-Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments,,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.  (1016.52 KB)
Han, L., L-C. Canon, J. Liu, Y. Robert, and F. Vivien, Improved energy-aware strategies for periodic real-time tasks under reliability constraints,” RTSS'2019, the 40th IEEE Real-Time Systems Symposium: IEEE Press, 2019.
Luszczek, P., I. Yamazaki, and J. Dongarra, Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators,” 2019 IEEE High Performance Extreme Computing Conference (HPEC ‘19), Waltham, MA, IEEE, September 2019.  (469.96 KB)
Beck, M., T. Moore, and P. Luszczek, Interoperable Convergence of Storage, Networking, and Computation,” Future of Information and Communication Conference (FICC), San Francisco, Science and Information (SAI), March 2019.  (1.8 MB)
Beck, M., T. Moore, P. Luszczek, and A. Danalis, Interoperable Convergence of Storage, Networking, and Computation,” FICC 2019, San Francisco, CA, Springer, March 14-15, 2019.  (2.64 MB)
Losada, N., G. Bosilca, A. Bouteiller, P. González, and M. J. Martín, Local Rollback for Resilient MPI Applications with Application-Level Checkpointing and Message Logging,” Future Generation Computer Systems, vol. 91, pp. 450-464, February 2019. DOI: 10.1016/j.future.2018.09.041  (1.16 MB)
Ng, L., S. Chen, A. Gessinger, D. Nichols, S. Cheng, A. Meenasorna, K. Wong, S. Tomov, A. Haidar, E. D'Azevedo, et al., MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs : University of Tennessee, January 2019. DOI: 10.13140/RG.2.2.14906.64961  (7.84 MB)
Nichols, D., K. Wong, S. Tomov, L. Ng, S. Chen, and A. Gessinger, MagmaDNN: Accelerated Deep Learning Using MAGMA,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.  (1.09 MB)
Nichols, D., N-S. Tomov, F. Betancourt, S. Tomov, K. Wong, and J. Dongarra, MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.  (1.37 MB)
Kurzak, J., Y. Tsai, M. Gates, A. Abdelfattah, and J. Dongarra, Massively Parallel Automated Software Tuning,” 48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan, ACM Press, August 2019.
Bai, Z., J. Dongarra, D. Lu, and I. Yamazaki, Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation,” International Parallel and Distributed Processing Symposium (IPDPS), May 2019.
Betancourt, F., K. Wong, E. Asemota, Q. Marshall, D. Nichols, and S. Tomov, OpenDIEL: A Parallel Workflow Engine and DataAnalytics Framework,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.  (1.48 MB)
Abdelfattah, A., S. Tomov, and J. Dongarra, Optimizing Batch HGEMM on Small Sizes Using Tensor Cores , San Jose, CA, GPU Technology Conference (GTC), March 2019.  (2.47 MB)
Jagode, H., A. Danalis, H. Anzt, and J. Dongarra, PAPI Software-Defined Events for in-Depth Performance Analysis,” The International Journal of High Performance Computing Applications, 2019.  (442.39 KB)
Danalis, A., H. Jagode, and J. Dongarra, PAPI's new Software-Defined Events for in-depth Performance Analysis , Dresden, Germany, 13th Parallel Tools Workshop, September 2019.  (3.14 MB)
Yamazaki, I., E. Chow, A. Bouteiller, and J. Dongarra, Performance of Asynchronous Optimized Schwarz with One-sided Communication,” Parallel Computing, vol. 86, pp. 66-81, August 2019. DOI: 10.1016/j.parco.2019.05.004
Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, P. Wu, I. Yamazaki, A. YarKhan, M. Abalenkovs, N. Bagherpour, et al., PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP,” ACM Transactions on Mathematical Software (to appear), 2019.  (7.5 MB)

Pages