Publications

Dong, T., A. Haidar, P. Luszczek, J. Harris, S. Tomov, and J. Dongarra, “LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU,” 16th IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.

Kurzak, J., P. Luszczek, and J. Dongarra, “LU Factorization with Partial Pivoting for a Multicore System with Accelerators,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, issue 8, pp. 1613-1621, August 2013.

Haidar, A., S. Tomov, K. Arturov, M. Guney, S. Story, and J. Dongarra, “LU, QR, and Cholesky Factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi,” IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, MA, IEEE, September 2016.

Tomov, S., J. Dongarra, A. Haidar, I. Yamazaki, T. Dong, T. Schulthess, and R. Solcà, MAGMA: A Breakthrough in Solvers for Eigenvalue Problems, San Jose, CA, GPU Technology Conference (GTC12), Presentation, May 2012.

Dongarra, J., T. Dong, M. Gates, A. Haidar, S. Tomov, and I. Yamazaki, MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.

Dong, T., A. Haidar, P. Luszczek, S. Tomov, A. Abdelfattah, and J. Dongarra, “MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs,” Innovative Computing Laboratory Technical Report, no. ICL-UT-16-02: University of Tennessee, August 2016.

Haidar, A., S. Tomov, P. Luszczek, and J. Dongarra, “MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing,” 2015 IEEE High Performance Extreme Computing Conference (HPEC ’15), (Best Paper Award), Waltham, MA, IEEE, September 2015.

Tomov, S., MAGMA: Evolution and Revolution, Knoxville, TN, ICL Lunch Talk Seminar, July 2021.

Tomov, S., MAGMA - LAPACK for GPUs, Atlanta, GA, Keeneland GPU Tutorial, April 2011.

Tomov, S., and J. Dongarra, MAGMA - LAPACK for HPC on Heterogeneous Architectures, Oak Ridge, TN, Titan Summit at Oak Ridge National Laboratory, Presentation, August 2011.

Dongarra, J., M. Gates, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov, MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.

Anzt, H., J. Dongarra, M. Gates, A. Haidar, K. Kabir, P. Luszczek, S. Tomov, and I. Yamazaki, MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi, Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.

Farhan, M. Al, A. Abdelfattah, S. Tomov, M. Gates, D. Sukkari, A. Haidar, R. Rosenberg, and J. Dongarra, “MAGMA Templates for Scalable Linear Algebra on Emerging Architectures,” The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645-658, November 2020.

Tomov, S., and A. Haidar, MAGMA Tensors and Batched Computing for Accelerating Applications on GPUs, San Jose, CA, GPU Technology Conference (GTC17), Presentation in Session S7728, May 2017.

Gates, M., MAGMA Tutorial, Atlanta, GA, Keeneland Workshop, February 2012.

Ng, L., S. Chen, A. Gessinger, D. Nichols, S. Cheng, A. Meenasorna, K. Wong, S. Tomov, A. Haidar, E. D'Azevedo, et al., MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs: University of Tennessee, January 2019.

Nichols, D., K. Wong, S. Tomov, L. Ng, S. Chen, and A. Gessinger, “MagmaDNN: Accelerated Deep Learning Using MAGMA,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.

Ng, L., K. Wong, A. Haidar, S. Tomov, and J. Dongarra, MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs, Knoxville, TN, 2017 Summer Research Experiences for Undergraduate (REU), Presentation, December 2017.

Nichols, D., N-S. Tomov, F. Betancourt, S. Tomov, K. Wong, and J. Dongarra, “MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.

Anzt, H., E. Boman, J. Dongarra, G. Flegar, M. Gates, M. Heroux, M. Hoemmen, J. Kurzak, P. Luszczek, S. Rajamanickam, et al., “MAGMA-sparse Interface Design Whitepaper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.

Portillo, R., P. J. Teller, D. Cronk, and S. Moore, “Making Performance Analysis and Tuning Part of the Software Development Cycle,” Proceedings of DoD HPCMP UGC 2009, San Diego, CA, IEEE, June 2009.

Agullo, E., L. Giraud, A. Guermouche, A. Haidar, J. Roman, and Y. Lee-Tin-Yien, “MaPHyS or the Development of a Parallel Algebraic Domain Decomposition Solver in the Course of the Solstice Project,” Sparse Days 2010 Meeting at CERFACS, Toulouse, France, June 2010.

Strohmaier, E., J. Dongarra, H. Meuer, and H. D. Simon, “The Marketplace for High-Performance Computers,” Parallel Computing, vol. 25, no. 13-14, pp. 1517-1545, October 2002.

Kurzak, J., Y. Tsai, M. Gates, A. Abdelfattah, and J. Dongarra, “Massively Parallel Automated Software Tuning,” 48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan, ACM Press, August 2019.

Abdelfattah, A., J. Dongarra, A. Haidar, S. Tomov, and I. Yamazaki, MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines, Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Research Poster, November 2018.

Tomov, S., MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines, Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020.

Spannaus, A., K. J. H. Law, P. Luszczek, F. Nasrin, C. Putman Micucci, P. K. Liaw, L. J. Santodonato, D. J. Keffer, and V. Maroulas, “Materials fingerprinting classification,” Computer Physics Communications, pp. 108019, May Jan.

Agullo, E., G. Bosilca, C. Castagnède, J. Dongarra, H. Ltaief, and S. Tomov, “Matrices Over Runtime Systems at Exascale,” Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.

Tomov, S., Matrix Algebra on GPU and Multicore Architectures, Basel, Switzerland, Workshop on GPU-enabled Numerical Libraries, Presentation, May 2011.

Abdelfattah, A., S. Tomov, and J. Dongarra, “Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions,” Journal of Parallel and Distributed Computing, vol. 145, pp. 188-201, November 2020.

Bai, Z., J. Dongarra, D. Lu, and I. Yamazaki, “Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation,” International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.

Dongarra, J., J-F. Pineau, Y. Robert, and F. Vivien, “Matrix Product on Heterogeneous Master Worker Platforms,” 2008 PPoPP Conference, Salt Lake City, Utah, January 2008.

Haidar, A., S. Tomov, A. Abdelfattah, I. Yamazaki, and J. Dongarra, MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR), Washington, DC, NSF PI Meeting, Poster, April 2018.

Benoit, A., R. Elghazi, and Y. Robert, “Max-Stretch Minimization on an Edge-Cloud Platform,” IPDPS'2021, the 34th IEEE International Parallel and Distributed Processing Symposium: IEEE Computer Society Press, 2021.

Dongarra, J., “Measuring Computer Performance: A Practitioner's Guide,” SIAM Review (book review), vol. 43, no. 2, pp. 383-384, 2001.

Weaver, V. M., M. Johnson, K. Kasichayanula, J. Ralph, P. Luszczek, D. Terpstra, and S. Moore, “Measuring Energy and Power with PAPI,” International Workshop on Power-Aware Systems and Architectures, Pittsburgh, PA, September 2012.

Mucci, P., “Memory Bandwidth and the Performance of Scientific Applications: A Study of the AMD Opteron Processor,” 2005 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (submitted), January 2004.

Shende, S., A. D. Malony, S. Moore, and D. Cronk, “Memory Leak Detection in Fortran Applications using TAU,” Proc. DoD HPCMP Users Group Conference (HPCMP-UGC'07), Pittsburgh, PA, IEEE Computer Society, January 2007.

Barry, D., H. Jagode, A. Danalis, and J. Dongarra, Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements, St. Petersburg, FL, 28th HIPS Workshop, May 2023.

Barry, D., H. Jagode, A. Danalis, and J. Dongarra, “Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements,” 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, Florida, IEEE, August 2023.

Dongarra, J., G. Fagg, R. Hempel, and D. W. Walker, “Message Passing Software Systems,” Encyclopedia of Electrical and Electronics Engineering, Supplement 1: John Wiley & Sons, Inc., 2000.

Cronk, D., B. Ellis, and G. Fagg, “Metacomputing: An Evaluation of Emerging Systems,” University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-445, July 2000.

Moore, S., D. Arnold, and D. Cronk, “Metacomputing Support for the SARA3D Structural Acoustics Application,” Department of Defense Users' Group Conference (to appear), Biloxi, Mississippi, June 2001.

Vadhiyar, S., and J. Dongarra, “A Metascheduler For The Grid,” Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC 2002), Edinburgh, Scotland, IEEE Computer Society, pp. 343-351, July 2002.

Marin, G., J. Dongarra, and D. Terpstra, “MIAMI: A Framework for Application Performance Diagnosis,” ISPASS-2014, Monterey, CA, IEEE, March 2014.

Beck, M., D. Arnold, A. Bassi, F. Berman, H. Casanova, J. Dongarra, T. Moore, G. Obertelli, J. Plank, M. Swany, et al., “Middleware for the Use of Storage in Communication,” Parallel Computing, vol. 28, no. 12, pp. 1773-1788, August 2002.

Tsai, Y-H. Mike, N. Beams, and H. Anzt, “Mixed Precision Algebraic Multigrid on GPUs,” Parallel Processing and Applied Mathematics (PPAM 2022), vol. 13826, Cham, Springer International Publishing, April 2023.

Cayrols, S., J. Li, G. Bosilca, S. Tomov, A. Ayala, and J. Dongarra, “Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression,” ICL Technical Report, no. ICL-UT-22-04, May 2022.

Buttari, A., J. Dongarra, J. Langou, J. Langou, P. Luszczek, and J. Kurzak, “Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems,” International Journal of High Performance Computing Applications (to appear), August 2007.

Lopez, F., and T. Mary, “Mixed Precision LU Factorization on GPU Tensor Cores: Reducing Data Movement and Memory Footprint,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-13: University of Tennessee, September 2020.
