Filters: Author is Jack Dongarra
“FFT Benchmark Performance Experiments on Systems Targeting Exascale,” ICL Technical Report, no. ICL-UT-22-02, March 2022.
“FFT-ECP API and High-Performance Library Prototype for 2-D and 3-D FFTs on Large-Scale Heterogeneous Systems with GPUs,” ECP Milestone Report, no. FFT-ECP STML13-27: Innovative Computing Laboratory, University of Tennessee, January 2020.
FFT-ECP Fast Fourier Transform, Houston, TX, 2019 ECP Annual Meeting (Research Poster), January 2019.
“FFT-ECP Implementation Optimizations and Features Phase,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-12: University of Tennessee, October 2019.
“Fine-grained Bit-Flip Protection for Relaxation Methods,” Journal of Computational Science, November 2016. DOI: 10.1016/j.jocs.2016.11.013
“Finite-choice Algorithm Optimization in Conjugate Gradients (LAPACK Working Note 159),” University of Tennessee Computer Science Technical Report, UT-CS-03-502, January 2003.
Flexible Batched Sparse Matrix Vector Product on GPUs, Denver, Colorado, ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017.
“Flexible Batched Sparse Matrix-Vector Product on GPUs,” 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '17), Denver, CO, ACM Press, November 2017. DOI: http://dx.doi.org/10.1145/3148226.3148230
“Flexible collective communication tuning architecture applied to Open MPI,” 2006 Euro PVM/MPI (submitted), Bonn, Germany, January 2006.
“Flexible Data Redistribution in a Task-Based Runtime System,” IEEE International Conference on Cluster Computing (Cluster 2020), Kobe, Japan, IEEE, September 2020. DOI: 10.1109/CLUSTER49012.2020.00032
“Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1432-1441, May 2011.
“Flexible Linear Algebra Development and Scheduling with Cholesky Factorization,” 17th IEEE International Conference on High Performance Computing and Communications, Newark, NJ, August 2015.
“Formulation of Requirements for New PAPI++ Software Package: Part I: Survey Results,” PAPI++ Working Notes, no. 1, ICL-UT-20-02: Innovative Computing Laboratory, University of Tennessee Knoxville, January 2020.
“Framework for Batched and GPU-resident Factorization Algorithms to Block Householder Transformations,” ISC High Performance, Frankfurt, Germany, Springer, July 2015.
“A Framework for Out of Memory SVD Algorithms,” ISC High Performance 2017, pp. 158–178, June 2017. DOI: 10.1007/978-3-319-58667-0_9
“From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming,” Parallel Computing, vol. 38, no. 8, pp. 391-407, August 2012.
“From Serial Loops to Parallel Execution on Distributed Systems,” International European Conference on Parallel and Distributed Computing (Euro-Par '12), Rhodes, Greece, August 2012.
“FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World,” Lecture Notes in Computer Science: Proceedings of EuroPVM-MPI 2000, (Hungary: Springer Verlag, 2000), vol. 1908, pp. 346-353, January 2000.
“Fully Dynamic Scheduler for Numerical Computing on Multicore Processors,” University of Tennessee Computer Science Department Technical Report, UT-CS-09-643 (Also LAPACK Working Note 220), 2009.
The Future of Computing: Software Libraries, Savannah, GA, DOD CREATE Developers' Review, Keynote Presentation, February 2012.
“Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC,” ScalA'19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019.
Ginkgo: A Node-Level Sparse Linear Algebra Library for HPC (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
“GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,” EuroPar 2012 (also LAWN 260), Rhodes Island, Greece, August 2012.
“GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,” University of Tennessee Computer Science Technical Report UT-CS-11-690 (also LAWN 260), December 2011.
“GPU-accelerated Co-design of Induced Dimension Reduction: Algorithmic Fusion and Kernel Overlap,” 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, Austin, TX, ACM, November 2015.
“GPU-Aware Non-contiguous Data Movement In Open MPI,” 25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Kyoto, Japan, ACM, June 2016. DOI: http://dx.doi.org/10.1145/2907294.2907317
“GPUDirect MPI Communications and Optimizations to Accelerate FFTs on Exascale Systems,” EuroMPI'19 Posters, Zurich, Switzerland, no. icl-ut-19-06: ICL, September 2019.
“The GrADS Project: Software Support for High-Level Grid Application Development,” Technical Report, February 2000.
“The GrADS Project: Software Support for High-Level Grid Application Development,” International Journal of High Performance Applications and Supercomputing, vol. 15, no. 4, pp. 327-344, January 2001.
“GrADSolve - A Grid-based RPC System for Remote Invocation of Parallel Software,” Journal of Parallel and Distributed Computing (submitted), March 2003.
“GrADSolve - RPC for High Performance Computing on the Grid,” Lecture Notes in Computer Science, Proceedings of the 9th International Euro-Par Conference, vol. 2790, Klagenfurt, Austria, Springer-Verlag, Berlin, pp. 394-403, January 2003. DOI: 10.1007/978-3-540-45209-6_58
“Grid Computing applied to the Boundary Element Method,” Proceedings of the First International Conference on Parallel, Distributed and Grid Computing for Engineering, vol. 27, no. :104203/9027, Stirlingshire, UK, Civil-Comp Press, 2009.
“Grid-Enabling Problem Solving Environments: A Case Study of SCIRUN and NetSolve,” Proceedings of the High Performance Computing Symposium (HPC 2001) in 2001 Advanced Simulation Technologies Conference, Seattle, Washington, Society for Modeling and Simulation International, April 2001.
“GridRPC: A Remote Procedure Call API for Grid Computing,” ICL Technical Report, no. ICL-UT-02-06, November 2002.
“GridSolve: The Evolution of Network Enabled Solver,” Grid-Based Problem Solving Environments: IFIP TC2/WG 2.5 Working Conference on Grid-Based Problem Solving Environments (Prescott, AZ, July 2006): Springer, pp. 215-226, 2007.
“A Guide for Achieving High Performance with Very Small Matrices on GPUs: A Case Study of Batched LU and Cholesky Factorizations,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 5, pp. 973–984, May 2018. DOI: 10.1109/TPDS.2017.2783929
“HAN: A Hierarchical AutotuNed Collective Communication Framework,” IEEE Cluster Conference, Kobe, Japan, Best Paper Award, IEEE Computer Society Press, September 2020.
“Hands-on Research and Training in High-Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.
“HARNESS: A Next Generation Distributed Virtual Machine,” International Journal on Future Generation Computer Systems, vol. 15, no. 5-6, pp. 571-582, January 1999.
“HARNESS and Fault Tolerant MPI,” Parallel Computing, vol. 27, no. 11, pp. 1479-1496, January 2001.
“HARNESS Fault Tolerant MPI Design, Usage and Performance Issues,” Future Generation Computer Systems, vol. 18, no. 8, pp. 1127-1142, January 2002.
“Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers,” The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX, IEEE, November 2018. DOI: 10.1109/SC.2018.00050
Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers and Achieve 74 Gflops/Watt on Nvidia V100, San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
“Harnessing the Computing Continuum for Programming Our World,” Fog Computing: Theory and Practice: John Wiley & Sons, Inc., 2020. DOI: 10.1002/9781119551713.ch7
“Hash Functions for Datatype Signatures in MPI,” Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, vol. 3666, Sorrento (Naples), Italy, Springer-Verlag Berlin, pp. 76-83, September 2005.
“heFFTe: Highly Efficient FFT for Exascale,” International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020. DOI: 10.1007/978-3-030-50371-0_19
heFFTe: Highly Efficient FFT for Exascale (Poster), Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020.
heFFTe: Highly Efficient FFT for Exascale (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
heFFTe: Highly Efficient FFT for Exascale (Poster): NVIDIA GPU Technology Conference (GTC2020), October 2020.
“Hessenberg Reduction with Transient Error Resilience on GPU-Based Hybrid Architectures,” 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.