Papers

2016
Benoit, A., A. Bouteiller, Y. Robert, and H. Sun, "Assessing General-purpose Algorithms to Cope with Fail-stop and Silent Errors", ACM Transactions on Parallel Computing, to appear, 2016.  (573.71 KB)
Herrmann, J., G. Bosilca, T. Herault, L. Marchal, Y. Robert, and J. Dongarra, "Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results", Parallel Computing, vol. 52, pp. 22-41, February 2016.  (2.06 MB)
Baboulin, M., J. Dongarra, A. Remy, S. Tomov, and I. Yamazaki, "Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures", Lecture Notes in Computer Science, vol. 9573: Springer International Publishing, pp. 86-95, September 2015, 2016.  (327.14 KB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, "On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures", The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016), IPDPS 2016, Chicago, IL, IEEE, May 2016.  (708.62 KB)
Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, "Efficiency of General Krylov Methods on GPUs – An Experimental Study", The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Chicago, IL, IEEE, May 2016.  (285.28 KB)
Benoit, A., S. K. Raina, and Y. Robert, "Efficient Checkpoint/Verification Patterns", International Journal of High Performance Computing Applications (IJHPCA), to appear, 2016.  (392.76 KB)
Wu, W., G. Bosilca, R. vandeVaart, S. Jeaugey, and J. Dongarra, "GPU-Aware Non-contiguous Data Movement In Open MPI", 25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Kyoto, Japan, ACM, June 2016.  (482.32 KB)
Jia, Y., P. Luszczek, and J. Dongarra, "Hessenberg Reduction with Transient Error Resilience on GPU-Based Hybrid Architectures", 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.  (535.72 KB)
Newburn, C. J., G. Bansal, M. Wood, L. Crivelli, J. Planas, A. Duran, P. Souza, L. Borges, P. Luszczek, S. Tomov, et al., "Heterogeneous Streaming", The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016.  (2.73 MB)
Masliah, I., A. Ahmad, A. Haidar, S. Tomov, J. Falcou, and J. Dongarra, "High-performance Matrix-matrix Multiplications of Very Small Matrices", 22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16), Grenoble, France, Springer International Publishing, August 2016.
Ahmad, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., "High-Performance Tensor Contractions for GPUs", University of Tennessee Computer Science Technical Report, no. UT-EECS-16-738: University of Tennessee, January 2016.  (2.36 MB)
Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., "High-Performance Tensor Contractions for GPUs", International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.  (2.36 MB)
Dongarra, J., "The HPL Benchmark: Past, Present & Future", ISC High Performance, Frankfurt, Germany, July 2016.  (3.41 MB)
Haidar, A., S. Tomov, K. Arturov, M. Guney, S. Story, and J. Dongarra, "LU, QR, and Cholesky Factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi", IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, MA, IEEE, September 2016.  (943.23 KB)
Dong, T., A. Haidar, P. Luszczek, S. Tomov, A. Abdelfattah, and J. Dongarra, "MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs", ICL Tech Report, 08/2016.  (929.79 KB)
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, "Optimal Resilience Patterns to Cope with Fail-stop and Silent Errors", IPDPS, Chicago, IL, IEEE, May 2016.  (603.58 KB)
Anzt, H., M. Kreutzer, E. Ponce, G. D. Peterson, G. Wellein, and J. Dongarra, "Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs", International Journal of High Performance Computing Applications, May 2016.  (2.08 MB)
Haidar, A., B. Brock, S. Tomov, M. Guidry, J. Jay Billings, D. Shyles, and J. Dongarra, "Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations", 2016 IEEE High Performance Extreme Computing Conference (HPEC '16), Waltham, MA, IEEE, September 2016.  (480.29 KB)
Haugen, B., "Performance Analysis and Modeling of Task-Based Runtimes", Department of Electrical Engineering and Computer Science, vol. PhD, Knoxville, University of Tennessee, May 2016.  (5.14 MB)
Ahmad, A., A. Haidar, S. Tomov, and J. Dongarra, "Performance, Design, and Autotuning of Batched GEMM for GPUs", University of Tennessee Computer Science Technical Report, no. UT-EECS-16-739: University of Tennessee, February 2016.  (1.27 MB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, "Performance, Design, and Autotuning of Batched GEMM for GPUs", The International Supercomputing Conference (ISC High Performance 2016), Frankfurt, Germany, June 2016.  (1.27 MB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, "Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs", International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.  (626.21 KB)
Dongarra, J., "Report on the Sunway TaihuLight System", University of Tennessee Computer Science Technical Report, no. UT-EECS-16-742: University of Tennessee, June 2016.
Aupy, G., A. Benoit, H. Casanova, and Y. Robert, "Scheduling Computational Workflows on Failure-prone Platforms", International Journal of Networking and Computing, vol. 6, no. 1, pp. 2-26, 2016.  (503.81 KB)
Luszczek, P., M. Gates, J. Kurzak, A. Danalis, and J. Dongarra, "Search Space Generation and Pruning System for Autotuners", 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.  (555.44 KB)
Yamazaki, I., S. Tomov, and J. Dongarra, "Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU", ACM Transactions on Mathematical Software (TOMS), 2016.
Anzt, H., E. Chow, J. Saak, and J. Dongarra, "Updating Incomplete Factorization Preconditioners for Model Order Reduction", Numerical Algorithms, 2016.  (1.12 MB)
2015
Gates, M., H. Anzt, J. Kurzak, and J. Dongarra, "Accelerating Collaborative Filtering for Implicit Feedback Datasets using GPUs", 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, IEEE, November 2015.  (1.02 MB)
Jagode, H., A. Danalis, G. Bosilca, and J. Dongarra, "Accelerating NWChem Coupled Cluster through dataflow-based Execution", 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Springer International Publishing, September 2015.  (452.82 KB)
Anzt, H., S. Tomov, and J. Dongarra, "Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product", Spring Simulation Multi-Conference 2015 (SpringSim'15), Alexandria, VA, SCS, April 2015.  (1.46 MB)
Anzt, H., W. Sawyer, S. Tomov, P. Luszczek, and J. Dongarra, "Acceleration of GPU-based Krylov solvers via Data Transfer Reduction", International Journal of High Performance Computing Applications, 2015.
Anzt, H., J. Dongarra, and E. S. Quintana-Ortí, "Adaptive Precision Solvers for Sparse Linear Systems", 3rd International Workshop on Energy Efficient Supercomputing (E2SC '15), Austin, TX, ACM, November 2015.
Bouteiller, A., T. Herault, G. Bosilca, P. Du, and J. Dongarra, "Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures, and Accuracy", ACM Transactions on Parallel Computing, vol. 1, issue 2, no. 10, pp. 10:1-10:28, January 2015.  (1.14 MB)
Chow, E., H. Anzt, and J. Dongarra, "Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs", International Supercomputing Conference (ISC 2015), Frankfurt, Germany, July 2015.
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, "Batched Matrix Computations on Hardware Accelerators", EuroMPI/Asia 2015 Workshop, Bordeaux, France, September 2015.  (589.05 KB)
Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, "Batched Matrix Computations on Hardware Accelerators Based on GPUs", 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.  (9.36 MB)
Haidar, A., T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, "Batched matrix computations on hardware accelerators based on GPUs", International Journal of High Performance Computing Applications, February 2015.  (2.16 MB)
YarKhan, A., A. Haidar, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, "Cholesky Across Accelerators", 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Elizabeth, NJ, IEEE, August 2015.
Gates, M., S. Tomov, and A. Haidar, "Comparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra", 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.  (4.7 MB)
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, "Composing Resilience Techniques: ABFT, Periodic, and Incremental Checkpointing", International Journal of Networking and Computing, vol. 5, no. 1, pp. 2-15, January 2015.  (755.54 KB)
Yamazaki, I., S. Tomov, and J. Dongarra, "Computing Low-rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and its Application to Solving a Hierarchically Semiseparable Linear System of Equations", Scientific Programming, 2015.  (648.87 KB)
Haidar, A., J. Kurzak, G. Pichon, and M. Faverge, "A Data Flow Divide and Conquer Algorithm for Multicore Architecture", 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.  (535.44 KB)
Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, "On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors", ISC High Performance 2015, Frankfurt, Germany, July 2015.  (1.49 MB)
Cao, C., G. Bosilca, T. Herault, and J. Dongarra, "Design for a Soft Error Resilient Dynamic Task-based Runtime", 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.  (2.31 MB)
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, "Efficient Eigensolver Algorithms on Accelerator Based Architectures", 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.  (6.98 MB)
Solcà, R., A. Kozhevnikov, A. Haidar, S. Tomov, T. C. Schulthess, and J. Dongarra, "Efficient Implementation Of Quantum Materials Simulations On Distributed CPU-GPU Systems", The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.  (1.09 MB)
Anzt, H., S. Tomov, and J. Dongarra, "Energy Efficiency and Performance Frontiers for Sparse Computations on GPU Supercomputers", Sixth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM '15), San Francisco, CA, ACM, February 2015.  (2.29 MB)
Reed, D., and J. Dongarra, "Exascale Computing and Big Data", Communications of the ACM, vol. 58, no. 7: ACM, pp. 56-68, July 2015.  (7.3 MB)
Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, "Experiences in autotuning matrix multiplication for energy minimization on GPUs", Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, December 2015.  (1.98 MB)
Dongarra, J., T. Herault, and Y. Robert, "Fault Tolerance Techniques for High-performance Computing", University of Tennessee Computer Science Technical Report (also LAWN 289), no. UT-EECS-15-734: University of Tennessee, May 2015.
