Publications

2017
Jagode, H., A. Danalis, and J. Dongarra, “Accelerating NWChem Coupled Cluster through dataflow-based execution”, The International Journal of High Performance Computing Applications, January 2017.  (4.07 MB)
Aupy, G., Y. Robert, and F. Vivien, “Assuming failure independence: are we right to be wrong?”, 3rd International Workshop on Fault Tolerant Systems (FTS), Hawaii, US, IEEE, 09/2017.  (597.11 KB)
Anzt, H., J. Dongarra, G. Flegar, and E. S. Quintana-Ortí, “Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs”, Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, New York, NY, USA, ACM, pp. 1–10, February 2017.  (552.62 KB)
Faverge, M., J. Langou, Y. Robert, and J. Dongarra, “Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation”, IEEE International Parallel and Distributed Processing Symposium (IPDPS), Orlando, FL, IEEE, 05/2017.  (328.15 KB)
Gates, M., P. Luszczek, J. Kurzak, J. Dongarra, K. Arturov, C. Cecka, and C. Freitag, “C++ API for BLAS and LAPACK”, SLATE Working Notes, no. 2: Innovative Computing Laboratory, University of Tennessee, June, 2017.  (721.12 KB)
Han, L., L-C. Canon, H. Casanova, Y. Robert, and F. Vivien, “Checkpointing Workflows for Fail-Stop Errors”, IEEE Cluster, Hawaii, US, IEEE, 09/2017.  (400.64 KB)
Aupy, G., A. Benoit, L. Pottier, P. Raghavan, Y. Robert, and M. Shantharam, “Co-Scheduling Algorithms for Cache-Partitioned Systems”, 19th Workshop on Advances in Parallel and Distributed Computational Models, Orlando, FL, IEEE Computer Society Press, 05/2017.  (584.76 KB)
Zhao, Y., L. Wan, W. Wu, G. Bosilca, R. Vuduc, J. Ye, W. Tang, and Z. Xu, “Efficient Communications in Training Large Scale Neural Networks”, ACM MultiMedia Workshop 2017, Mountain View, CA, ACM, 10/2017.  (1.41 MB)
M. Lopez, G., V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, “Evaluation of Directive-based Performance Portable Programming Models”, International Journal of High Performance Computing and Networking (IJHPCN), vol. (In Press), 2017.
Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “A Failure Detector for HPC Platforms”, The International Journal of High Performance Computing Applications, 07/2017.  (1.04 MB)
Kabir, K., A. Haidar, S. Tomov, A. Bouteiller, and J. Dongarra, “A Framework for Out of Memory SVD Algorithms”, ISC High Performance 2017, pp. 158-178, June 2017.  (393.22 KB)
Benoit, A., F. Cappello, A. Cavelan, Y. Robert, and H. Sun, “Identifying the Right Replication Level to Detect and Correct Silent Errors at Scale”, 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, ACM, 06/2017.  (865.68 KB)
Yamazaki, I., M. Hoemmen, P. Luszczek, and J. Dongarra, “Improving performance of GMRES by reducing communication and pipelining global collectives”, Proceedings of The 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017), Orlando, FL, June 2017.
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Multi-Level Checkpointing and Silent Error Detection for Linear Workflows”, Journal of Computational Science, 04/2017.  (1.09 MB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Novel HPC Techniques to Batch Execution of Many Variable Size BLAS Computations on GPUs”, International Conference on Supercomputing (ICS '17), Chicago, Illinois, ACM, June 2017.
Benoit, A., A. Cavelan, V. Le Fèvre, and Y. Robert, “Optimal Checkpointing Period with replicated execution on heterogeneous platforms”, 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, IEEE Computer Society Press, 06/2017.  (1.02 MB)
Dong, T., A. Haidar, S. Tomov, and J. Dongarra, “Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices”, International Conference on Computational Science (ICCS 2017), Zurich, Switzerland, Procedia Computer Science, June 2017.  (364.95 KB)
Anzt, H., M. Gates, J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, “Preconditioned Krylov solvers on GPUs”, Parallel Computing, May 2017.
Fang, A., A. Cavelan, Y. Robert, and A. Chien, “Resilience for Stencil Computations with Latent Errors”, International Conference on Parallel Processing (ICPP), Bristol, UK, IEEE Computer Society Press, 08/2017.  (1.19 MB)
Benoit, A., L. Pottier, and Y. Robert, “Resilient Co-Scheduling of Malleable Applications”, International Journal of High Performance Computing Applications (IJHPCA), 05/2017.  (1.62 MB)
Abdelfattah, A., H. Anzt, A. Bouteiller, A. Danalis, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, et al., “Roadmap for the Development of a Linear Algebra Library for Exascale Computing: SLATE: Software for Linear Algebra Targeting Exascale”, SLATE Working Notes, no. 1: Innovative Computing Laboratory, University of Tennessee, June, 2017.  (2.44 MB)
Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, A. Haidar, I. Karlin, T. Kolev, I. Masliah, and S. Tomov, “Small Tensor Operations on Advanced Architectures for High-order Applications”, University of Tennessee Computer Science Technical Report, no. UT-EECS-17-749: Innovative Computing Laboratory, University of Tennessee, April 2017.  (1.09 MB)
Baboulin, M., J. Dongarra, A. Remy, S. Tomov, and I. Yamazaki, “Solving dense symmetric indefinite systems using GPUs”, Concurrency and Computation: Practice and Experience, vol. 29, issue 9, March 2017.  (1.94 MB)
Yamazaki, I., S. Nooshabadi, S. Tomov, and J. Dongarra, “Structure-aware Linear Solver for Realtime Convex Optimization for Embedded Systems”, IEEE Embedded Systems Letters, vol. PP, issue 99, May 2017.  (339.11 KB)
Benoit, A., A. Cavelan, V. Le Fèvre, Y. Robert, and H. Sun, “Towards Optimal Multi-Level Checkpointing”, IEEE Transactions on Computers, vol. 66, issue 7, pp. 1212–1226, 07/2017.  (1.39 MB)
Anzt, H., J. Dongarra, G. Flegar, E. S. Quintana-Ortí, and A. E. Thomas, “Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning”, International Conference on Computational Science (ICCS 2017), vol. 108, Zurich, Switzerland, Procedia Computer Science, pp. 1783-1792, June 2017.
2016
Anzt, H., M. Baboulin, J. Dongarra, Y. Fournier, F. Hulsemann, A. Khabou, and Y. Wang, “Accelerating the Conjugate Gradient Algorithm with GPU in CFD Simulations”, VECPAR, 2016.
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Assessing General-purpose Algorithms to Cope with Fail-stop and Silent Errors”, ACM Transactions on Parallel Computing, August 2016.  (573.71 KB)
Herrmann, J., G. Bosilca, T. Herault, L. Marchal, Y. Robert, and J. Dongarra, “Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results”, Parallel Computing, vol. 52, pp. 22-41, February 2016.  (2.06 MB)
Anzt, H., E. Chow, T. Huckle, and J. Dongarra, “Batched Generation of Incomplete Sparse Approximate Inverses on GPUs”, Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 49–56, November 2016.
Anzt, H., E. Chow, and J. Dongarra, “On block-asynchronous execution on GPUs”, LAPACK Working Note, no. 291, November 2016.  (1.05 MB)
Bosilca, G., T. Herault, and J. Dongarra, “Context Identifier Allocation in Open MPI”, University of Tennessee Computer Science Technical Report, no. ICL-UT-16-01: Innovative Computing Laboratory, University of Tennessee, January 2016.
Baboulin, M., J. Dongarra, A. Remy, S. Tomov, and I. Yamazaki, “Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures”, Lecture Notes in Computer Science, vol. 9573: Springer International Publishing, pp. 86-95, September 2015, 2016.  (327.14 KB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures”, The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016), IPDPS 2016, Chicago, IL, IEEE, May 2016.  (708.62 KB)
Anzt, H., E. Chow, D. Szyld, and J. Dongarra, “Domain Overlap for Iterative Sparse Triangular Solves on GPUs”, Software for Exascale Computing - SPPEXA, vol. 113: Springer International Publishing, pp. 527–545, September 2016.
Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, “Efficiency of General Krylov Methods on GPUs – An Experimental Study”, The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Chicago, IL, IEEE, May 2016.  (285.28 KB)
Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, “Efficiency of General Krylov Methods on GPUs – An Experimental Study”, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 683-691, May 2016.
Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “Failure Detection and Propagation in HPC Systems”, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1-27:11, November 2016.
Anzt, H., , and E. S. Quintana-Ortí, “Fine-grained Bit-Flip Protection for Relaxation Methods”, Journal of Computational Science, November 2016.
Wu, W., G. Bosilca, R. vandeVaart, S. Jeaugey, and J. Dongarra, “GPU-Aware Non-contiguous Data Movement In Open MPI”, 25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Kyoto, Japan, ACM, June 2016.  (482.32 KB)
Jia, Y., P. Luszczek, and J. Dongarra, “Hessenberg Reduction with Transient Error Resilience on GPU-Based Hybrid Architectures”, 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.  (535.72 KB)
Newburn, C. J., G. Bansal, M. Wood, L. Crivelli, J. Planas, A. Duran, P. Souza, L. Borges, P. Luszczek, S. Tomov, et al., “Heterogeneous Streaming”, The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016.  (2.73 MB)
Dongarra, J., M. A. Heroux, and P. Luszczek, “High Performance Conjugate Gradient Benchmark: A new Metric for Ranking High Performance Computing Systems”, International Journal of High Performance Computing Applications, vol. 30, issue 1, pp. 3 - 10, February 2016.  (277.51 KB)
Yamazaki, I., S. Nooshabadi, S. Tomov, and J. Dongarra, “High Performance Realtime Convex Solver for Embedded Systems”, University of Tennessee Computer Science Technical Report, no. UT-EECS-16-745, October 2016.  (225.43 KB)
Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, J. Falcou, and J. Dongarra, “High-performance Matrix-matrix Multiplications of Very Small Matrices”, 22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16), Grenoble, France, Springer International Publishing, August 2016.
Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., “High-Performance Tensor Contractions for GPUs”, International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.  (2.36 MB)
Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., “High-Performance Tensor Contractions for GPUs”, University of Tennessee Computer Science Technical Report, no. UT-EECS-16-738: University of Tennessee, January 2016.  (2.36 MB)
Dongarra, J., “The HPL Benchmark: Past, Present & Future”, ISC High Performance, Frankfurt, Germany, July 2016.  (3.41 MB)
Haidar, A., S. Tomov, K. Arturov, M. Guney, S. Story, and J. Dongarra, “LU, QR, and Cholesky Factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi”, IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, MA, IEEE, September 2016.  (943.23 KB)
Yamazaki, I., S. Tomov, and J. Dongarra, “Non-GPU-resident Dense Symmetric Indefinite Factorization”, Concurrency and Computation: Practice and Experience, November 2016.
