Publications

Submitted
Haidar, A., H. Jagode, P. Vaccaro, S. Tomov, and J. Dongarra, “Investigating Power Capping toward Energy-Efficient Scientific Applications,” Concurrency and Computation: Practice and Experience (CCPE): Special Issue on Power-Aware Computing 2017, Submitted.
Anzt, H., T. Ribizel, G. Flegar, E. Chow, and J. Dongarra, “ParILUT - A Parallel Threshold ILU for GPU,” IPDPS, Submitted.
2019
Kaya, O., and Y. Robert, “Computing dense tensor decompositions with optimal dimension trees,” Algorithmica, to appear, 2019.
Beck, M., T. Moore, and P. Luszczek, “Interoperable Convergence of Storage, Networking, and Computation,” Future of Information and Communication Conference (FICC), San Francisco, Science and Information (SAI), March 2019.
Losada, N., G. Bosilca, A. Bouteiller, P. González, and M. J. Martín, “Local Rollback for Resilient MPI Applications with Application-Level Checkpointing and Message Logging,” Future Generation Computer Systems, vol. 91, pp. 450–464, February 2019.
2018
Jagode, H., A. Danalis, and J. Dongarra, “Accelerating NWChem Coupled Cluster through dataflow-based Execution,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 540–551, July 2018.
Luo, X., W. Wu, G. Bosilca, T. Patinyasakdikul, L. Wang, and J. Dongarra, “ADAPT: An Event-Based Adaptive Collective Communication Framework,” Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '18, Tempe, Arizona, ACM Press, June 2018.
Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J. Dongarra, “Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.
Yamazaki, I., A. Abdelfattah, A. Ida, S. Ohshima, S. Tomov, R. Yokota, and J. Dongarra, “Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU clusters,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, British Columbia, Canada, IEEE, May 2018.
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Batched One-Sided Factorizations of Tiny Matrices Using GPUs: Challenges and Countermeasures,” Journal of Computational Science, vol. 26, pp. 226–236, May 2018.
Marques, O., J. Demmel, and P. B. Vasconcelos, “Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem,” LAPACK Working Note, no. LAWN 295, ICL-UT-18-02: University of Tennessee, April 2018.
Asch, M., T. Moore, R. M. Badia, M. Beck, P. Beckman, T. Bidot, F. Bodin, F. Cappello, A. Choudhary, B. R. de Supinski, et al., “Big Data and Extreme-Scale Computing: Pathways to Convergence - Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 435–479, July 2018.
Caniou, Y., E. Caron, A. Kong Win Chang, and Y. Robert, “Budget-aware scheduling algorithms for scientific workflows with stochastic task weights on heterogeneous IaaS cloud platforms,” 27th International Heterogeneity in Computing Workshop HCW 2013: IEEE Computer Society Press, 2018.
Han, L., L-C. Canon, H. Casanova, Y. Robert, and F. Vivien, “Checkpointing Workflows for Fail-Stop Errors,” IEEE Transactions on Computers, vol. PP, issue 99, 2018.
Casanova, H., J. Herrmann, and Y. Robert, “Computing the expected makespan of task graphs in the presence of silent errors,” Parallel Computing, vol. 75, 2018.
Benoit, A., A. Cavelan, F. Cappello, P. Raghavan, Y. Robert, and H. Sun, “Coping with silent and fail-stop errors at scale by combining replication and checkpointing,” J. Parallel and Distributed Computing, to appear, 2018.
Aupy, G., A. Benoit, S. Dai, L. Pottier, P. Raghavan, Y. Robert, and M. Shantharam, “Co-scheduling Amdahl applications on cache-partitioned systems,” Int. Journal of High Performance Computing Applications, vol. 32, no. 1, pp. 123–138, 2018.
Aupy, G., A. Benoit, B. Goglin, L. Pottier, and Y. Robert, “Co-scheduling HPC workloads on cache-partitioned CMP platforms,” Cluster'2018: IEEE Computer Society Press, 2018.
Bouteiller, A., G. Bosilca, T. Herault, and J. Dongarra, “Data Movement Interfaces to Support Dataflow Runtimes,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-03: University of Tennessee, May 2018.
Le Fèvre, V., G. Bosilca, A. Bouteiller, T. Herault, A. Hori, Y. Robert, and J. Dongarra, “Do moldable applications perform better on failure-prone HPC platforms?,” Resilience: 11th Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids, jointly published with Euro-Par 2018: Springer Verlag, 2018.
Tomov, S., A. Haidar, D. Schultz, and J. Dongarra, “Evaluation and Design of FFT for Distributed Accelerated Systems,” ECP WBS 2.3.3.09 Milestone Report, no. FFT-ECP ST-MS-10-1216: Innovative Computing Laboratory, University of Tennessee, October 2018.
Jagode, H., A. Danalis, R. Hoque, M. Faverge, and J. Dongarra, “Evaluation of Dataflow Programming Models for Electronic Structure Theory,” Concurrency and Computation: Practice and Experience (CCPE): Special Issue on Parallel and Distributed Algorithms, vol. 2018, issue e4490, pp. 1–20, May 2018.
Han, L., V. Le Fèvre, L-C. Canon, Y. Robert, and F. Vivien, “A Generic Approach to Scheduling and Checkpointing Workflows,” ICPP'2018, the 47th Int. Conf. on Parallel Processing: IEEE Computer Society Press, 2018.
Haidar, A., A. Abdelfattah, M. Zounon, S. Tomov, and J. Dongarra, “A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 5, pp. 973–984, May 2018.
Anzt, H., T. Gruetzmacher, E. Quintana-Orti, and F. Scheidegger, “High-Performance GPU Implementation of PageRank with Reduced Precision based on Mantissa Segmentation,” 8th Workshop on Irregular Applications: Architectures and Algorithms, 2018.
Abdelfattah, A., M. Gates, J. Kurzak, P. Luszczek, and J. Dongarra, “Implementation of the C++ API for Batch BLAS,” SLATE Working Notes, no. 7, ICL-UT-18-04: Innovative Computing Laboratory, University of Tennessee, June 2018.
Anzt, H., T. Huckle, J. Bräckle, and J. Dongarra, “Incomplete Sparse Approximate Inverses for Parallel Preconditioning,” Parallel Computing, vol. 71, pp. 1–22, January 2018.
YarKhan, A., G. Ragghianti, J. Dongarra, M. Cawkwell, D. Perez, and A. Voter, “Initial Integration and Evaluation of SLATE Parallel BLAS in LATTE,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-07: Innovative Computing Laboratory, University of Tennessee, June 2018.
Haidar, A., H. Jagode, P. Vaccaro, A. YarKhan, S. Tomov, and J. Dongarra, “Investigating power capping toward energy-efficient scientific applications,” Concurrency and Computation: Practice and Experience, vol. 2018, issue e4485, pp. 1–14, April 2018.
Anzt, H., and J. Dongarra, “A Jaccard Weights Kernel Leveraging Independent Thread Scheduling on GPUs,” SBAC-PAD, 2018.
Kurzak, J., M. Gates, I. Yamazaki, A. Charara, A. YarKhan, J. Finney, G. Ragghianti, P. Luszczek, and J. Dongarra, “Linear Systems Performance Report,” SLATE Working Notes, no. 8, ICL-UT-18-08: Innovative Computing Laboratory, University of Tennessee, September 2018.
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Multi-level checkpointing and silent error detection for linear workflows,” Journal of Computational Science, vol. 28, pp. 398–415, 2018.
Herault, T., Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, “Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms,” APDCM workshop (co-located with IPDPS), Vancouver, BC, Canada, IEEE, May 2018.
Anzt, H., M. Kreutzer, E. Ponce, G. D. Peterson, G. Wellein, and J. Dongarra, “Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs,” The International Journal of High Performance Computing Applications, vol. 32, no. 2, pp. 220–230, 2018.
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization,” IEEE High Performance Extreme Computing Conference (HPEC’18), Waltham, MA, IEEE, September 2018.
Kurzak, J., M. Gates, A. YarKhan, I. Yamazaki, P. Wu, P. Luszczek, J. Finney, and J. Dongarra, “Parallel BLAS Performance Report,” SLATE Working Notes, no. 5, ICL-UT-18-01: University of Tennessee, April 2018.
Kurzak, J., M. Gates, A. YarKhan, I. Yamazaki, P. Luszczek, J. Finney, and J. Dongarra, “Parallel Norms Performance Report,” SLATE Working Notes, no. 6, ICL-UT-18-06: Innovative Computing Laboratory, University of Tennessee, June 2018.
Anzt, H., E. Chow, and J. Dongarra, “ParILUT - A New Parallel Threshold ILU,” Tokyo, Japan, SIAM Conference on Parallel Processing for Scientific Computing, March 2018.
Anzt, H., E. Chow, and J. Dongarra, “ParILUT - A New Parallel Threshold ILU Factorization,” SIAM Journal on Scientific Computing, vol. 40, no. 4, pp. C503–C519, 2018.
Benoit, A., S. Perarnau, L. Pottier, and Y. Robert, “A performance model to execute workflows on high-bandwidth memory architectures,” ICPP'2018, the 47th Int. Conf. on Parallel Processing: IEEE Computer Society Press, 2018.
Castain, R., J. Hursey, A. Bouteiller, and D. Solt, “PMIx: Process Management for Exascale Environments,” Parallel Computing, vol. 79, pp. 9–29, January 2018.
Hoemmen, M., and I. Yamazaki, “Production Implementations of Pipelined & Communication-Avoiding Iterative Linear Solvers,” Tokyo, Japan, SIAM Conference on Parallel Processing for Scientific Computing, March 2018.
Anzt, H., I. Yamazaki, M. Hoemmen, E. Boman, and J. Dongarra, “Solver Interface & Performance on Cori,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-05: University of Tennessee, June 2018.
Zaitsev, D., S. Tomov, and J. Dongarra, “Solving Linear Diophantine Systems on Parallel Architectures,” IEEE Transactions on Parallel and Distributed Systems, October 2018.
Bernholdt, D. E., S. Boehm, G. Bosilca, M. Gorentla Venkata, R. E. Grant, T. Naughton, H. P. Pritchard, M. Schulz, and G. R. Vallee, “A Survey of MPI Usage in the US Exascale Computing Project,” Concurrency and Computation: Practice and Experience, September 2018.
Yamazaki, I., J. Kurzak, P. Wu, M. Zounon, and J. Dongarra, “Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architectures,” IEEE Transactions on Parallel and Distributed Systems, 2018.
Chow, E., H. Anzt, J. Scott, and J. Dongarra, “Using Jacobi iterations and blocking for solving sparse triangular systems in incomplete factorization preconditioning,” Journal of Parallel and Distributed Computing, vol. 119, pp. 219–230, 2018.
Anzt, H., J. Dongarra, G. Flegar, and T. Gruetzmacher, “Variable-size batched condition number calculation on GPUs,” SBAC-PAD, 2018.
Anzt, H., J. Dongarra, G. Flegar, and E. S. Quintana-Ortí, “Variable-size batched Gauss–Jordan elimination for block-Jacobi preconditioning on graphics processors,” Parallel Computing, 2018.
2017
Jagode, H., A. Danalis, and J. Dongarra, “Accelerating NWChem Coupled Cluster through Dataflow-Based Execution,” The International Journal of High Performance Computing Applications, pp. 1–13, January 2017.