Publications

Show only items where

Author

Type

Term

Year

Keyword

Export 1275 results:

2015

Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, “Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs,” Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096 - 5113, Oct 12, 2015.

(1.99 MB)

Anzt, H., B. Haugen, J. Kurzak, P. Luszczek, and J. Dongarra, “Experiences in autotuning matrix multiplication for energy minimization on GPUs,” Concurrency in Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, December 2015.

(1.98 MB)

Dongarra, J., T. Herault, and Y. Robert, “Fault Tolerance Techniques for High-performance Computing,” University of Tennessee Computer Science Technical Report (also LAWN 289), no. UT-EECS-15-734: University of Tennessee, May 2015.

Haidar, A., A. YarKhan, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Flexible Linear Algebra Development and Scheduling with Cholesky Factorization,” 17th IEEE International Conference on High Performance Computing and Communications, Newark, NJ, August 2015.

(494.31 KB)

Haidar, A., T. Dong, S. Tomov, P. Luszczek, and J. Dongarra, “Framework for Batched and GPU-resident Factorization Algorithms to Block Householder Transformations,” ISC High Performance, Frankfurt, Germany, Springer, July 2015.

(778.26 KB)

Tang, C., A. Bouteiller, T. Herault, M G. Venkata, and G. Bosilca, “From MPI to OpenSHMEM: Porting LAMMPS,” OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, Annapolis, MD, USA, Springer International Publishing, pp. 121–137, 2015.

Anzt, H., E. Ponce, G. D. Peterson, and J. Dongarra, “GPU-accelerated Co-design of Induced Dimension Reduction: Algorithmic Fusion and Kernel Overlap,” 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, Austin, TX, ACM, November 2015.

(1.46 MB)

Wu, W., A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, “Hierarchical DAG scheduling for Hybrid Distributed Systems,” 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.

(1.11 MB)

Dongarra, J., N. J. Higham, M. R. Dennis, P. Glendinning, P. A. Martin, F. Santosa, and J. Tanner, “High-Performance Computing,” The Princeton Companion to Applied Mathematics, Princeton, New Jersey, Princeton University Press, pp. 839-842, 2015.

Dongarra, J., M. A. Heroux, and P. Luszczek, “High-Performance Conjugate-Gradient Benchmark: A New Metric for Ranking High-Performance Computing Systems,” The International Journal of High Performance Computing Applications, 2015.

(336.19 KB)

Haidar, A., J. Dongarra, K. Kabir, M. Gates, P. Luszczek, S. Tomov, and Y. Jia, “HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi,” Scientific Programming, vol. 23, issue 1, January 2015.

(553.94 KB)

Dongarra, J., M. A. Heroux, and P. Luszczek, “HPCG Benchmark: a New Metric for Ranking High Performance Computing Systems,” University of Tennessee Computer Science Technical Report , no. ut-eecs-15-736: University of Tennessee, January 2015.

Kurzak, J., H. Anzt, M. Gates, and J. Dongarra, “Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs,” IEEE Transactions on Parallel and Distributed Systems, no. 1045-9219, November 2015.

Anzt, H., E. Chow, and J. Dongarra, “Iterative Sparse Triangular Solves for Preconditioning,” EuroPar 2015, Vienna, Austria, Springer Berlin, August 2015.

(322.36 KB)

Tomov, S., Linear Algebra Software for High-Performance Computing (Part 2: Software for Hardware Accelerators and Coprocessors) , Frankfurt, Germany, ISC High Performance (ISC18), Tutorial Presentation, June 2015.

(15.41 MB)

Haidar, A., S. Tomov, P. Luszczek, and J. Dongarra, “MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing,” 2015 IEEE High Performance Extreme Computing Conference (HPEC ’15), (Best Paper Award), Waltham, MA, IEEE, September 2015.

(678.86 KB)

Anzt, H., J. Dongarra, M. Gates, A. Haidar, K. Kabir, P. Luszczek, S. Tomov, and I. Yamazaki, MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi , Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.

(2.03 MB)

Yamazaki, I., S. Tomov, J. Kurzak, J. Dongarra, and J. Barlow, “Mixed-precision Block Gram Schmidt Orthogonalization,” 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Austin, TX, ACM, November 2015.

(235.69 KB)

Yamazaki, I., S. Tomov, and J. Dongarra, “Mixed-Precision Cholesky QR Factorization and its Case Studies on Multicore CPU with Multiple GPUs,” SIAM Journal on Scientific Computing, vol. 37, no. 3, pp. C203-C330, May 2015.

(374.8 KB)

Yamazaki, I., J. Barlow, S. Tomov, J. Kurzak, and J. Dongarra, “Mixed-precision orthogonalization process Performance on multicore CPUs with GPUs,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.

(301.01 KB)

Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, “Mixing LU-QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers,” Journal of Parallel and Distributed Computing, vol. 85, pp. 32-46, November 2015.

(5.06 MB)

Haidar, A., T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, “Optimization for Performance and Energy for Batched Matrix Computations on GPUs,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8), San Francisco, CA, ACM, February 2015.

(699.5 KB)

Abalenkovs, M., A. Abdelfattah, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, and A. YarKhan, “Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems,” Supercomputing Frontiers and Innovations, vol. 2, no. 4, October 2015.

(3.68 MB)

Danalis, A., H. Jagode, G. Bosilca, and J. Dongarra, “PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution,” 2015 IEEE International Conference on Cluster Computing, Chicago, IL, IEEE, September 2015.

(1.77 MB)

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures,” The Spring Simulation Multi-Conference 2015 (SpringSim'15), Best Paper Award, Alexandria, VA, April 2015.

(608.44 KB)

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Optimization of Two-Sided Factorization Algorithms for Heterogeneous Platform,” International Conference on Computational Science (ICCS 2015), Reykjavík, Iceland, June 2015.

(1.12 MB)

Mary, T., I. Yamazaki, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, “Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

Bouteiller, A., G. Bosilca, and J. Dongarra, “Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery,” 22nd European MPI Users' Group Meeting, Bordeaux, France, ACM, September 2015.

(543.32 KB)

Herault, T., A. Bouteiller, G. Bosilca, M. Gamell, K. Teranishi, M. Parashar, and J. Dongarra, “Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof,” Innovative Computing Laboratory Technical Report, no. ICL-UT-15-01, April 2015.

(570.97 KB)

Herault, T., A. Bouteiller, G. Bosilca, M. Gamell, K. Teranishi, M. Parashar, and J. Dongarra, “Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

(550.96 KB)

Yamazaki, I., J. Kurzak, P. Luszczek, and J. Dongarra, “Randomized Algorithms to Update Partial Singular Value Decomposition on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

Anzt, H., E. Chow, D. Szyld, and J. Dongarra, “Random-Order Alternating Schwarz for Sparse Triangular Solves,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.

(1.53 MB)

Song, F., and J. Dongarra, “A Scalable Approach to Solving Dense Linear Algebra Problems on Hybrid CPU-GPU Systems,” Concurrency and Computation: Practice and Experience, vol. 27, issue 14, pp. 3702-3723, September 2015.

(8.16 MB)

Aupy, G., and Y. Robert, “Scheduling for fault-tolerance: an introduction,” Innovative Computing Laboratory Technical Report, no. ICL-UT-15-02: University of Tennessee, January 2015.

(416.37 KB)

Donfack, S., J. Dongarra, M. Faverge, M. Gates, J. Kurzak, P. Luszczek, and I. Yamazaki, “A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination,” Concurrency and Computation: Practice and Experience, vol. 27, issue 5, pp. 1292-1309, April 2015.

(783.45 KB)

Strohmaier, E., H. Meuer, J. Dongarra, and H. D. Simon, “The TOP500 List and Progress in High-Performance Computing,” IEEE Computer, vol. 48, issue 11, pp. 42-49, November 2015.

Baboulin, M., V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, and S. Tomov, Towards a High-Performance Tensor Algebra Package for Accelerators , Gatlinburg, TN, moky Mountains Computational Sciences and Engineering Conference (SMC15), September 2015.

(1.76 MB)

Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Towards Batched Linear Solvers on Accelerated Hardware Platforms,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8) co-located with PPOPP 2015, San Francisco, CA, ACM, February 2015.

(403.74 KB)

Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Tuning Stationary Iterative Solvers for Fault Resilience,” 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA15), Austin, TX, ACM, November 2015.

(1.28 MB)

Shamis, P.., M G. Venkata, M. G. Lopez, M.. B. Baker, O.. Hernandez, Y.. Itigin, M.. Dubman, G.. Shainer, R.. L. Graham, L.. Liss, et al., “UCX: An Open Source Framework for HPC Network APIs and Beyond,” 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, Santa Clara, CA, USA, IEEE, pp. 40-43, 2015.

Haugen, B., S. Richmond, J. Kurzak, C. A. Steed, and J. Dongarra, “Visualizing Execution Traces with Task Dependencies,” 2nd Workshop on Visual Performance Analysis (VPA '15), Austin, TX, ACM, November 2015.

(927.5 KB)

Haidar, A., Y. Jia, P. Luszczek, S. Tomov, A. YarKhan, and J. Dongarra, “Weighted Dynamic Scheduling with Many Parallelism Grains for Offloading of Numerical Workloads to Multiple Varied Accelerators,” Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA'15), vol. No. 5, Austin, TX, ACM, November 2015.

(347.6 KB)

2016

Dongarra, J., J. Demmel, J. Langou, and J. Langou, “2016 Dense Linear Algebra Software Packages Survey,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-744 / LAWN 290: University of Tennessee, September 2016.

(366.43 KB)

Haidar, A., A. Abdelfattah, V. Dobrev, I. Karlin, T. Kolev, S. Tomov, and J. Dongarra, Accelerating Tensor Contractions for High-Order FEM on CPUs, GPUs, and KNLs , Gatlinburg, TN, moky Mountains Computational Sciences and Engineering Conference (SMC16), Poster, September 2016.

(4.29 MB)

Anzt, H., M. Baboulin, J. Dongarra, Y. Fournier, F. Hulsemann, A. Khabou, and Y. Wang, “Accelerating the Conjugate Gradient Algorithm with GPU in CFD Simulations,” VECPAR, 2016.

Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Assessing General-purpose Algorithms to Cope with Fail-stop and Silent Errors,” ACM Transactions on Parallel Computing, August 2016.

(573.71 KB)

Herrmann, J., G. Bosilca, T. Herault, L. Marchal, Y. Robert, and J. Dongarra, “Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results,” Parallel Computing, vol. 52, pp. 22-41, February 2016.

(2.06 MB)

Anzt, H., E. Chow, T. Huckle, and J. Dongarra, “Batched Generation of Incomplete Sparse Approximate Inverses on GPUs,” Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 49–56, November 2016.

Anzt, H., E. Chow, and J. Dongarra, “On block-asynchronous execution on GPUs,” LAPACK Working Note, no. 291, November 2016.

(1.05 MB)

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, Cholesky Factorization on Batches of Matrices with Fixed and Variable Sizes , San Jose, CA, GPU Technology Conference (GTC16), Poster, April 2016.

(480.51 KB)

Main menu

Publications

Pages