Publications

Export 841 results:
Book Chapter
Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki, Accelerating Numerical Dense Linear Algebra Calculations with GPUs”, Numerical Computations with GPUs: Springer International Publishing, pp. 3-28, 2014.  (1.06 MB)
Nath, R., S. Tomov, and J. Dongarra, Blas for GPUs”, Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.  (1.05 MB)
Tomov, S., and J. Dongarra, Dense Linear Algebra for Hybrid GPU-based Systems”, Scientific Computing with Multicore and Accelerators, Boca Raton, Florida, CRC Press, 2010.
Baboulin, M., J. Dongarra, A. Remy, S. Tomov, and I. Yamazaki, Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures”, Lecture Notes in Computer Science, vol. 9573: Springer International Publishing, pp. 86-95, September 2015, 2016.  (327.14 KB)
Dongarra, J., N. J. Higham, M. R. Dennis, P. Glendinning, P. A. Martin, F. Santosa, and J. Tanner, High-Performance Computing”, The Princeton Companion to Applied Mathematics, Princeton, New Jersey, Princeton University Press, pp. 839-842, 2015.
Dongarra, J., and P. Luszczek, HPC Challenge: Design, History, and Implementation Highlights”, Contemporary High Performance Computing: From Petascale Toward Exascale, Boca Raton, FL, Taylor and Francis, 2013.  (790.01 KB)
Vetter, J., R. Glassbrook, K. Schwan, S. Yalamanchili, M. Horton, A. Gavrilovska, M. Slawinska, J. Dongarra, J. Meredith, P. Roth, et al., Keeneland: Computational Science Using Heterogeneous GPU Computing”, Contemporary High Performance Computing: From Petascale Toward Exascale, Boca Raton, FL, Taylor and Francis, 2013.  (2.7 MB)
Bai, Z., J. Demmel, J. Dongarra, J. Langou, and J. Wang, LAPACK”, Handbook of Linear Algebra, Second, Boca Raton, FL, CRC Press, 2013.  (223.21 KB)
Hoefler, T., J. M. Squyres, G. Fagg, G. Bosilca, W. Rehm, and A. Lumsdaine, A New Approach to MPI Collective Communication Implementations”, Distributed and Parallel Systems: Springer US, pp. 45-54, 2007.  (140.2 KB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, Performance, Design, and Autotuning of Batched GEMM for GPUs”, High Performance Computing: 31st International Conference, ISC High Performance 2016, Frankfurt, Germany, June 19-23, 2016, Proceedings, no. 9697: Springer International Publishing, pp. 21–38, 2016.  (1.98 MB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, Scalable Dense Linear Algebra on Heterogeneous Hardware”, HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, 2013.  (760.32 KB)
Conference Paper
Gates, M., H. Anzt, J. Kurzak, and J. Dongarra, Accelerating Collaborative Filtering for Implicit Feedback Datasets using GPUs”, 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, IEEE, November 2015.  (1.02 MB)
Gates, M., A. Haidar, and J. Dongarra, Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem”, VECPAR 2014, Eugene, OR, June 2014.  (199.44 KB)
Jagode, H., A. Danalis, G. Bosilca, and J. Dongarra, Accelerating NWChem Coupled Cluster through dataflow-based Execution”, 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Springer International Publishing, September 2015.  (452.82 KB)
Anzt, H., S. Tomov, and J. Dongarra, Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product”, Spring Simulation Multi-Conference 2015 (SpringSim'15), Alexandria, VA, SCS, April 2015.  (1.46 MB)
Yamazaki, I., T. Mary, J. Kurzak, S. Tomov, and J. Dongarra, Access-averse Framework for Computing Low-rank Matrix Approximations”, First International Workshop on High Performance Big Graph Data Management, Analysis, and Mining, Washington, DC, October 2014.
Anzt, H., J. Dongarra, and E. S. Quintana-Ortí, Adaptive Precision Solvers for Sparse Linear Systems”, 3rd International Workshop on Energy Efficient Supercomputing (E2SC '15), Austin, TX, ACM, November 2015.
Genet, D., A. Guermouche, and G. Bosilca, Assembly Operations for Multicore Architectures using Task-Based Runtime Systems”, Euro-Par 2014, Porto, Portugal, Springer International Publishing, August 2014.  (481.52 KB)
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, Assessing the Impact of ABFT and Checkpoint Composite Strategies”, 16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.  (1.02 MB)
Aupy, G., Y. Robert, and F. Vivien, Assuming failure independence: are we right to be wrong?”, 3rd International Workshop on Fault Tolerant Systems (FTS), Hawaii, US, IEEE, September 2017.  (597.11 KB)
Chow, E., H. Anzt, and J. Dongarra, Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs”, International Supercomputing Conference (ISC 2015), Frankfurt, Germany, July 2015.
Whaley, C., and J. Dongarra, Automatically Tuned Linear Algebra Software”, 1998 ACM/IEEE conference on Supercomputing (SC '98), Orlando, FL, IEEE Computer Society, November 1998.
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, Batched Matrix Computations on Hardware Accelerators”, EuroMPI/Asia 2015 Workshop, Bordeaux, France, September 2015.  (589.05 KB)
Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, Batched Matrix Computations on Hardware Accelerators Based on GPUs”, 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.  (9.36 MB)
Faverge, M., J. Langou, Y. Robert, and J. Dongarra, Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation”, IEEE International Parallel and Distributed Processing Symposium (IPDPS), Orlando, FL, IEEE, May 2017.  (328.15 KB)
Han, L., L-C. Canon, H. Casanova, Y. Robert, and F. Vivien, Checkpointing Workflows for Fail-Stop Errors”, IEEE Cluster, Hawaii, US, IEEE, September 2017.  (400.64 KB)
YarKhan, A., A. Haidar, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, Cholesky Across Accelerators”, 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Elizabeth, NJ, IEEE, August 2015.
Cao, C., J. Dongarra, P. Du, M. Gates, P. Luszczek, and S. Tomov, clMAGMA: High Performance Dense Linear Algebra with OpenCL ”, International Workshop on OpenCL, Bristol University, England, May 2014.  (460.91 KB)
Gates, M., S. Tomov, and A. Haidar, Comparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra”, 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.  (4.7 MB)
Baboulin, M., J. Dongarra, and R. Lacroix, Computing Least Squares Condition Numbers on Hybrid Multicore/GPU Systems”, International Interdisciplinary Conference on Applied Mathematics, Modeling and Computational Science (AMMCS), Waterloo, Ontario, CA, August 2014.  (130.18 KB)
Aupy, G., A. Benoit, L. Pottier, P. Raghavan, Y. Robert, and M. Shantharam, Co-Scheduling Algorithms for Cache-Partitioned Systems”, 19th Workshop on Advances in Parallel and Distributed Computational Models, Orlando, FL, IEEE Computer Society Press, May 2017.  (584.76 KB)
Haidar, A., J. Kurzak, G. Pichon, and M. Faverge, A Data Flow Divide and Conquer Algorithm for Multicore Architecture”, 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.  (535.44 KB)
Yamazaki, I., S. Tomov, and J. Dongarra, Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES”, 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, New Orleans, LA, November 2014.  (465.52 KB)
Yamazaki, I., J. Kurzak, P. Luszczek, and J. Dongarra, Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime”, Workshop on Large-Scale Parallel Processing, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.  (398.16 KB)
Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors”, ISC High Performance 2015, Frankfurt, Germany, July 2015.  (1.49 MB)
Cao, C., G. Bosilca, T. Herault, and J. Dongarra, Design for a Soft Error Resilient Dynamic Task-based Runtime”, 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.  (2.31 MB)
Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, Designing LU-QR Hybrid Solvers for Performance and Stability”, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.  (4.2 MB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures”, The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016), IPDPS 2016, Chicago, IL, IEEE, May 2016.  (708.62 KB)
Marin, G., C. McCurdy, and J. Vetter, Diagnosis and Optimization of Application Prefetching Performance”, Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013.  (827.31 KB)
Yamazaki, I., S. Rajamanickam, E. G. Boman, M. Hoemmen, M. A. Heroux, and S. Tomov, Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster”, The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 14), New Orleans, LA, IEEE, November 2014.
Donfack, S., S. Tomov, and J. Dongarra, Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs”, Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, May 2014.  (490.08 KB)
Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, Efficiency of General Krylov Methods on GPUs – An Experimental Study”, The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Chicago, IL, IEEE, May 2016.  (285.28 KB)
Zhao, Y., L. Wan, W. Wu, G. Bosilca, R. Vuduc, J. Ye, W. Tang, and Z. Xu, Efficient Communications in Training Large Scale Neural Networks”, ACM MultiMedia Workshop 2017, Mountain View, CA, ACM, October 2017.  (1.41 MB)
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, Efficient Eigensolver Algorithms on Accelerator Based Architectures”, 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.  (6.98 MB)
Solcà, R., A. Kozhevnikov, A. Haidar, S. Tomov, T. C. Schulthess, and J. Dongarra, Efficient Implementation Of Quantum Materials Simulations On Distributed CPU-GPU Systems”, The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.  (1.09 MB)
Turchenko, V., G. Bosilca, A. Bouteiller, and J. Dongarra, Efficient Parallelization of Batch Pattern Training Algorithm on Many-core and Cluster Architectures”, 7th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems, Berlin, Germany, September 2013.  (102.51 KB)
Anzt, H., S. Tomov, and J. Dongarra, Energy Efficiency and Performance Frontiers for Sparse Computations on GPU Supercomputers”, Sixth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM '15), San Francisco, CA, ACM, February 2015.  (2.29 MB)
Dong, T., A. Haidar, S. Tomov, and J. Dongarra, A Fast Batched Cholesky Factorization on a GPU”, International Conference on Parallel Processing (ICPP-2014), Minneapolis, MN, September 2014.  (1.37 MB)
Haidar, A., A. YarKhan, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, Flexible Linear Algebra Development and Scheduling with Cholesky Factorization”, 17th IEEE International Conference on High Performance Computing and Communications, Newark, NJ, August 2015.  (494.31 KB)
Haidar, A., T. Dong, S. Tomov, P. Luszczek, and J. Dongarra, Framework for Batched and GPU-resident Factorization Algorithms to Block Householder Transformations”, ISC High Performance, Frankfurt, Germany, Springer, July 2015.  (778.26 KB)

Pages