Publications

“The Future of Supercomputing: An Interim Report”, National Research Council, Washington, D.C., The National Academies Press, January 2003.
A
Abalenkovs, M., A. Abdelfattah, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, and A. YarKhan, “Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems”, Supercomputing Frontiers and Innovations, vol. 2, no. 4, October 2015. (3.68 MB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures”, Procedia Computer Science, vol. 108, pp. 606–615, June 2017.
Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., “High-Performance Tensor Contractions for GPUs”, International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016. (2.36 MB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs”, High Performance Computing: 31st International Conference, ISC High Performance 2016, Frankfurt, Germany, June 19-23, 2016, Proceedings, no. 9697: Springer International Publishing, pp. 21–38, 2016. (1.98 MB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Fast Cholesky Factorization on GPUs for Batch and Native Modes in MAGMA”, Journal of Computational Science, vol. 20, pp. 85–93, May 2017.
Abdelfattah, A., H. Ltaeif, D. Keyes, and J. Dongarra, “Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs”, Concurrency and Computation: Practice and Experience, vol. 28, issue 12, pp. 3447–3465, May 2016. (3.21 MB)
Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, A. Haidar, I. Karlin, T. Kolev, I. Masliah, and S. Tomov, “Small Tensor Operations on Advanced Architectures for High-order Applications”, University of Tennessee Computer Science Technical Report, no. UT-EECS-17-749: Innovative Computing Laboratory, University of Tennessee, April 2017. (1.09 MB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures”, The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016), IPDPS 2016, Chicago, IL, IEEE, May 2016. (708.62 KB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs”, University of Tennessee Computer Science Technical Report, no. UT-EECS-16-739: University of Tennessee, February 2016. (1.27 MB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs”, The International Supercomputing Conference (ISC High Performance 2016), Frankfurt, Germany, June 2016. (1.27 MB)
Abdelfattah, A., M. Baboulin, V. Dobrev, J. Dongarra, C. Earl, J. Falcou, A. Haidar, I. Karlin, T. Kolev, I. Masliah, et al., “High-Performance Tensor Contractions for GPUs”, University of Tennessee Computer Science Technical Report, no. UT-EECS-16-738: University of Tennessee, January 2016. (2.36 MB)
Abdelfattah, A., J. Dongarra, D. Keyes, and H. Ltaeif, “Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators”, VECPAR 2012, Kobe, Japan, July 2012. (737.28 KB)
Abdelfattah, A., H. Anzt, A. Bouteiller, A. Danalis, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, et al., “Roadmap for the Development of a Linear Algebra Library for Exascale Computing: SLATE: Software for Linear Algebra Targeting Exascale”, SLATE Working Notes, no. 1, ICL-UT-17-02: Innovative Computing Laboratory, University of Tennessee, June 2017. (2.44 MB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs”, International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016. (626.21 KB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Novel HPC Techniques to Batch Execution of Many Variable Size BLAS Computations on GPUs”, International Conference on Supercomputing (ICS '17), Chicago, Illinois, ACM, June 2017.
Agarwal, P., R. A. Alexander, E. Apra, S. Balay, A. S. Bland, J. Colgan, E. D'Azevedo, J. Dongarra, T. Dunigan, M. Fahey, et al., “Cray X1 Evaluation Status Report”, Oak Ridge National Laboratory Report, vol. /-2004/13, January 2004. (817.33 KB)
Agrawal, S., J. Dongarra, K. Seymour, and S. Vadhiyar, “NetSolve: Past, Present, and Future - A Look at a Grid Enabled Server”, Making the Global Infrastructure a Reality: Wiley Publishing, 2003. (158.19 KB)
Agrawal, S., “Hardware Software Server in NetSolve”, ICL Technical Report, no. ICL-UT-02-02, January 2002. (221.4 KB)
Agrawal, S., D. Arnold, S. Blackford, J. Dongarra, M. Miller, K. Sagi, Z. Shi, K. Seymour, and S. Vadhiyar, “Users' Guide to NetSolve v1.4.1”, ICL Technical Report, no. ICL-UT-02-05, June 2002. (328.01 KB)
Aguilera, G., P. J. Teller, M. Taufer, and F. Wolf, “A Systematic Multi-step Methodology for Performance Analysis of Communication Traces of Distributed Applications based on Hierarchical Clustering”, Proc. of the 5th International Workshop on Performance Modeling, Evaluation, and Organization of Parallel and Distributed Systems (PMEO-PDS 2006), no. ICL-UT-05-06, Rhodes Island, Greece, IEEE Computer Society, April 2006. (1.02 MB)
Agullo, E., C. Augonnet, J. Dongarra, H. Ltaeif, R. Namyst, S. Thibault, and S. Tomov, “Faster, Cheaper, Better - a Hybridization Methodology to Develop Linear Algebra Software for GPUs”, LAPACK Working Note, no. 230, 2010. (334.48 KB)
Agullo, E., J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaeif, P. Luszczek, and S. Tomov, “Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects”, Journal of Physics: Conference Series, vol. 180, 2009. (119.37 KB)
Agullo, E., L. Giraud, A. Guermouche, A. Haidar, and J. Roman, “Parallel algebraic domain decomposition solver for the solution of augmented systems”, Parallel, Distributed, Grid and Cloud Computing for Engineering, Ajaccio, Corsica, France, 12-15 April 2011.
Agullo, E., L. Giraud, A. Guermouche, A. Haidar, and J. Roman, “Towards a Complexity Analysis of Sparse Hybrid Linear Solvers”, PARA 2010, Reykjavik, Iceland, June 2010.
Agullo, E., C. Augonnet, J. Dongarra, M. Faverge, J. Langou, H. Ltaeif, and S. Tomov, “LU Factorization for Accelerator-based Systems”, IEEE/ACS AICCSA 2011, Sharm-El-Sheikh, Egypt, December 2011. (234.86 KB)
Agullo, E., L. Giraud, A. Guermouche, A. Haidar, S. Lanteri, and J. Roman, “Algebraic Schwarz Preconditioning for the Schur Complement: Application to the Time-Harmonic Maxwell Equations Discretized by a Discontinuous Galerkin Method”, The Twentieth International Conference on Domain Decomposition Methods, La Jolla, California, February 2011.
Agullo, E., C. Augonnet, J. Dongarra, M. Faverge, H. Ltaeif, S. Thibault, and S. Tomov, “QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators”, Proceedings of IPDPS 2011, no. ICL-UT-10-04, Anchorage, AK, October 2010. (468.17 KB)
Agullo, E., G. Bosilca, C. Castagnède, J. Dongarra, H. Ltaeif, and S. Tomov, “Matrices Over Runtime Systems at Exascale”, Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.
Agullo, E., C. Coti, J. Dongarra, T. Herault, and J. Langou, “QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment”, 24th IEEE International Parallel and Distributed Processing Symposium (also LAWN 224), Atlanta, GA, April 2010. (261.55 KB)
Agullo, E., B. Hadri, H. Ltaeif, and J. Dongarra, “Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware”, 2009 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09) (to appear), 2009. (515.63 KB)
Agullo, E., C. Augonnet, J. Dongarra, H. Ltaeif, R. Namyst, S. Thibault, and S. Tomov, “A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs”, in GPU Computing Gems, Jade Edition, vol. 2: Elsevier, pp. 473-484, 2011.
Agullo, E., C. Coti, T. Herault, J. Langou, S. Peyronnet, A. Rezmerita, F. Cappello, and J. Dongarra, “QCG-OMPI: MPI Applications on Grids”, Future Generation Computer Systems, vol. 27, no. 4, pp. 357-369, January 2011. (1.48 MB)
Agullo, E., L. Giraud, A. Guermouche, A. Haidar, J. Roman, and Y. Lee-Tin-Yien, “MaPHyS or the Development of a Parallel Algebraic Domain Decomposition Solver in the Course of the Solstice Project”, Sparse Days 2010 Meeting at CERFACS, Toulouse, France, June 2010.
Agullo, E., C. Coti, T. Herault, J. Langou, S. Peyronnet, A. Rezmerita, F. Cappello, and J. Dongarra, “QCG-OMPI: MPI Applications on Grids”, Future Generation Computer Systems, vol. 27, no. 4, pp. 357-369, March 2010. (1.48 MB)
Alam, S., R. F. Barrett, H. Jagode, J. A. Kuehn, S. W. Poole, and R. Sankaran, “Impact of Quad-core Cray XT4 System and Software Stack on Scientific Computation”, Euro-Par 2009, Lecture Notes in Computer Science, vol. 5704/2009, Delft, The Netherlands, Springer Berlin / Heidelberg, pp. 334-344, August 2009. (312.74 KB)
Aliaga, J. I., H. Anzt, M. Castillo, J. C. Fernández, G. León, J. Pérez, and E. S. Quintana-Ortí, “Unveiling the Performance-energy Trade-off in Iterative Linear System Solvers for Multithreaded Processors”, Concurrency and Computation: Practice and Experience, vol. 27, issue 4, pp. 885-904, September 2014. (1.83 MB)
“Computational Science – ICCS 2009, Proceedings of the 9th International Conference”, Lecture Notes in Computer Science: Theoretical Computer Science and General Issues, no. 5544-5545, Baton Rouge, LA, May 2009.
Alvaro, W., J. Kurzak, and J. Dongarra, “Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the CELL Processor”, University of Tennessee Computer Science Technical Report, no. UT-CS-08-609, (also LAPACK Working Note 189), January 2008. (500.99 KB)
Alvaro, W., J. Kurzak, and J. Dongarra, “Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture - CELL Processor”, Parallel Computing, vol. 35, pp. 138-150, 2009. (591.16 KB)
Anderson, E., Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, et al., “LAPACK Users' Guide, 3rd ed.”, Philadelphia: Society for Industrial and Applied Mathematics, January 1999.
Andersson, U., and P. Mucci, “Analysis and Optimization of Yee_Bench using Hardware Performance Counters”, Proceedings of Parallel Computing 2005 (ParCo) (to appear), Malaga, Spain, January 2005. (72.27 KB)
Angskun, T., G. Bosilca, and J. Dongarra, “Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology”, Proceedings of The Fifth International Symposium on Parallel and Distributed Processing and Applications (ISPA07), Niagara Falls, Canada, Springer, August 2007. (480.47 KB)
Angskun, T., G. Fagg, G. Bosilca, J. Pjesivac-Grbovic, and J. Dongarra, “Scalable Fault Tolerant Protocol for Parallel Runtime Environments”, 2006 Euro PVM/MPI, no. ICL-UT-06-12, Bonn, Germany, 2006. (149.07 KB)
Angskun, T., G. Bosilca, G. Fagg, J. Pjesivac-Grbovic, and J. Dongarra, “Reliability Analysis of Self-Healing Network using Discrete-Event Simulation”, Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07): IEEE Computer Society, pp. 437-444, May 2007.
Angskun, T., G. Fagg, G. Bosilca, J. Pjesivac-Grbovic, and J. Dongarra, “Self-Healing Network for Scalable Fault-Tolerant Runtime Environments”, Future Generation Computer Systems, vol. 26, no. 3, pp. 479-485, March 2010. (1.54 MB)
Angskun, T., G. Bosilca, B. Vander Zanden, and J. Dongarra, “Optimal Routing in Binomial Graph Networks”, The International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Adelaide, Australia, IEEE Computer Society, December 2007.
Angskun, T., G. Fagg, G. Bosilca, J. Pjesivac-Grbovic, and J. Dongarra, “Self-Healing Network for Scalable Fault Tolerant Runtime Environments”, DAPSYS 2006, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Innsbruck, Austria, January 2006. (162.83 KB)
Angskun, T., G. Bosilca, and J. Dongarra, “Self-Healing in Binomial Graph Networks”, 2nd International Workshop On Reliability in Decentralized Distributed Systems (RDDS 2007), Vilamoura, Algarve, Portugal, November 2007. (322.39 KB)
Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, “Efficiency of General Krylov Methods on GPUs – An Experimental Study”, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 683-691, May 2016.
