Publications

G
Gates, M., A. Charara, J. Kurzak, D. Sukkari, A. YarKhan, and J. Dongarra, “SLATE Working Note 13: Implementing Singular Value and Symmetric Eigenvalue Solvers,” SLATE Working Notes, no. 13, ICL-UT-19-07: Innovative Computing Laboratory, University of Tennessee, September 2019. (4.45 MB)
Gates, M., MAGMA Tutorial, Atlanta, GA, Keeneland Workshop, February 2012. (2.47 MB)
Gates, M., J. Kurzak, P. Luszczek, Y. Pei, and J. Dongarra, “Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices,” Parallel and Distributed Processing Symposium Workshops (IPDPSW), Orlando, FL, IEEE, June 2017. DOI: 10.1109/IPDPSW.2017.18
Gates, M., A. Haidar, and J. Dongarra, “Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem,” VECPAR 2014, Eugene, OR, June 2014. (199.44 KB)
Gates, M., S. Tomov, and A. Haidar, “Comparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015. (4.7 MB)
Gates, M., P. Luszczek, A. Abdelfattah, J. Kurzak, J. Dongarra, K. Arturov, C. Cecka, and C. Freitag, “C++ API for BLAS and LAPACK,” SLATE Working Notes, no. 2, ICL-UT-17-03: Innovative Computing Laboratory, University of Tennessee, June 2017. (1.12 MB)
Gates, M., S. Tomov, and J. Dongarra, “Accelerating the SVD Two Stage Bidiagonal Reduction and Divide and Conquer Using GPUs,” Parallel Computing, vol. 74, pp. 3–18, May 2018. DOI: 10.1016/j.parco.2017.10.004
Gates, M., H. Anzt, J. Kurzak, and J. Dongarra, “Accelerating Collaborative Filtering for Implicit Feedback Datasets using GPUs,” 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, IEEE, November 2015. (1.02 MB)
Gates, M., A. Charara, J. Kurzak, and J. Dongarra, “SLATE Users' Guide,” SLATE Working Notes, no. 10, ICL-UT-19-01: Innovative Computing Laboratory, University of Tennessee, January 2019.
Genet, D., A. Guermouche, and G. Bosilca, “Assembly Operations for Multicore Architectures using Task-Based Runtime Systems,” Euro-Par 2014, Porto, Portugal, Springer International Publishing, August 2014. (481.52 KB)
Gerndt, M., and K. Fürlinger, “Specification and detection of performance problems with ASL,” Concurrency and Computation: Practice and Experience, vol. 19, no. 11: John Wiley and Sons Ltd., pp. 1451-1464, January 2007.
Ghysels, P., S. Li, A. YarKhan, and J. Dongarra, “Initial Integration and Evaluation of SLATE and STRUMPACK,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-11: University of Tennessee, December 2018. (249.78 KB)
Giraud, L., A. Haidar, and Y. Saad, “Sparse approximations of the Schur complement for parallel algebraic hybrid solvers in 3D,” Numerical Mathematics: Theory, Methods and Applications, vol. 3, no. 3, Beijing, Global Science Press, pp. 64-82, 2010.
Giraud, L., A. Haidar, and S. Pralet, “Using multiple levels of parallelism to enhance the performance of domain decomposition solvers,” Parallel Computing, vol. 36, no. 5-6: Elsevier journals, pp. 285-296, 2010. (418.57 KB)
Giraud, L., J. Langou, and G. Sylvand, “On the Parallel Solution of Large Industrial Wave Propagation Problems,” Journal of Computational Acoustics (to appear), January 2005. (1.08 MB)
Giraud, L., J. Langou, M. Rozložník, and J. van den Eshof, “Rounding Error Analysis of the Classical Gram-Schmidt Orthogonalization Process,” Numerische Mathematik, vol. 101, no. 1, pp. 87-100, January 2005. (157.48 KB)
Graham, R. L., G. M. Shipman, B. Barrett, R. Castain, G. Bosilca, and A. Lumsdaine, “A High-Performance, Heterogeneous MPI,” HeteroPar 2006, Barcelona, Spain, September 2006. (193.73 KB)
Graham, R. L., G. Bosilca, and J. Pjesivac-Grbovic, “A Comparison of Application Performance Using Open MPI and Cray MPI,” Cray User Group, CUG 2007, May 2007. (248.83 KB)
Graham, R. L., R. Brightwell, B. Barrett, G. Bosilca, and J. Pjesivac-Grbovic, “An Evaluation of Open MPI's Matching Transport Layer on the Cray XT,” EuroPVM/MPI 2007, September 2007. (369.01 KB)
Gruetzmacher, T., T. Cojean, G. Flegar, F. Göbel, and H. Anzt, “A Customized Precision Format Based on Mantissa Segmentation for Accelerating Sparse Linear Algebra,” Concurrency and Computation: Practice and Experience, vol. 40319, issue 262, January 2019. DOI: 10.1002/cpe.5418
Guidry, M., and A. Haidar, On the Design, Autotuning, and Optimization of GPU Kernels for Kinetic Network Simulations Using Fast Explicit Integration and GPU Batched Computation, Oak Ridge, TN, Joint Institute for Computational Sciences Seminar Series, Presentation, September 2015. (17.25 MB)
Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, “Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” ACM Transactions on Mathematical Software (TOMS), vol. 37, no. 2, April 2010. (896.03 KB)
Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, “Rectangular Full Packed Format for Cholesky’s Algorithm: Factorization, Solution, and Inversion,” ACM Transactions on Mathematical Software (TOMS), vol. 37, no. 2, Atlanta, GA, April 2010. (896.03 KB)
Gustavson, F. G., J. Wasniewski, and J. Dongarra, “Level-3 Cholesky Kernel Subroutine of a Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm,” ACM TOMS (submitted), also LAPACK Working Note (LAWN) 211, 2010. (190.2 KB)
Gustavson, F. G., J. Wasniewski, and J. Dongarra, “Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” University of Tennessee Computer Science Technical Report, UT-CS-08-614 (also LAPACK Working Note 199), April 2008. (896.03 KB)
Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, “Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” ACM TOMS (to appear), 2009. (896.03 KB)
Gustavson, F. G., J. Wasniewski, J. Dongarra, J. Herrero, and J. Langou, “Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms,” ACM Transactions on Mathematical Software (TOMS), vol. 39, issue 2, February 2013. DOI: 10.1145/2427023.2427026 (439.46 KB)
H
Hadri, B., H. Ltaeif, E. Agullo, and J. Dongarra, “Enhancing Parallelism of Tile QR Factorization for Multicore Architectures,” Submitted to Transaction on Parallel and Distributed Systems, December 2009. (464.23 KB)
Hadri, B., H. Ltaeif, E. Agullo, and J. Dongarra, “Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures,” Innovative Computing Laboratory Technical Report (also LAPACK Working Note 222 and CS Tech Report UT-CS-09-645), no. ICL-UT-09-03, September 2009. (464.23 KB)
Hadri, B., E. Agullo, and J. Dongarra, “Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,” 24th IEEE International Parallel and Distributed Processing Symposium (submitted), 2010. (313.98 KB)
Hadri, B., H. Ltaeif, E. Agullo, and J. Dongarra, “Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,” accepted in 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, December 2009.
Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers and Achieve 74 Gflops/Watt on Nvidia V100, San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018. (2.96 MB)
Haidar, A., S. Tomov, J. Dongarra, and N. J. Higham, “Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers,” The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX, IEEE, November 2018.
Haidar, A., A. Abdelfattah, M. Zounon, S. Tomov, and J. Dongarra, “A Guide for Achieving High Performance with Very Small Matrices on GPUs: A Case Study of Batched LU and Cholesky Factorizations,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 5, pp. 973–984, May 2018. DOI: 10.1109/TPDS.2017.2783929 (832.92 KB)
Haidar, A., T. Dong, S. Tomov, P. Luszczek, and J. Dongarra, “Framework for Batched and GPU-resident Factorization Algorithms to Block Householder Transformations,” ISC High Performance, Frankfurt, Germany, Springer, July 2015. (778.26 KB)
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Towards Batched Linear Solvers on Accelerated Hardware Platforms,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8) co-located with PPOPP 2015, San Francisco, CA, ACM, February 2015. (403.74 KB)
Haidar, A., L. Giraud, H. Ben-Hadj-Ali, F. Sourbier, S. Operto, and J. Virieux, “3-D parallel frequency-domain visco-acoustic wave modelling based on a hybrid direct/iterative solver,” 73rd EAGE Conference & Exhibition incorporating SPE EUROPEC 2011, Vienna, Austria, 23-26 May, 2011.
Haidar, A., P. Luszczek, J. Kurzak, and J. Dongarra, “An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,” Supercomputing 2013, Denver, CO, November 2013.
Haidar, A., H. Ltaeif, P. Luszczek, and J. Dongarra, “A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction,” IPDPS 2012, Shanghai, China, May 2012. (480.43 KB)
Haidar, A., A. Abdelfattah, M. Zounon, P. Wu, S. Pranesh, S. Tomov, and J. Dongarra, “The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques,” International Conference on Computational Science (ICCS 2018), vol. 10860, Wuxi, China, Springer, pp. 586–600, June 2018. DOI: 10.1007/978-3-319-93698-7_45
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Efficient Eigensolver Algorithms on Accelerator Based Architectures,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015. (6.98 MB)
Haidar, A., T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, “Optimization for Performance and Energy for Batched Matrix Computations on GPUs,” 8th Workshop on General Purpose Processing Using GPUs (GPGPU 8), San Francisco, CA, ACM, February 2015. DOI: 10.1145/2716282.2716288 (699.5 KB)
Haidar, A., P. Luszczek, and J. Dongarra, “New Algorithm for Computing Eigenvectors of the Symmetric Eigenvalue Problem,” Workshop on Parallel and Distributed Scientific and Engineering Computing, IPDPS 2014 (Best Paper), Phoenix, AZ, IEEE, May 2014. DOI: 10.1109/IPDPSW.2014.130 (2.33 MB)
Haidar, A., K. Kabir, D. Fayad, S. Tomov, and J. Dongarra, “Out of Memory SVD Solver for Big Data,” 2017 IEEE High Performance Extreme Computing Conference (HPEC'17), Waltham, MA, IEEE, September 2017. (1.33 MB)
Haidar, A., C. Cao, J. Dongarra, P. Luszczek, and S. Tomov, “Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014. (1.51 MB)
Haidar, A., Y. Jia, P. Luszczek, S. Tomov, A. YarKhan, and J. Dongarra, “Weighted Dynamic Scheduling with Many Parallelism Grains for Offloading of Numerical Workloads to Multiple Varied Accelerators,” Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA'15), vol. No. 5, Austin, TX, ACM, November 2015. (347.6 KB)
Haidar, A., B. Brock, S. Tomov, M. Guidry, J. Jay Billings, D. Shyles, and J. Dongarra, “Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations,” 2016 IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, MA, IEEE, September 2016. (480.29 KB)
Haidar, A., R. Solcà, M. Gates, S. Tomov, T. C. Schulthess, and J. Dongarra, “A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,” International Journal of High Performance Computing Applications, vol. 28, issue 2, pp. 196-209, May 2014. DOI: 10.1177/1094342013502097 (1.74 MB)
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, “Batched Matrix Computations on Hardware Accelerators,” EuroMPI/Asia 2015 Workshop, Bordeaux, France, September 2015. (589.05 KB)
Haidar, A., P. Luszczek, J. Kurzak, and J. Dongarra, “An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,” University of Tennessee Computer Science Technical Report (also LAWN 283), no. ut-eecs-13-720: University of Tennessee, October 2013. (1.23 MB)
