Publications

Export 946 results:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
F
Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, Designing LU-QR hybrid solvers for performance and stability,” University of Tennessee Computer Science Technical Report (also LAWN 282), no. ut-eecs-13-719: University of Tennessee, October 2013.  (4.11 MB)
Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, Designing LU-QR Hybrid Solvers for Performance and Stability,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.  (4.2 MB)
Faverge, M., J. Langou, Y. Robert, and J. Dongarra, Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Orlando, FL, IEEE, May 2017.  (328.15 KB)
Fayad, D., J. Kurzak, P. Luszczek, P. Wu, and J. Dongarra, The Case for Directive Programming for Accelerator Autotuner Optimization,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-07: University of Tennessee, October 2017.  (341.52 KB)
Fischer, M., and J. Dongarra, Experiences with Windows 95/NT as a Cluster Computing Platform for Parallel Computing,” Parallel and Distributed Computing Practices, Special Issue: Cluster Computing, vol. 2, no. 2: Nova Science Publishers, USA, pp. 119-128, February 1999.  (164.04 KB)
Fürlinger, K., and S. Moore, Continuous Runtime Profiling of OpenMP Applications,” Proceedings of the 2007 Conference on Parallel Computing (PARCO 2007), Juelich and Aachen, Germany, January 2007.  (408.01 KB)
Fürlinger, K., J. Dongarra, and M. Gerndt, On Using Incremental Profiling for the Performance Analysis of Shared Memory Parallel Applications,” Proceedings of the 13th International Euro-Par Conference on Parallel Processing (Euro-Par '07), Rennes, France, Springer LNCS, January 2007.
Fürlinger, K., and S. Moore, Capturing and Analyzing the Execution Control Flow of OpenMP Applications,” International Journal of Parallel Programming, vol. 37, no. 3, pp. 266-276, 00-2009.
Fürlinger, K., and S. Moore, Visualizing the Program Execution Control Flow of OpenMP Applications,” Proc. 4th International Workshop on OpenMP (IWOMP 2008), West Lafayette, Indiana, Lecture Notes in Computer Science 5004, pp. 181-190, January 2008.  (194.25 KB)
Fürlinger, K., and S. Moore, Detection and Analysis of Iterative Behavior in Parallel Applications,” Proceedings of the 2008 International Conference on Computational Science (ICCS 2008), vol. 5103, Krakow, Poland, pp. 261-267, January 2008.  (141.02 KB)
Fürlinger, K., and S. Moore, Recording the Control Flow of Parallel Applications to Determine Iterative and Phase-Based Behavior,” Future Generation Computing Systems, vol. 26, pp. 162-166, 00-2009.
Fürlinger, K., and S. Moore, OpenMP-centric Performance Analysis of Hybrid Applications,” Proc. 2008 IEEE International Conference on Cluster Computing (CLUSTER 2008), Tsukuba, Japan, January 2008.  (218.63 KB)
Fürlinger, K., M. Gerndt, and J. Dongarra, Scalability Analysis of the SPEC OpenMP Benchmarks on Large-Scale Shared Memory Multiprocessors,” Proceedings of the 2007 International Conference on Computational Science (ICCS 2007), vol. 4487-4490, Beijing, China, Springer LNCS, pp. 815-822, 2007.  (145.84 KB)
G
Gabriel, E., G. Fagg, A. Bukovsky, T. Angskun, and J. Dongarra, A Fault-Tolerant Communication Library for Grid Environments,” 17th Annual ACM International Conference on Supercomputing (ICS'03) International Workshop on Grid Computing and e-Science, San Francisco, June 2003.  (377.14 KB)
Gabriel, E., G. Fagg, and J. Dongarra, Evaluating The Performance Of MPI-2 Dynamic Communicators And One-Sided Communication,” Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI User's Group Meeting, vol. 2840, Venice, Italy, Springer-Verlag, Berlin, pp. 88-97, September 2003.  (254.08 KB)
Gates, M., S. Tomov, and J. Dongarra, Accelerating the SVD Two Stage Bidiagonal Reduction and Divide and Conquer Using GPUs,” Parallel Computing, vol. 74, pp. 3–18, May 2018.
Gates, M., A. Charara, J. Kurzak, and J. Dongarra, SLATE Users' Guide,” SLATE Working Notes, no. 10, ICL-UT-19-01: Innovative Computing Laboratory, University of Tennessee, January 2019.
Gates, M., A. Charara, J. Kurzak, A. YarKhan, I. Yamazaki, and J. Dongarra, Least Squares Performance Report,” SLATE Working Notes, no. 9, ICL-UT-18-10: Innovative Computing Laboratory, University of Tennessee, December 2018.  (1.76 MB)
Gates, M., H. Anzt, J. Kurzak, and J. Dongarra, Accelerating Collaborative Filtering for Implicit Feedback Datasets using GPUs,” 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, IEEE, November 2015.  (1.02 MB)
Gates, M., A. Haidar, and J. Dongarra, Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem,” VECPAR 2014, Eugene, OR, June 2014.  (199.44 KB)
Gates, M., J. Kurzak, P. Luszczek, Y. Pei, and J. Dongarra, Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices,” Parallel and Distributed Processing Symposium Workshops (IPDPSW), Orlando, FL, IEEE, June 2017.
Gates, M., P. Luszczek, A. Abdelfattah, J. Kurzak, J. Dongarra, K. Arturov, C. Cecka, and C. Freitag, C++ API for BLAS and LAPACK,” SLATE Working Notes, no. 2, ICL-UT-17-03: Innovative Computing Laboratory, University of Tennessee, June 2017.  (1.12 MB)
Gates, M., S. Tomov, and A. Haidar, Comparing Hybrid CPU-GPU and Native GPU-only Acceleration for Linear Algebra,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.  (4.7 MB)
Genet, D., A. Guermouche, and G. Bosilca, Assembly Operations for Multicore Architectures using Task-Based Runtime Systems,” Euro-Par 2014, Porto, Portugal, Springer International Publishing, August 2014.  (481.52 KB)
Gerndt, M., and K. Fürlinger, Specification and detection of performance problems with ASL,” Concurrency and Computation: Practice and Experience, vol. 19, no. 11: John Wiley and Sons Ltd., pp. 1451-1464, January 2007.
Ghysels, P., S. Li, A. YarKhan, and J. Dongarra, Initial Integration and Evaluation of SLATE and STRUMPACK,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-11: University of Tennessee, December 2018.  (249.78 KB)
Giraud, L., J. Langou, and G.. Sylvand, On the Parallel Solution of Large Industrial Wave Propagation Problems,” Journal of Computational Acoustics (to appear), January 2005.  (1.08 MB)
Giraud, L., A. Haidar, and Y. Saad, Sparse approximations of the Schur complement for parallel algebraic hybrid solvers in 3D,” Numerical Mathematics: Theory, Methods and Applications, vol. 3, no. 3, Beijing, Golbal Science Press, pp. 64-82, 00-2010.
Giraud, L., J. Langou, M. Rozložník, and J. van den Eshof, Rounding Error Analysis of the Classical Gram-Schmidt Orthogonalization Process,” Numerische Mathematik, vol. 101, no. 1, pp. 87-100, January 2005.  (157.48 KB)
Giraud, L., A. Haidar, and S. Pralet, Using multiple levels of parallelism to enhance the performance of domain decomposition solvers,” Parallel Computing, vol. 36, no. 5-6: Elsevier journals, pp. 285-296, 00-2010.  (418.57 KB)
Graham, R. L., G. Bosilca, and J. Pjesivac–Grbovic, A Comparison of Application Performance Using Open MPI and Cray MPI,” Cray User Group, CUG 2007, May 2007.  (248.83 KB)
Graham, R. L., R. Brightwell, B. Barrett, G. Bosilca, and J. Pjesivac–Grbovic, An Evaluation of Open MPI's Matching Transport Layer on the Cray XT,” EuroPVM/MPI 2007, September 2007.  (369.01 KB)
Graham, R. L., G. M. Shipman, B. Barrett, R. Castain, G. Bosilca, and A. Lumsdaine, A High-Performance, Heterogeneous MPI,” HeteroPar 2006, Barcelona, Spain, September 2006.  (193.73 KB)
Gustavson, F. G., J. Wasniewski, and J. Dongarra, Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” University of Tennessee Computer Science Technical Report, UT-CS-08-614 (also LAPACK Working Note 199), April 2008.  (896.03 KB)
Gustavson, F. G., J. Wasniewski, and J. Dongarra, Level-3 Cholesky Kernel Subroutine of a Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm,” ACM TOMS (submitted), also LAPACK Working Note (LAWN) 211, 00-2010.  (190.2 KB)
Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, Rectangular Full Packed Format for Cholesky’s Algorithm: Factorization, Solution, and Inversion,” ACM Transactions on Mathematical Software (TOMS), vol. 37, no. 2, Atlanta, GA, April 2010.  (896.03 KB)
Gustavson, F. G., J. Wasniewski, J. Dongarra, J. Herrero, and J. Langou, Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms,” ACM Transactions on Mathematical Software (TOMS), vol. 39, issue 2, February 2013.  (439.46 KB)
Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” ACM TOMS (to appear), 00-2009.  (896.03 KB)
Gustavson, F. G., J. Wasniewski, J. Dongarra, and J. Langou, Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution and Inversion,” ACM Transactions on Mathematical Software (TOMS), vol. 37, no. 2, April 2010.  (896.03 KB)
H
Hadri, B., E. Agullo, and J. Dongarra, Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,” 24th IEEE International Parallel and Distributed Processing Symposium (submitted), 00-2010.  (313.98 KB)
Hadri, B., H. Ltaeif, E. Agullo, and J. Dongarra, Tile QR Factorization with Parallel Panel Processing for Multicore Architectures,” accepted in 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, December 2009.
Hadri, B., H. Ltaeif, E. Agullo, and J. Dongarra, Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures,” Innovative Computing Laboratory Technical Report (also LAPACK Working Note 222 and CS Tech Report UT-CS-09-645), no. ICL-UT-09-03, September 2009.  (464.23 KB)
Hadri, B., H. Ltaeif, E. Agullo, and J. Dongarra, Enhancing Parallelism of Tile QR Factorization for Multicore Architectures,” Submitted to Transaction on Parallel and Distributed Systems, December 2009.  (464.23 KB)
Haidar, A., R. Solcà, M. Gates, S. Tomov, T. C. Schulthess, and J. Dongarra, A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,” International Journal of High Performance Computing Applications, vol. 28, issue 2, pp. 196-209, May 2014.  (1.74 MB)
Haidar, A., C. Cao, J. Dongarra, P. Luszczek, and S. Tomov, Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.  (1.51 MB)
Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, High-performance Cholesky Factorization for GPU-only Execution,” Proceedings of the General Purpose GPUs (GPGPU-10), Austin, TX, ACM, February 2017.  (872.18 KB)
Haidar, A., A. Abdelfattah, S. Tomov, and J. Dongarra, Batched Matrix Computations on Hardware Accelerators Based on GPUs,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.  (9.36 MB)
Haidar, A., A. Abdelfattah, M. Zounon, S. Tomov, and J. Dongarra, A Guide for Achieving High Performance with Very Small Matrices on GPUs: A Case Study of Batched LU and Cholesky Factorizations,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 5, pp. 973–984, May 2018.  (832.92 KB)
Haidar, A., P. Luszczek, J. Kurzak, and J. Dongarra, An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,” University of Tennessee Computer Science Technical Report (also LAWN 283), no. ut-eecs-13-720: University of Tennessee, October 2013.  (1.23 MB)
Haidar, A., S. Tomov, J. Dongarra, and N. J. Higham, Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers,” The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, TX, IEEE, November 2018.

Pages