Publications

2018
Jagode, H., A. Danalis, and J. Dongarra, “Accelerating NWChem Coupled Cluster through dataflow-based Execution,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 540–551, July 2018.
Dong, T., A. Haidar, S. Tomov, and J. Dongarra, “Accelerating the SVD Bi-Diagonalization of a Batch of Small Matrices using GPUs,” Journal of Computational Science, vol. 26, pp. 237–245, May 2018.
Gates, M., S. Tomov, and J. Dongarra, “Accelerating the SVD Two Stage Bidiagonal Reduction and Divide and Conquer Using GPUs,” Parallel Computing, vol. 74, pp. 3–18, May 2018.
Luo, X., W. Wu, G. Bosilca, T. Patinyasakdikul, L. Wang, and J. Dongarra, “ADAPT: An Event-Based Adaptive Collective Communication Framework,” The 27th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '18), Tempe, Arizona, ACM Press, June 2018.
Anzt, H., J. Dongarra, G. Flegar, N. J. Higham, and E. S. Quintana-Ortí, “Adaptive Precision in Block‐Jacobi Preconditioning for Iterative Sparse Linear System Solvers,” Concurrency and Computation: Practice and Experience, March 2018.
Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J. Dongarra, “Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 12, pp. 2700–2712, December 2018.
Yamazaki, I., A. Abdelfattah, A. Ida, S. Ohshima, S. Tomov, R. Yokota, and J. Dongarra, “Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU Clusters,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, BC, Canada, IEEE, May 2018.
Balaprakash, P., J. Dongarra, T. Gamblin, M. Hall, J. Hollingsworth, B. Norris, and R. Vuduc, “Autotuning in High-Performance Computing Applications,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2068–2083, November 2018.
Dongarra, J., M. Gates, J. Kurzak, P. Luszczek, and Y. Tsai, “Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2040–2055, November 2018.
Luszczek, P., J. Kurzak, I. Yamazaki, D. Keffer, V. Maroulas, and J. Dongarra, “Autotuning Techniques for Performance-Portable Point Set Registration in 3D,” Supercomputing Frontiers and Innovations, vol. 5, no. 4, December 2018.
2017
Jagode, H., A. Danalis, and J. Dongarra, “Accelerating NWChem Coupled Cluster through Dataflow-Based Execution,” The International Journal of High Performance Computing Applications, pp. 1–13, January 2017.
Seo, S., A. Amer, P. Balaji, C. Bordage, G. Bosilca, A. Brooks, P. Carns, A. Castello, D. Genet, T. Herault, et al., “Argobots: A Lightweight Low-Level Threading and Tasking Framework,” IEEE Transactions on Parallel and Distributed Systems, October 2017.
Aupy, G., Y. Robert, and F. Vivien, “Assuming failure independence: are we right to be wrong?,” The 3rd International Workshop on Fault Tolerant Systems (FTS), Honolulu, Hawaii, IEEE, September 2017.
Gates, M., J. Kurzak, P. Luszczek, Y. Pei, and J. Dongarra, “Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices,” Parallel and Distributed Processing Symposium Workshops (IPDPSW), Orlando, FL, IEEE, June 2017.
2015
Gates, M., H. Anzt, J. Kurzak, and J. Dongarra, “Accelerating Collaborative Filtering for Implicit Feedback Datasets using GPUs,” 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, IEEE, November 2015.
Jagode, H., A. Danalis, G. Bosilca, and J. Dongarra, “Accelerating NWChem Coupled Cluster through dataflow-based Execution,” 11th International Conference on Parallel Processing and Applied Mathematics (PPAM 2015), Krakow, Poland, Springer International Publishing, September 2015.
Anzt, H., S. Tomov, and J. Dongarra, “Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” Spring Simulation Multi-Conference 2015 (SpringSim'15), Alexandria, VA, SCS, April 2015.
Anzt, H., W. Sawyer, S. Tomov, P. Luszczek, and J. Dongarra, “Acceleration of GPU-based Krylov solvers via Data Transfer Reduction,” International Journal of High Performance Computing Applications, 2015.
Anzt, H., J. Dongarra, and E. S. Quintana-Ortí, “Adaptive Precision Solvers for Sparse Linear Systems,” 3rd International Workshop on Energy Efficient Supercomputing (E2SC '15), Austin, TX, ACM, November 2015.
Bouteiller, A., T. Herault, G. Bosilca, P. Du, and J. Dongarra, “Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures, and Accuracy,” ACM Transactions on Parallel Computing, vol. 1, issue 2, no. 10, pp. 10:1-10:28, January 2015.
Chow, E., H. Anzt, and J. Dongarra, “Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs,” International Supercomputing Conference (ISC 2015), Frankfurt, Germany, July 2015.
2014
Gates, M., A. Haidar, and J. Dongarra, “Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem,” VECPAR 2014, Eugene, OR, June 2014.
Dongarra, J., M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and I. Yamazaki, “Accelerating Numerical Dense Linear Algebra Calculations with GPUs,” Numerical Computations with GPUs: Springer International Publishing, pp. 3-28, 2014.
Anzt, H., S. Tomov, and J. Dongarra, “Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product,” University of Tennessee Computer Science Technical Report, no. UT-EECS-14-731: University of Tennessee, October 2014.
Yamazaki, I., T. Mary, J. Kurzak, S. Tomov, and J. Dongarra, “Access-averse Framework for Computing Low-rank Matrix Approximations,” First International Workshop on High Performance Big Graph Data Management, Analysis, and Mining, Washington, DC, October 2014.
Dongarra, J., M. Faverge, H. Ltaief, and P. Luszczek, “Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting,” Concurrency and Computation: Practice and Experience, vol. 26, issue 7, pp. 1408-1431, May 2014.
Nelson, J., “Analyzing PAPI Performance on Virtual Machines,” VMware Technical Journal, vol. Winter 2013, January 2014.
Genet, D., A. Guermouche, and G. Bosilca, “Assembly Operations for Multicore Architectures using Task-Based Runtime Systems,” Euro-Par 2014, Porto, Portugal, Springer International Publishing, August 2014.
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Assessing the Impact of ABFT and Checkpoint Composite Strategies,” 16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
2013
Nelson, J., “Analyzing PAPI Performance on Virtual Machines,” ICL Technical Report, no. ICL-UT-13-02, August 2013.
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Assessing the impact of ABFT and Checkpoint composite strategies,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.
2012
Baboulin, M., J. Dongarra, J. Herrmann, and S. Tomov, “Accelerating Linear System Solutions Using Randomization Techniques,” ACM Transactions on Mathematical Software (accepted) (also LAWN 246), vol. 39, issue 2, March 2012.
Dong, T., T. Kolev, R. Rieben, V. Dobrev, S. Tomov, and J. Dongarra, “Acceleration of the BLAST Hydro Code on GPU,” Supercomputing '12 (poster), Salt Lake City, Utah, SC12, November 2012.
Du, P., A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, “Algorithm-Based Fault Tolerance for Dense Matrix Factorization,” Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, ACM, pp. 225-234, February 2012.
Donfack, S., J. Dongarra, M. Faverge, M. Gates, J. Kurzak, P. Luszczek, and I. Yamazaki, “On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,” University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013, 2012.
Luszczek, P., and J. Dongarra, “Anatomy of a Globally Recursive Embedded LINPACK Benchmark,” 2012 IEEE High Performance Extreme Computing Conference, Waltham, MA, pp. 1-6, September 2012.
2011
Baboulin, M., J. Dongarra, J. Herrmann, and S. Tomov, “Accelerating Linear System Solutions Using Randomization Techniques,” INRIA RR-7616 / LAWN #246 (presented at International AMMCS’11), Waterloo, Ontario, Canada, July 2011.
Dongarra, J., M. Faverge, H. Ltaief, and P. Luszczek, “Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization,” University of Tennessee Computer Science Technical Report (also as a LAWN), no. ICL-UT-11-08, September 2011.
Agullo, E., L. Giraud, A. Guermouche, A. Haidar, S. Lanteri, and J. Roman, “Algebraic Schwarz Preconditioning for the Schur Complement: Application to the Time-Harmonic Maxwell Equations Discretized by a Discontinuous Galerkin Method,” The Twentieth International Conference on Domain Decomposition Methods, La Jolla, California, February 2011.
Du, P., A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, “Algorithm-based Fault Tolerance for Dense Matrix Factorizations,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.
Haidar, A., H. Ltaief, A. YarKhan, and J. Dongarra, “Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,” University of Tennessee Computer Science Technical Report, UT-CS-11-666, (also Lawn 243), 2011.
You, H., B. Rekapalli, Q. Liu, and S. Moore, “Autotuned Parallel I/O for Highly Scalable Biosequence Analysis,” TeraGrid'11, Salt Lake City, Utah, July 2011.
Kurzak, J., S. Tomov, and J. Dongarra, “Autotuning GEMMs for Fermi,” University of Tennessee Computer Science Technical Report, UT-CS-11-671, (also Lawn 245), April 2011.
2010
Nath, R., S. Tomov, and J. Dongarra, “Accelerating GPU Kernels for Dense Linear Algebra,” Proc. of VECPAR'10, Berkeley, CA, June 2010.
