Publications

Journal Article
Luszczek, P., J. Kurzak, I. Yamazaki, D. Keffer, V. Maroulas, and J. Dongarra, “Autotuning Techniques for Performance-Portable Point Set Registration in 3D,” Supercomputing Frontiers and Innovations, vol. 5, no. 4, December 2018. DOI: 10.14529/jsfi180404
Dongarra, J., M. Gates, J. Kurzak, P. Luszczek, and Y. Tsai, “Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2040–2055, November 2018. DOI: 10.1109/JPROC.2018.2868961
Balaprakash, P., J. Dongarra, T. Gamblin, M. Hall, J. Hollingsworth, B. Norris, and R. Vuduc, “Autotuning in High-Performance Computing Applications,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2068–2083, November 2018. DOI: 10.1109/JPROC.2018.2841200
You, H., B. Rekapalli, Q. Liu, and S. Moore, “Autotuned Parallel I/O for Highly Scalable Biosequence Analysis,” TeraGrid'11, Salt Lake City, Utah, July 2011.
Seymour, K., and J. Dongarra, “Automatic Translation of Fortran to JVM Bytecode,” Concurrency and Computation: Practice and Experience, vol. 15, no. 3–5, pp. 202–207, 2003.
Wolf, F., and B. Mohr, “Automatic performance analysis of hybrid MPI/OpenMP applications,” Journal of Systems Architecture, Special Issue 'Evolutions in parallel distributed and network-based processing', vol. 49, no. 10–11, Elsevier, pp. 421–439, November 2003.
Cuenca, J., D. Giménez, J. González, J. Dongarra, and K. Roche, “Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load,” EuroPar 2002, Paderborn, Germany, August 2002.
Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic analysis of inefficiency patterns in parallel applications,” Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted), 2005.
Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic Analysis of Inefficiency Patterns in Parallel Applications,” Concurrency and Computation: Practice and Experience, vol. 19, no. 11, pp. 1481–1496, August 2007.
Whaley, C., A. Petitet, and J. Dongarra, “Automated Empirical Optimization of Software and the ATLAS Project,” Parallel Computing, vol. 27, no. 1–2, pp. 3–25, January 2001.
Berry, M., and J. Dongarra, “Atlanta Organizers Put Mathematics to Work For the Math Sciences Community,” SIAM News, vol. 32, no. 6, January 1999.
Herrmann, J., G. Bosilca, T. Herault, L. Marchal, Y. Robert, and J. Dongarra, “Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results,” Parallel Computing, vol. 52, pp. 22–41, February 2016. DOI: 10.1016/j.parco.2015.09.005
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Assessing General-purpose Algorithms to Cope with Fail-stop and Silent Errors,” ACM Transactions on Parallel Computing, August 2016. DOI: 10.1145/2897189
Seo, S., A. Amer, P. Balaji, C. Bordage, G. Bosilca, A. Brooks, P. Carns, A. Castello, D. Genet, T. Herault, et al., “Argobots: A Lightweight Low-Level Threading and Tasking Framework,” IEEE Transactions on Parallel and Distributed Systems, October 2017. DOI: 10.1109/TPDS.2017.2766062
Bhowmick, S., V. Eijkhout, Y. Freund, E. Fuentes, and D. Keyes, “Application of Machine Learning to the Selection of Sparse Linear Solvers,” International Journal of High Performance Computing Applications (submitted), 2006.
Nelson, J., “Analyzing PAPI Performance on Virtual Machines,” VMware Technical Journal, vol. Winter 2013, January 2014.
Song, F., S. Moore, and J. Dongarra, “Analytical Modeling and Optimization for Affinity Based Thread Scheduling on Multicore Systems,” IEEE Cluster 2009, New Orleans, August 2009.
Haidar, A., H. Ltaief, A. YarKhan, and J. Dongarra, “Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,” submitted to Concurrency and Computation: Practice and Experience, November 2010.
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 12, pp. 2700–2712, December 2018. DOI: 10.1109/TPDS.2018.2842785
Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J. Dongarra, “Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,” Parallel Computing, vol. 81, pp. 1–21, January 2019. DOI: 10.1016/j.parco.2018.10.003
Petitet, A., and J. Dongarra, “Algorithmic Redistribution Methods for Block Cyclic Decompositions,” IEEE Transactions on Parallel and Distributed Systems, vol. 10, no. 12, pp. 201–220, October 2002.
Boulet, P., J. Dongarra, F. Rastello, Y. Robert, and F. Vivien, “Algorithmic Issues on Heterogeneous Computing Platforms,” Parallel Processing Letters, vol. 9, no. 2, pp. 197–213, January 1999.
Dongarra, J., G. Bosilca, R. Delmas, and J. Langou, “Algorithmic Based Fault Tolerance Applied to High Performance Computing,” Journal of Parallel and Distributed Computing, vol. 69, pp. 410–416, 2009.
Chen, Z., and J. Dongarra, “Algorithm-Based Fault Tolerance for Fail-Stop Failures,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 12, January 2008.
Bouteiller, A., T. Herault, G. Bosilca, P. Du, and J. Dongarra, “Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures, and Accuracy,” ACM Transactions on Parallel Computing, vol. 1, issue 2, no. 10, pp. 10:1–10:28, January 2015. DOI: 10.1145/2686892
Casanova, H., M. H. Kim, J. Plank, and J. Dongarra, “Adaptive Scheduling for Task Farming with Grid Middleware,” International Journal of Supercomputer Applications and High-Performance Computing, vol. 13, no. 3, pp. 231–240, October 2002.
Anzt, H., J. Dongarra, G. Flegar, N. J. Higham, and E. S. Quintana-Ortí, “Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers,” Concurrency and Computation: Practice and Experience, March 2018. DOI: 10.1002/cpe.4460
Anzt, H., J. Dongarra, G. Flegar, N. J. Higham, and E. S. Quintana-Ortí, “Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers,” Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, 2019. DOI: 10.1002/cpe.4460
Moore, S., A. J. Baker, J. Dongarra, C. Halloy, and C. Ng, “Active Netlib: An Active Mathematical Software Collection for Inquiry-based Computational Science and Engineering Education,” Journal of Digital Information special issue on Interactivity in Digital Libraries, vol. 2, no. 4, 2002.
Dongarra, J., M. Faverge, H. Ltaief, and P. Luszczek, “Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting,” Concurrency and Computation: Practice and Experience, vol. 26, issue 7, pp. 1408–1431, May 2014. DOI: 10.1002/cpe.3110
Anzt, H., W. Sawyer, S. Tomov, P. Luszczek, and J. Dongarra, “Acceleration of GPU-based Krylov solvers via Data Transfer Reduction,” International Journal of High Performance Computing Applications, 2015.
Demmel, J., J. Dongarra, A. Fox, S. Williams, V. Volkov, and K. Yelick, “Accelerating Time-To-Solution for Computational Science and Engineering,” SciDAC Review, 2009.
Gates, M., S. Tomov, and J. Dongarra, “Accelerating the SVD Two Stage Bidiagonal Reduction and Divide and Conquer Using GPUs,” Parallel Computing, vol. 74, pp. 3–18, May 2018. DOI: 10.1016/j.parco.2017.10.004
Dong, T., A. Haidar, S. Tomov, and J. Dongarra, “Accelerating the SVD Bi-Diagonalization of a Batch of Small Matrices using GPUs,” Journal of Computational Science, vol. 26, pp. 237–245, May 2018. DOI: 10.1016/j.jocs.2018.01.007
Tomov, S., R. Nath, and J. Dongarra, “Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms through Hybrid GPU-Based Computing,” Parallel Computing, vol. 36, no. 12, pp. 645–654, 2010.
Anzt, H., M. Baboulin, J. Dongarra, Y. Fournier, F. Hülsemann, A. Khabou, and Y. Wang, “Accelerating the Conjugate Gradient Algorithm with GPU in CFD Simulations,” VECPAR, 2016.
Jagode, H., A. Danalis, and J. Dongarra, “Accelerating NWChem Coupled Cluster through Dataflow-Based Execution,” The International Journal of High Performance Computing Applications, pp. 1–13, January 2017. DOI: 10.1177/1094342016672543
Jagode, H., A. Danalis, and J. Dongarra, “Accelerating NWChem Coupled Cluster through dataflow-based Execution,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 540–551, July 2018. DOI: 10.1177/1094342016672543
Baboulin, M., J. Dongarra, J. Herrmann, and S. Tomov, “Accelerating Linear System Solutions Using Randomization Techniques,” ACM Transactions on Mathematical Software (also LAWN 246), vol. 39, issue 2, February 2013. DOI: 10.1145/2427023.2427025
Baboulin, M., J. Dongarra, J. Herrmann, and S. Tomov, “Accelerating Linear System Solutions Using Randomization Techniques,” INRIA RR-7616 / LAWN #246 (presented at International AMMCS’11), Waterloo, Ontario, Canada, July 2011.
Nath, R., S. Tomov, and J. Dongarra, “Accelerating GPU Kernels for Dense Linear Algebra,” Proc. of VECPAR'10, Berkeley, CA, June 2010.
Dongarra, J., V. Getov, and K. Walsh, “The 30th Anniversary of the Supercomputing Conference: Bringing the Future Closer—Supercomputing History and the Immortality of Now,” Computer, vol. 51, issue 10, pp. 74–85, November 2018. DOI: 10.1109/MC.2018.3971352
15th European PVM/MPI Users' Group Meeting, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, vol. 5205, Dublin, Ireland, Springer Berlin, January 2008.
Conference Proceedings
Haidar, A., Y. Jia, P. Luszczek, S. Tomov, A. YarKhan, and J. Dongarra, “Weighted Dynamic Scheduling with Many Parallelism Grains for Offloading of Numerical Workloads to Multiple Varied Accelerators,” Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA'15), no. 5, Austin, TX, ACM, November 2015.
Anzt, H., S. Tomov, J. Dongarra, and V. Heuveline, “Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems,” Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Best Paper), Rhodes Island, Greece, August 2012.
Fürlinger, K., and S. Moore, “Visualizing the Program Execution Control Flow of OpenMP Applications,” Proc. 4th International Workshop on OpenMP (IWOMP 2008), West Lafayette, Indiana, Lecture Notes in Computer Science 5004, pp. 181–190, January 2008.
Ramakrishnan, L., D. Nurmi, A. Mandal, C. Koelbel, D. Gannon, M. Huang, Y-S. Kee, G. Obertelli, K. Thyagaraja, R. Wolski, et al., “VGrADS: Enabling e-Science Workflows on Grids and Clouds with Fault Tolerance,” SC’09 The International Conference for High Performance Computing, Networking, Storage and Analysis (to appear), Portland, OR, 2009.
Anzt, H., J. Dongarra, G. Flegar, E. S. Quintana-Ortí, and A. E. Thomas, “Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning,” International Conference on Computational Science (ICCS 2017), vol. 108, Zurich, Switzerland, Procedia Computer Science, pp. 1783–1792, June 2017.
Fürlinger, K., J. Dongarra, and M. Gerndt, “On Using Incremental Profiling for the Performance Analysis of Shared Memory Parallel Applications,” Proceedings of the 13th International Euro-Par Conference on Parallel Processing (Euro-Par '07), Rennes, France, Springer LNCS, January 2007.