Publications

Export 1031 results:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
E
Solcà, R., A. Kozhevnikov, A. Haidar, S. Tomov, T. C. Schulthess, and J. Dongarra, Efficient Implementation Of Quantum Materials Simulations On Distributed CPU-GPU Systems,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.  (1.09 MB)
Haidar, A., P. Luszczek, S. Tomov, and J. Dongarra, Efficient Eigensolver Algorithms on Accelerator Based Architectures,” 2015 SIAM Conference on Applied Linear Algebra (SIAM LA), Atlanta, GA, SIAM, October 2015.  (6.98 MB)
Baboulin, M., D. Becker, G. Bosilca, A. Danalis, and J. Dongarra, An efficient distributed randomized solver with application to large dense linear systems,” ICL Technical Report, no. ICL-UT-12-02, July 2012.  (626.26 KB)
Baboulin, M., D. Becker, G. Bosilca, A. Danalis, and J. Dongarra, An Efficient Distributed Randomized Algorithm for Solving Large Dense Symmetric Indefinite Linear Systems,” Parallel Computing, vol. 40, issue 7, pp. 213-223, July 2014. DOI: 10.1016/j.parco.2013.12.003  (1.42 MB)
Zhao, Y., L. Wan, W. Wu, G. Bosilca, R. Vuduc, J. Ye, W. Tang, and Z. Xu, Efficient Communications in Training Large Scale Neural Networks,” ACM MultiMedia Workshop 2017, Mountain View, CA, ACM, October 2017.  (1.41 MB)
Benoit, A., Y. Robert, and S. K. Raina, Efficient checkpoint/verification patterns for silent error detection,” Innovative Computing Laboratory Technical Report, no. ICL-UT-14-03: University of Tennessee, May 2014.  (397.75 KB)
Benoit, A., S. K. Raina, and Y. Robert, Efficient Checkpoint/Verification Patterns,” International Journal on High Performance Computing Applications, July 2015. DOI: 10.1177/1094342015594531  (392.76 KB)
Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, Efficiency of General Krylov Methods on GPUs – An Experimental Study,” The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Chicago, IL, IEEE, May 2016. DOI: 10.1109/IPDPSW.2016.45  (285.28 KB)
Anzt, H., J. Dongarra, M. Kreutzer, G. Wellein, and M. Kohler, Efficiency of General Krylov Methods on GPUs – An Experimental Study,” 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 683-691, May 2016. DOI: 10.1109/IPDPSW.2016.45
You, H., K. Seymour, and J. Dongarra, An Effective Empirical Search Method for Automatic Software Tuning,” ICL Technical Report, no. ICL-UT-05-02, January 2005.  (74.66 KB)
Wolf, F., EARL - API Documentation,” ICL Technical Report, no. ICL-UT-04-03, October 2004.  (111.36 KB)
D
Donfack, S., S. Tomov, and J. Dongarra, Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,” University of Tennessee Computer Science Technical Report, no. ut-cs-13-713, July 2013.  (659.77 KB)
Donfack, S., S. Tomov, and J. Dongarra, Dynamically balanced synchronization-avoiding LU factorization with multicore and GPUs,” Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2014, May 2014.  (490.08 KB)
Song, F., A. YarKhan, and J. Dongarra, Dynamic Task Scheduling for Linear Algebra Algorithms on Distributed-Memory Multicore Systems,” International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '09), Portland, OR, November 2009.  (502.49 KB)
YarKhan, A., Dynamic Task Execution on Shared and Distributed Memory Architectures , 2012.  (3.29 MB)
Hoque, R., T. Herault, G. Bosilca, and J. Dongarra, Dynamic Task Discovery in PaRSEC- A data-flow task-based Runtime,” ScalA17, Denver, ACM, September 2017. DOI: 10.1145/3148226.3148233  (1.15 MB)
Cronk, D., G. Fagg, S. Emeny, and S. Tucker, Dynamic Process Management for Pipelined Applications,” Proceedings of DoD HPCMP UGC 2005 (to appear), Nashville, TN, IEEE, January 2005.
Anzt, H., E. Chow, D. Szyld, and J. Dongarra, Domain Overlap for Iterative Sparse Triangular Solves on GPUs,” Software for Exascale Computing - SPPEXA, vol. 113: Springer International Publishing, pp. 527–545, September 2016. DOI: 10.1007/978-3-319-40528-5_24
Yamazaki, I., S. Rajamanickam, E. G. Boman, M. Hoemmen, M. A. Heroux, and S. Tomov, Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 14), New Orleans, LA, IEEE, November 2014.
Danalis, A., H. Jagode, and J. Dongarra, Does your tool support PAPI SDEs yet? , Tahoe City, CA, 13th Scalable Tools Workshop, July 2019.  (3.09 MB)
Bosilca, G., A. Bouteiller, T. Herault, P. Lemariner, and J. Dongarra, Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols,” Proceedings of EuroMPI 2010, Stuttgart, Germany, Springer, September 2010.  (202.87 KB)
Le Fèvre, V., G. Bosilca, A. Bouteiller, T. Herault, A. Hori, Y. Robert, and J. Dongarra, Do moldable applications perform better on failure-prone HPC platforms?,” 11th Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids, Turin, Italy, Springer Verlag, August 2018.  (360.72 KB)
Voemel, C., S. Tomov, and J. Dongarra, Divide & Conquer on Hybrid GPU-Accelerated Multicore Systems,” SIAM Journal on Scientific Computing (submitted), August 2010.
Voemel, C., S. Tomov, and J. Dongarra, Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems,” SIAM Journal on Scientific Computing, vol. 34(2), pp. C70-C82, April 2012.
Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemariner, H. Ltaeif, et al., Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 00 2010.  (400.75 KB)
Yamazaki, I., A. Ida, R. Yokota, and J. Dongarra, Distributed-Memory Lattice H-Matrix Factorization,” The International Journal of High Performance Computing Applications, August 2019. DOI: 10.1177/1094342019861139  (1.14 MB)
Bosilca, G., A. Bouteiller, T. Herault, V. Le Fèvre, Y. Robert, and J. Dongarra, Distributed Termination Detection for HPC Task-Based Environments,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-14: University of Tennessee, June 2018.
Boehmann, T. B., Distributed Storage in RIB,” ICL Tech Report, no. ICL-UT-03-01, March 2003.  (213.02 KB)
Hiroyasu, T., M. Miki, M. Sano, H. Shimosaka, S. Tsutsui, and J. Dongarra, Distributed Probablistic Model-Building Genetic Algorithm,” Lecture Notes in Computer Science, vol. 2723: Springer-Verlag, Heidelberg, pp. 1015-1028, January 2003.  (288.91 KB)
Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemariner, H. Ltaeif, et al., Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,” University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.  (366.26 KB)
Dongarra, J., Z. Chen, G. Bosilca, and J. Langou, Disaster Survival Guide in Petascale Computing: An Algorithmic Approach,” in Petascale Computing: Algorithms and Applications (to appear): Chapman & Hall - CRC Press, 00 2007.  (260.18 KB)
Marin, G., C. McCurdy, and J. Vetter, Diagnosis and Optimization of Application Prefetching Performance,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013. DOI: 10.1145/2464996.2465014  (827.31 KB)
Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures,” The 17th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2016), IPDPS 2016, Chicago, IL, IEEE, May 2016.  (708.62 KB)
Kelleher, Jr., M., Development of the PICMSS NetSolve Service,” ICL Technical Report, no. ICL-UT-02-04, April 2002.  (328.44 KB)
Arnold, D., and J. Dongarra, Developing an Architecture to Support the Implementation and Development of Scientific Computing Applications,” to appear in Proceedings of Working Conference 8: Software Architecture for Scientific Computing Applications, Ottawa, Canada, October 2000.  (176.25 KB)
Fürlinger, K., and S. Moore, Detection and Analysis of Iterative Behavior in Parallel Applications,” Proceedings of the 2008 International Conference on Computational Science (ICCS 2008), vol. 5103, Krakow, Poland, pp. 261-267, January 2008.  (141.02 KB)
Kurzak, J., P. Wu, M. Gates, I. Yamazaki, P. Luszczek, G. Ragghianti, and J. Dongarra, Designing SLATE: Software for Linear Algebra Targeting Exascale,” SLATE Working Notes, no. 3, ICL-UT-17-06: Innovative Computing Laboratory, University of Tennessee, October 2017.  (2.8 MB)
Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, Designing LU-QR Hybrid Solvers for Performance and Stability,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014. DOI: 10.1109/IPDPS.2014.108  (4.2 MB)
Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, Designing LU-QR hybrid solvers for performance and stability,” University of Tennessee Computer Science Technical Report (also LAWN 282), no. ut-eecs-13-719: University of Tennessee, October 2013.  (4.11 MB)
Haidar, A., A. Abdelfattah, M. Zounon, P. Wu, S. Pranesh, S. Tomov, and J. Dongarra, The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques,” International Conference on Computational Science (ICCS 2018), vol. 10860, Wuxi, China, Springer, pp. 586–600, June 2018. DOI: 10.1007/978-3-319-93698-7_45
Luszczek, P., and J. Dongarra, Design of an Interactive Environment for Numerically Intensive Parallel Linear Algebra Calculations,” International Conference on Computational Science, Poland, Springer Verlag, June 2004. DOI: 10.1007/978-3-540-25944-2_35  (88.31 KB)
You, H., Q. Liu, Z. Li, and S. Moore, The Design of an Auto-tuning I/O Framework on Cray XT5 System,” Cray Users Group Conference (CUG'11) (Best Paper Finalist), Fairbanks, Alaska, May 2011.  (459.57 KB)
Cao, C., G. Bosilca, T. Herault, and J. Dongarra, Design for a Soft Error Resilient Dynamic Task-based Runtime,” 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.  (2.31 MB)
Cao, C., T. Herault, G. Bosilca, and J. Dongarra, Design for a Soft Error Resilient Dynamic Task-based Runtime,” ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, November 2014.  (2.61 MB)
Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors,” ISC High Performance 2015, Frankfurt, Germany, July 2015.  (1.49 MB)
Guidry, M., and A. Haidar, On the Design, Autotuning, and Optimization of GPU Kernels for Kinetic Network Simulations Using Fast Explicit Integration and GPU Batched Computation , Oak Ridge, TN, Joint Institute for Computational Sciences Seminar Series, Presentation, September 2015.  (17.25 MB)
Dongarra, J., S. Hammarling, N. J. Higham, S. Relton, P. Valero-Lara, and M. Zounon, The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems,” International Conference on Computational Science (ICCS 2017), Zürich, Switzerland, Elsevier, June 2017. DOI: DOI:10.1016/j.procs.2017.05.138
Kurzak, J., P. Luszczek, I. Yamazaki, Y. Robert, and J. Dongarra, Design and Implementation of the PULSAR Programming System for Large Scale Computing,” Supercomputing Frontiers and Innovations, vol. 4, issue 1, 2017. DOI: 10.14529/jsfi170101
D'Azevedo, E., and J. Dongarra, The Design and Implementation of the Parallel Out of Core ScaLAPACK LU, QR, and Cholesky Factorization Routines,” Concurrency: Practice and Experience, vol. 12, no. 15, pp. 1481-1493, January 2000.  (374.18 KB)
Raman, G., and J. Dongarra, Design and Implementation of NetSolve using DCOM as the Remoting Layer,” University of Tennessee Computer Science Department Technical Report, no. UT-CS-00-440, May 2000.  (65.45 KB)

Pages