Publications

W
Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic analysis of inefficiency patterns in parallel applications,” Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted), 2005.
Wolf, F., B. Wylie, E. Abraham, W. Frings, K. Fürlinger, M. Geimer, M-A. Hermanns, B. Mohr, S. Moore, and M. Pfeifer, “Usage of the Scalasca Toolset for Scalable Performance Analysis of Large-scale Parallel Applications,” Proceedings of the 2nd International Workshop on Tools for High Performance Computing, Stuttgart, Germany, Springer, pp. 157-167, January 2008.
Wolf, F., and B. Mohr, “Hardware-Counter Based Automatic Performance Analysis of Parallel Programs,” Advances in Parallel Computing, vol. 13, Dresden, Germany, Elsevier, pp. 753-760, January 2004, 2003.
Wolf, F., “EARL - API Documentation,” ICL Technical Report, no. ICL-UT-04-03, October 2004.
Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic Analysis of Inefficiency Patterns in Parallel Applications,” Concurrency and Computation: Practice and Experience, vol. 19, no. 11, pp. 1481-1496, August 2007.
Wolf, F., A. D. Malony, S. Shende, and A. Morris, “Trace-Based Parallel Performance Overhead Compensation,” In Proc. of the International Conference on High Performance Computing and Communications (HPCC), Sorrento (Naples), Italy, September 2005.
Wong, K., S. Tomov, and J. Dongarra, “Hands-on Research and Training in High-Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.
Wong, K., S. Tomov, D. Nichols, R. Febbo, F. Lopez, J. Halloy, and X. Ma, “How to Build Your Own Deep Neural Network,” PEARC20, July 2020.
Wong, K., S. Tomov, and J. Dongarra, “Project-Based Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning,” The Journal of Computational Science Education, vol. 11, issue 1, pp. 36-44, January 2020.
Worley, P. H., J. Candy, L. Carrington, K. Huck, T. Kaiser, K. Mahinthakumar, A. D. Malony, S. Moore, D. Reed, P. C. Roth, et al., “Performance Analysis of GYRO: A Tool Evaluation,” In Proceedings of the 2005 SciDAC Conference, San Francisco, CA, June 2005.
Wu, W., A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, “Hierarchical DAG scheduling for Hybrid Distributed Systems,” 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
Wu, W., G. Bosilca, R. vandeVaart, S. Jeaugey, and J. Dongarra, “GPU-Aware Non-contiguous Data Movement In Open MPI,” 25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Kyoto, Japan, ACM, June 2016.
Tsai, Y-H. Mike, N. Beams, and H. Anzt, “Mixed Precision Algebraic Multigrid on GPUs,” Parallel Processing and Applied Mathematics (PPAM 2022), vol. 13826, Cham, Springer International Publishing, April 2023.
“Parallel Processing and Applied Mathematics, 9th International Conference, PPAM 2011,” Lecture Notes in Computer Science, vol. 7203, Torun, Poland, 2012.
“8th International Conference on Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science (LNCS),” PPAM 2009 Proceedings, vol. 6067, Wroclaw, Poland, Springer, September 2010.
7th International Parallel Processing and Applied Mathematics Conference, Lecture Notes in Computer Science, vol. 4967, Gdansk, Poland, Springer Berlin, January 2008.
Wyrzykowski, R., E. Deelman, J. Dongarra, and K. Karczewski, “Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part II,” Lecture Notes in Computer Science, no. 12044: Springer International Publishing, pp. 503, March 2020.
Wyrzykowski, R., E. Deelman, J. Dongarra, and K. Karczewski, “Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I,” Lecture Notes in Computer Science, 1, no. 12043: Springer International Publishing, pp. 581, March 2020.
Y
Kurzak, J., M. Gates, A. Charara, A. YarKhan, I. Yamazaki, and J. Dongarra, “Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators,” Euro-Par 2019: Parallel Processing, vol. 11725: Springer, pp. 495–506, August 2019.
Yamazaki, I., T. Mary, J. Kurzak, S. Tomov, and J. Dongarra, “Access-averse Framework for Computing Low-rank Matrix Approximations,” First International Workshop on High Performance Big Graph Data Management, Analysis, and Mining, Washington, DC, October 2014.
Yamazaki, I., M. Hoemmen, P. Luszczek, and J. Dongarra, “Comparing performance of s-step and pipelined GMRES on distributed-memory multicore CPUs,” Pittsburgh, Pennsylvania, SIAM Annual Meeting, July 2017.
Yamazaki, I., J. Barlow, S. Tomov, J. Kurzak, and J. Dongarra, “Mixed-precision orthogonalization process Performance on multicore CPUs with GPUs,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.
Yamazaki, I., M. Hoemmen, P. Luszczek, and J. Dongarra, “Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives,” Proceedings of The 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017), Best Paper Award, Orlando, FL, June 2017.
Yamazaki, I., S. Rajamanickam, E. G. Boman, M. Hoemmen, M. A. Heroux, and S. Tomov, “Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 14), New Orleans, LA, IEEE, November 2014.
Yamazaki, I., S. Nooshabadi, S. Tomov, and J. Dongarra, “Structure-aware Linear Solver for Realtime Convex Optimization for Embedded Systems,” IEEE Embedded Systems Letters, vol. 9, issue 3, pp. 61–64, May 2017.
Yamazaki, I., E. Chow, A. Bouteiller, and J. Dongarra, “Performance of Asynchronous Optimized Schwarz with One-sided Communication,” Parallel Computing, vol. 86, pp. 66-81, August 2019.
Yamazaki, I., S. Tomov, J. Kurzak, J. Dongarra, and J. Barlow, “Mixed-precision Block Gram Schmidt Orthogonalization,” 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Austin, TX, ACM, November 2015.
Yamazaki, I., S. Tomov, and J. Dongarra, “Mixed-Precision Cholesky QR Factorization and its Case Studies on Multicore CPU with Multiple GPUs,” SIAM Journal on Scientific Computing, vol. 37, no. 3, pp. C203-C330, May 2015.
Yamazaki, I., S. Tomov, and J. Dongarra, “Sampling Algorithms to Update Truncated SVD,” IEEE International Conference on Big Data, Boston, MA, IEEE, December 2017.
Yamazaki, I., A. Ida, R. Yokota, and J. Dongarra, “Distributed-Memory Lattice H-Matrix Factorization,” The International Journal of High Performance Computing Applications, vol. 33, issue 5, pp. 1046–1063, August 2019.
Yamazaki, I., H. Anzt, S. Tomov, M. Hoemmen, and J. Dongarra, “Improving the performance of CA-GMRES on multicores with multiple GPUs,” IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
Yamazaki, I., S. Nooshabadi, S. Tomov, and J. Dongarra, “High Performance Realtime Convex Solver for Embedded Systems,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-745, October 2016.
Yamazaki, I., S. Tomov, and J. Dongarra, “Computing Low-rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and its Application to Solving a Hierarchically Semiseparable Linear System of Equations,” Scientific Programming, 2015.
Yamazaki, I., and J. Dongarra, “LAWN 294: Aasen's Symmetric Indefinite Linear Solvers in LAPACK,” LAPACK Working Note, no. LAWN 294, ICL-UT-17-13: University of Tennessee, December 2017.
Yamazaki, I., S. Tomov, T. Dong, and J. Dongarra, “Mixed-precision orthogonalization scheme and adaptive step size for CA-GMRES on GPUs,” VECPAR 2014 (Best Paper), Eugene, OR, June 2014.
Yamazaki, I., A. Abdelfattah, A. Ida, S. Ohshima, S. Tomov, R. Yokota, and J. Dongarra, “Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU Clusters,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, BC, Canada, IEEE, May 2018.
Yamazaki, I., S. Tomov, and J. Dongarra, “One-Sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators,” The International Conference on Computational Science (ICCS), June 2012.
Yamazaki, I., T. Dong, S. Tomov, and J. Dongarra, “Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster,” The Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 2013.
Yamazaki, I., S. Tomov, and J. Dongarra, “Non-GPU-resident Dense Symmetric Indefinite Factorization,” Concurrency and Computation: Practice and Experience, November 2016.
Yamazaki, I., J. Kurzak, P. Luszczek, and J. Dongarra, “Randomized Algorithms to Update Partial Singular Value Decomposition on a Hybrid CPU/GPU Cluster,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.
Yamazaki, I., S. Tomov, and J. Dongarra, “Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, New Orleans, LA, November 2014.
Yamazaki, I., T. Dong, R. Solcà, S. Tomov, J. Dongarra, and T. C. Schulthess, “Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems,” Concurrency and Computation: Practice and Experience, October 2013.
Yamazaki, I., D. Becker, J. Dongarra, A. Druinsky, I. Peled, S. Toledo, G. Ballard, J. Demmel, and O. Schwartz, “Implementing a Blocked Aasen’s Algorithm with a Dynamic Scheduler on Multicore Architectures,” IPDPS 2013 (submitted), Boston, MA, 2013.
Yamazaki, I., J. Kurzak, P. Luszczek, and J. Dongarra, “Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime,” Workshop on Large-Scale Parallel Processing, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
Yamazaki, I., S. Tomov, and J. Dongarra, “Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU,” ACM Transactions on Mathematical Software (TOMS), vol. 43, issue 2, October 2016.
Yamazaki, I., J. Kurzak, P. Wu, M. Zounon, and J. Dongarra, “Symmetric Indefinite Linear Solver using OpenMP Task on Multicore Architectures,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 8, pp. 1879–1892, August 2018.
YarKhan, A., J. Dongarra, and K. Seymour, “GridSolve: The Evolution of Network Enabled Solver,” Grid-Based Problem Solving Environments: IFIP TC2/WG 2.5 Working Conference on Grid-Based Problem Solving Environments (Prescott, AZ, July 2006): Springer, pp. 215-226, 2007.
YarKhan, A., J. Kurzak, A. Abdelfattah, and J. Dongarra, “An Empirical View of SLATE Algorithms on Scalable Hybrid System,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-08: University of Tennessee, Knoxville, September 2019.
YarKhan, A., G. Ragghianti, J. Dongarra, M. Cawkwell, D. Perez, and A. Voter, “Initial Integration and Evaluation of SLATE Parallel BLAS in LATTE,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-07: Innovative Computing Laboratory, University of Tennessee, June 2018.
YarKhan, A., and J. Dongarra, “Experiments with Scheduling Using Simulated Annealing in a Grid Environment,” Grid Computing - GRID 2002, Third International Workshop, vol. 2536, Baltimore, MD, Springer, pp. 232-242, November 2002.