Publications

Export 6 results:
Filters: Author is Yu Pei  [Clear All Filters]
Conference Paper
Gates, M., J. Kurzak, P. Luszczek, Y. Pei, and J. Dongarra, Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices,” Parallel and Distributed Processing Symposium Workshops (IPDPSW), Orlando, FL, IEEE, June 2017. DOI: 10.1109/IPDPSW.2017.18
Pei, Y., Q. Cao, G. Bosilca, P. Luszczek, V. Eijkhout, and J. Dongarra, Communication Avoiding 2D Stencil Implementations over PaRSEC Task-Based Runtime,” 21st IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2020), New Orleans, LA, IEEE, May 2020.  (1.33 MB)
Pei, Y., G. Bosilca, I. Yamazaki, A. Ida, and J. Dongarra, Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization,” PAW-ATM Workshop at SC19, Denver, CO, ACM, November 2019.  (4.51 MB)
Cao, Q., Y. Pei, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications,” Platform for Advanced Scientific Computing Conference (PASC20), Geneva, Switzerland, ACM, June 2020. DOI: 10.1145/3394277.3401846  (2.71 MB)
Luo, X., W. Wu, G. Bosilca, Y. Pei, Q. Cao, T. Patinyasakdikul, D. Zhong, and J. Dongarra, HAN: A Hierarchical AutotuNed Collective Communication Framework,” IEEE Cluster Conference, Kobe, Japan, IEEE Computer Society Press, September 2020.  (764.05 KB)
Cao, Q., Y. Pei, T. Herault, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools,” Workshop on Programming and Performance Visualization Tools (ProTools 19) at SC19, Denver, CO, ACM, November 2019.  (429.55 KB)