Publications

8 results
Filters: Author is Wei Wu
2015
Wu, W., A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, “Hierarchical DAG Scheduling for Hybrid Distributed Systems,” 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.
2016
Wu, W., G. Bosilca, R. vandeVaart, S. Jeaugey, and J. Dongarra, “GPU-Aware Non-contiguous Data Movement In Open MPI,” 25th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '16), Kyoto, Japan, ACM, June 2016.
2017
Zhao, Y., L. Wan, W. Wu, G. Bosilca, R. Vuduc, J. Ye, W. Tang, and Z. Xu, “Efficient Communications in Training Large Scale Neural Networks,” ACM MultiMedia Workshop 2017, Mountain View, CA, ACM, October 2017.
2018
Luo, X., W. Wu, G. Bosilca, T. Patinyasakdikul, L. Wang, and J. Dongarra, “ADAPT: An Event-Based Adaptive Collective Communication Framework,” 27th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '18), Tempe, Arizona, ACM Press, June 2018.
2020
Wang, L., W. Wu, J. Zhang, H. Liu, G. Bosilca, M. Herlihy, and R. Fonseca, “FFT-Based Gradient Sparsification for the Distributed Training of Deep Neural Networks,” 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '20), Stockholm, Sweden, ACM, June 2020.
Cao, Q., G. Bosilca, W. Wu, D. Zhong, A. Bouteiller, and J. Dongarra, “Flexible Data Redistribution in a Task-Based Runtime System,” IEEE International Conference on Cluster Computing (Cluster 2020), Kobe, Japan, IEEE, September 2020.
Luo, X., W. Wu, G. Bosilca, Y. Pei, Q. Cao, T. Patinyasakdikul, D. Zhong, and J. Dongarra, “HAN: A Hierarchical AutotuNed Collective Communication Framework,” IEEE Cluster Conference, Kobe, Japan, IEEE Computer Society Press, September 2020. Best Paper Award.
Slaughter, E., W. Wu, Y. Fu, L. Brandenburg, N. Garcia, W. Kautz, E. Marx, K. S. Morris, Q. Cao, G. Bosilca, et al., “Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance,” International Conference for High Performance Computing, Networking, Storage, and Analysis (SC20), ACM, November 2020.