Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance,” International Conference for High Performance Computing Networking, Storage, and Analysis (SC20): ACM, November 2020.“
The Template Task Graph (TTG) - An Emerging Practical Dataflow Programming Paradigm for Scientific Simulation at Extreme Scale,” 2020 IEEE/ACM 5th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2): IEEE, November 2020. DOI: 10.1109/ESPM251964.2020.00011“
Using Advanced Vector Extensions AVX-512 for MPI Reduction,” EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020. DOI: 10.1145/3416315.3416316“
Using Advanced Vector Extensions AVX-512 for MPI Reduction (Poster) , Austin, TX, EuroMPI/USA '20: 27th European MPI Users' Group Meeting, September 2020.
Using Arm Scalable Vector Extension to Optimize Open MPI,” 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), Melbourne, Australia, IEEE/ACM, May 2020. DOI: 10.1109/CCGrid49817.2020.00-71“
Accelerating FFT towards Exascale Computing : NVIDIA GPU Technology Conference (GTC2021), 2021.
Callback-based completion notification using MPI Continuations,” Parallel Computing, vol. 21238566, issue 0225, pp. 102793, May Jan. DOI: 10.1016/j.parco.2021.102793“
Distributed-Memory Multi-GPU Block-Sparse Tensor Contraction for Electronic Structure,” 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021.“
DTE: PaRSEC Enabled Libraries and Applications : 2021 Exascale Computing Project Annual Meeting, April 2021.
An international survey on MPI users,” Parallel Computing, vol. 108, December 2021. DOI: 10.1016/j.parco.2021.102853“
An Introduction to High Performance Computing and Its Intersection with Advances in Modeling Rare Earth Elements and Actinides,” Rare Earth Elements and Actinides: Progress in Computational Science Applications, vol. 1388, Washington, DC, American Chemical Society, pp. 3-53, October 2021. DOI: 10.1021/bk-2021-1388.ch001“
Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems,” 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021), Portland, OR, IEEE, May 2021.“
Quo Vadis MPI RMA? Towards a More Efficient Use of MPI One-Sided Communication,” EuroMPI'21, Garching, Munich Germany, 2021.“
Revisiting Credit Distribution Algorithms for Distributed Termination Detection,” 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): IEEE, pp. 611–620, 2021. DOI: 10.1109/IPDPSW52791.2021.00095“
Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 4, pp. 964 - 976, April 2022. DOI: 10.1109/TPDS.2021.3084071“
Evaluating Data Redistribution in PaRSEC,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 8, pp. 1856-1872, 2022. DOI: 10.1109/TPDS.2021.3131657“
A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2022.“
Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression,” ICL Technical Report, no. ICL-UT-22-04, May 2022.“