Publications

Benoit, A., V. Le Fèvre, P. Raghavan, Y. Robert, and H. Sun, “Resilient scheduling heuristics for rigid parallel jobs,” Int. J. of Networking and Computing, vol. 11, no. 1, pp. 2-26, 2021.

Benoit, A., T. Herault, L. Perotin, Y. Robert, and F. Vivien, “Revisiting I/O bandwidth-sharing strategies for HPC applications,” INRIA Research Report, no. RR-9502: INRIA, March 2023.

Benoit, A., A. Cavelan, V. Le Fèvre, and Y. Robert, “Optimal Checkpointing Period with replicated execution on heterogeneous platforms,” 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, IEEE Computer Society Press, June 2017.

Benoit, A., S. K. Raina, and Y. Robert, “Efficient Checkpoint/Verification Patterns,” International Journal of High Performance Computing Applications, July 2015.

Benoit, A., V. Le Fèvre, P. Raghavan, Y. Robert, and H. Sun, “Design and Comparison of Resilient Scheduling Heuristics for Parallel Jobs,” 22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2020), New Orleans, LA, IEEE Computer Society Press, May 2020.

Benoit, A., A. Cavelan, F. Cappello, P. Raghavan, Y. Robert, and H. Sun, “Coping with Silent and Fail-Stop Errors at Scale by Combining Replication and Checkpointing,” Journal of Parallel and Distributed Computing, vol. 122, pp. 209–225, December 2018.

Benoit, A., R. Elghazi, and Y. Robert, “Max-Stretch Minimization on an Edge-Cloud Platform,” IPDPS'2021, the 35th IEEE International Parallel and Distributed Processing Symposium: IEEE Computer Society Press, 2021.

Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Optimal Resilience Patterns to Cope with Fail-stop and Silent Errors,” 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.

Benoit, A., Y. Robert, and S. K. Raina, “Efficient checkpoint/verification patterns for silent error detection,” Innovative Computing Laboratory Technical Report, no. ICL-UT-14-03: University of Tennessee, May 2014.

Benoit, A., A. Cavelan, F. M. Ciorba, V. Le Fèvre, and Y. Robert, “Combining Checkpointing and Replication for Reliable Execution of Linear Workflows with Fail-Stop and Silent Errors,” International Journal of Networking and Computing, vol. 9, no. 1, pp. 2-27.

Berman, F., H. Casanova, A. Chien, K. Cooper, H. Dail, A. Dasgupta, W. Deng, J. Dongarra, L. Johnsson, K. Kennedy, et al., “New Grid Scheduling and Rescheduling Methods in the GrADS Project,” International Journal of Parallel Programming, vol. 33, no. 2: Springer, pp. 209-229, June 2005.

Berman, F., A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, J. Mellor-Crummey, et al., “The GrADS Project: Software Support for High-Level Grid Application Development,” International Journal of High Performance Applications and Supercomputing, vol. 15, no. 4, pp. 327-344, January 2001.

Berman, F., A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, D. Reed, et al., “The GrADS Project: Software Support for High-Level Grid Application Development,” Technical Report, February 2000.

Bernholc, J., M. Hodak, W. Lu, S. Moore, and S. Tomov, “Scalability Study of a Quantum Simulation Code,” PARA 2010, Reykjavik, Iceland, June 2010.

Bernholdt, D. E., S. Boehm, G. Bosilca, M. G. Venkata, R. E. Grant, T. Naughton, H. P. Pritchard, M. Schulz, and G. R. Vallee, “A Survey of MPI Usage in the US Exascale Computing Project,” Concurrency and Computation: Practice and Experience, September 2018.

Berry, M., and J. Dongarra, “Atlanta Organizers Put Mathematics to Work For the Math Sciences Community,” SIAM News, vol. 32, no. 6, January 1999.

Betancourt, F., K. Wong, E. Asemota, Q. Marshall, D. Nichols, and S. Tomov, “OpenDIEL: A Parallel Workflow Engine and Data Analytics Framework,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.

Bhatia, N., S. Moore, F. Wolf, J. Dongarra, and B. Mohr, “A Pattern-Based Approach to Automated Application Performance Analysis,” Workshop on Patterns in High Performance Computing, University of Illinois at Urbana-Champaign, May 2005.

Bhatia, N., F. Song, F. Wolf, J. Dongarra, B. Mohr, and S. Moore, “Automatic Experimental Analysis of Communication Patterns in Virtual Topologies,” In Proceedings of the International Conference on Parallel Processing, Oslo, Norway, IEEE Computer Society, June 2005.

Bhowmick, S., V. Eijkhout, Y. Freund, E. Fuentes, and D. Keyes, “Application of Machine Learning to the Selection of Sparse Linear Solvers,” International Journal of High Performance Computing Applications (submitted), 2006.

Blackford, S., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, et al., “Basic Linear Algebra Subprograms (BLAS),” (an update), submitted to ACM TOMS, February 2001.

Blackford, S., J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, et al., “An Updated Set of Basic Linear Algebra Subprograms (BLAS),” ACM Transactions on Mathematical Software, vol. 28, no. 2, pp. 135-151, December 2002.

Bland, W., A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “Post-failure recovery of MPI communication capability: Design and rationale,” International Journal of High Performance Computing Applications, vol. 27, issue 3, pp. 244-254, January 2013.

Bland, W., “Enabling Application Resilience With and Without the MPI Standard,” 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Ottawa, Canada, May 2012.

Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI,” University of Tennessee Computer Science Technical Report, no. ut-cs-12-702, 2012.

Bland, W., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, “A Proposal for User-Level Failure Mitigation in the MPI-3 Standard,” University of Tennessee Electrical Engineering and Computer Science Technical Report, no. ut-cs-12-693: University of Tennessee, February 2012.

Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, “An evaluation of User-Level Failure Mitigation support in MPI,” Computing, vol. 95, issue 12, pp. 1171-1184, December 2013.

Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI,” Concurrency and Computation: Practice and Experience, July 2013.

Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, “An Evaluation of User-Level Failure Mitigation Support in MPI,” Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, Springer, September 2012.

Bland, W., “User Level Failure Mitigation in MPI,” Euro-Par 2012: Parallel Processing Workshops, vol. 7640, Rhodes Island, Greece, Springer Berlin Heidelberg, pp. 499-504, August 2012.

Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, “A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI,” 18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012) (Best Paper Award), Rhodes, Greece, Springer-Verlag, August 2012.

Boehmann, T. B., “Distributed Storage in RIB,” ICL Tech Report, no. ICL-UT-03-01, March 2003.

Boillot, L., G. Bosilca, E. Agullo, and H. Calandra, “Task-Based Programming for Seismic Imaging: Preliminary Results,” 2014 IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.

Bosilca, G., T. Herault, P. Lemarinier, J. Dongarra, and A. Rezmerita, “Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure,” Proceedings of Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, vol. 6960, Santorini, Greece, Springer, pp. 342-344, September 2011.

Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Composing Resilience Techniques: ABFT, Periodic, and Incremental Checkpointing,” International Journal of Networking and Computing, vol. 5, no. 1, pp. 2-15, January 2015.

Bosilca, G., Z. Chen, J. Dongarra, and J. Langou, “Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,” ICL Technical Report, no. ICL-UT-04-04, January 2004.

Bosilca, G., Z. Chen, J. Dongarra, and J. Langou, “Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,” University of Tennessee Computer Science Department Technical Report, UT-CS-04-538, 2005.

Bosilca, G., T. Herault, A. Rezmerita, and J. Dongarra, “On Scalability for MPI Runtime Systems,” University of Tennessee Computer Science Technical Report, no. ICL-UT-11-05, Knoxville, TN, May 2011.

Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, et al., “Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-02, 2010.

Bosilca, G., J. Dongarra, and H. Ltaief, “Power Profiling of Cholesky and QR Factorizations on Distributed Memory Systems,” Third International Conference on Energy-Aware High Performance Computing, Hamburg, Germany, September 2012.

Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, et al., “Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA,” University of Tennessee Computer Science Technical Report, UT-CS-10-660, September 2010.

Bosilca, G., A. Bouteiller, T. Herault, V. Le Fèvre, Y. Robert, and J. Dongarra, “Distributed Termination Detection for HPC Task-Based Environments,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-14: University of Tennessee, June 2018.

Bosilca, G., A. Bouteiller, T. Herault, P. Lemarinier, and J. Dongarra, “Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols,” Proceedings of EuroMPI 2010, Stuttgart, Germany, Springer, September 2010.

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, and J. Dongarra, “DAGuE: A generic distributed DAG engine for high performance computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-10-01, April 2010.

Bosilca, G., J. Dongarra, G. Fagg, and J. Langou, “Hash Functions for Datatype Signatures in MPI,” Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, vol. 3666, Sorrento (Naples), Italy, Springer-Verlag Berlin, pp. 76-83, September 2005.

Bosilca, G., T. Herault, and J. Dongarra, DTE: PaRSEC Systems and Interfaces (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.

Bosilca, G., T. Herault, and J. Dongarra, DTE: PaRSEC Enabled Libraries and Applications: 2021 Exascale Computing Project Annual Meeting, April 2021.

Bosilca, G., C. Coti, T. Herault, P. Lemarinier, and J. Dongarra, “Constructing resilient communication infrastructure for runtime environments,” Innovative Computing Laboratory Technical Report, no. ICL-UT-09-02, July 2009.
