Publications

Export 4 results:
Filters: Keyword is Fault tolerance and Author is George Bosilca  [Clear All Filters]
Conference Paper
Losada, N., A. Bouteiller, and G. Bosilca, Asynchronous Receiver-Driven Replay for Local Rollback of MPI Applications,” Fault Tolerance for HPC at eXtreme Scale (FTXS) Workshop at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'19), November 2019.  (440.7 KB)
Journal Article
Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, An evaluation of User-Level Failure Mitigation support in MPI,” Computing, vol. 95, issue 12, pp. 1171-1184, December 2013. DOI: 10.1007/s00607-013-0331-3  (311.23 KB)
Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, A Failure Detector for HPC Platforms,” The International Journal of High Performance Computing Applications, vol. 32, issue 1, pp. 139–158, January 2018. DOI: 10.1177/1094342017711505  (1.04 MB)
Hori, A., K. Yoshinaga, T. Herault, A. Bouteiller, G. Bosilca, and Y. Ishikawa, Overhead of Using Spare Nodes,” The International Journal of High Performance Computing Applications, February 2020. DOI: 10.1177%2F1094342020901885  (2.15 MB)