Publications

Filters: Keyword is Fault tolerance and Author is Aurelien Bouteiller
Journal Article
Hori, A., K. Yoshinaga, T. Herault, A. Bouteiller, G. Bosilca, and Y. Ishikawa, “Overhead of Using Spare Nodes,” The International Journal of High Performance Computing Applications, February 2020. DOI: 10.1177/1094342020901885
Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “A Failure Detector for HPC Platforms,” The International Journal of High Performance Computing Applications, vol. 32, issue 1, pp. 139–158, January 2018. DOI: 10.1177/1094342017711505
Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, “An evaluation of User-Level Failure Mitigation support in MPI,” Computing, vol. 95, issue 12, pp. 1171–1184, December 2013. DOI: 10.1007/s00607-013-0331-3
Conference Paper
Losada, N., A. Bouteiller, and G. Bosilca, “Asynchronous Receiver-Driven Replay for Local Rollback of MPI Applications,” Fault Tolerance for HPC at eXtreme Scale (FTXS) Workshop at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'19), November 2019.