Publications

Export 5 results:
Filters: Keyword is fault-tolerance and Author is George Bosilca  [Clear All Filters]
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
A
Bouteiller, A., T. Herault, G. Bosilca, P. Du, and J. Dongarra, Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures, and Accuracy,” ACM Transactions on Parallel Computing, vol. 1, issue 2, no. 10, pp. 10:1-10:28, January 2015.  (1.14 MB)
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, Assessing the impact of ABFT and Checkpoint composite strategies,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.  (968.47 KB)
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, Assessing the Impact of ABFT and Checkpoint Composite Strategies,” 16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.  (1.02 MB)
C
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, Composing Resilience Techniques: ABFT, Periodic, and Incremental Checkpointing,” International Journal of Networking and Computing, vol. 5, no. 1, pp. 2-15, January 2015.  (755.54 KB)
F
Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, Failure Detection and Propagation in HPC Systems,” Proceedings of the The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1-27:11, November 2016.