Publications
Export 7 results:
Filters: Keyword is fault-tolerance and Author is Yves Robert [Clear All Filters]
Computing the Expected Makespan of Task Graphs in the Presence of Silent Errors,”
Parallel Computing, vol. 75, pp. 41–60, July 2018.
(2.56 MB)
“Failure Detection and Propagation in HPC Systems,”
Proceedings of the The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1-27:11, November 2016.
“Scheduling Computational Workflows on Failure-prone Platforms,”
International Journal of Networking and Computing, vol. 6, no. 1, pp. 2-26, 2016.
(503.81 KB)
“Composing Resilience Techniques: ABFT, Periodic, and Incremental Checkpointing,”
International Journal of Networking and Computing, vol. 5, no. 1, pp. 2-15, January 2015.
(755.54 KB)
“Assessing the Impact of ABFT and Checkpoint Composite Strategies,”
16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
(1.02 MB)
“Assessing the impact of ABFT and Checkpoint composite strategies,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.
(968.47 KB)
“Revisiting the Double Checkpointing Algorithm,”
University of Tennessee Computer Science Technical Report (LAWN 274), no. ut-cs-13-705, January 2013.
(682.22 KB)
“