%0 Conference Proceedings
%B Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16)
%D 2016
%T Failure Detection and Propagation in HPC Systems
%A George Bosilca
%A Aurelien Bouteiller
%A Amina Guermouche
%A Thomas Herault
%A Yves Robert
%A Pierre Sens
%A Jack Dongarra
%K failure detection
%K fault-tolerance
%K MPI
%I IEEE Press
%C Salt Lake City, Utah
%P 27:1-27:11
%8 2016-11
%@ 978-1-4673-8815-3
%G eng
%U http://dl.acm.org/citation.cfm?id=3014904.3014941