Publications

Filters: Keyword is resilience
2019
Han, L., V. Le Fèvre, L-C. Canon, Y. Robert, and F. Vivien, “A Generic Approach to Scheduling and Checkpointing Workflows,” International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1255-1274, November 2019. DOI: 10.1177/1094342019866891
Losada, N., G. Bosilca, A. Bouteiller, P. González, and M. J. Martín, “Local Rollback for Resilient MPI Applications with Application-Level Checkpointing and Message Logging,” Future Generation Computer Systems, vol. 91, pp. 450-464, February 2019. DOI: 10.1016/j.future.2018.09.041
2018
Han, L., L-C. Canon, H. Casanova, Y. Robert, and F. Vivien, “Checkpointing Workflows for Fail-Stop Errors,” IEEE Transactions on Computers, vol. 67, issue 8, pp. 1105-1120, August 2018.
2017
Benoit, A., L. Pottier, and Y. Robert, “Resilient Co-Scheduling of Malleable Applications,” International Journal of High Performance Computing Applications (IJHPCA), May 2017. DOI: 10.1177/1094342017704979
2016
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Assessing General-purpose Algorithms to Cope with Fail-stop and Silent Errors,” ACM Transactions on Parallel Computing, August 2016. DOI: 10.1145/2897189
Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Optimal Resilience Patterns to Cope with Fail-stop and Silent Errors,” 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016. DOI: 10.1109/IPDPS.2016.39
2015
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Composing Resilience Techniques: ABFT, Periodic, and Incremental Checkpointing,” International Journal of Networking and Computing, vol. 5, no. 1, pp. 2-15, January 2015.
2014
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Assessing the Impact of ABFT and Checkpoint Composite Strategies,” 16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
Dongarra, J., T. Herault, and Y. Robert, “Performance and Reliability Trade-offs for the Double Checkpointing Algorithm,” International Journal of Networking and Computing, vol. 4, no. 1, pp. 32-41.
2013
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Assessing the impact of ABFT and Checkpoint composite strategies,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.
Dongarra, J., T. Herault, and Y. Robert, “Revisiting the Double Checkpointing Algorithm,” University of Tennessee Computer Science Technical Report (LAWN 274), no. ut-cs-13-705, January 2013.