Publications

Export 15 results:
Filters: Author is Franck Cappello  [Clear All Filters]
Conference Paper
Nicolae, B., J. Li, J. M. Wozniak, G. Bosilca, M. Dorier, and F. Cappello, DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models,” 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Melbourne, VIC, Australia, IEEE, May 2020.  (424.19 KB)
Benoit, A., F. Cappello, A. Cavelan, Y. Robert, and H. Sun, Identifying the Right Replication Level to Detect and Correct Silent Errors at Scale,” 2017 Workshop on Fault-Tolerance for HPC at Extreme Scale, Washington, DC, ACM, June 2017.  (865.68 KB)
Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization,” Euro-Par 2013, Aachen, Germany, Springer, August 2013.  (431.84 KB)
Tseng, S-M., B. Nicolae, G. Bosilca, E. Jeannot, A. Chandramowlishwaran, and F. Cappello, Towards Portable Online Prediction of Network Utilization Using MPI-Level Monitoring,” 2019 European Conference on Parallel Processing (Euro-Par 2019), Göttingen, Germany, Springer, August 2019.  (1.07 MB)
Journal Article
Asch, M., T. Moore, R. M. Badia, M. Beck, P. Beckman, T. Bidot, F. Bodin, F. Cappello, A. Choudhary, B. R. de Supinski, et al., Big Data and Extreme-Scale Computing: Pathways to Convergence - Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,” The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 435–479, July 2018.  (1.29 MB)
Benoit, A., A. Cavelan, F. Cappello, P. Raghavan, Y. Robert, and H. Sun, Coping with Silent and Fail-Stop Errors at Scale by Combining Replication and Checkpointing,” Journal of Parallel and Distributed Computing, vol. 122, pp. 209–225, December 2018.  (837 KB)
Dongarra, J., P. Beckman, P. Aerts, F. Cappello, T. Lippert, S. Matsuoka, P. Messina, T. Moore, R. Stevens, A. Trefethen, et al., The International Exascale Software Project: A Call to Cooperative Action by the Global High Performance Community,” International Journal of High Performance Computing Applications (to appear), July 2009.  (203.04 KB)
Dongarra, J., P. Beckman, T. Moore, P. Aerts, G. Aloisio, J-C. Andre, D. Barkai, J-Y. Berthou, T. Boku, B. Braunschweig, et al., The International Exascale Software Project Roadmap,” International Journal of High Performance Computing, vol. 25, no. 1, pp. 3-60, January 2011.  (719.74 KB)
Agullo, E., C. Coti, T. Herault, J. Langou, S. Peyronnet, A.. Rezmerita, F. Cappello, and J. Dongarra, QCG-OMPI: MPI Applications on Grids,” Future Generation Computer Systems, vol. 27, no. 4, pp. 357-369, March 2010.  (1.48 MB)
Agullo, E., C. Coti, T. Herault, J. Langou, S. Peyronnet, A.. Rezmerita, F. Cappello, and J. Dongarra, QCG-OMPI: MPI Applications on Grids.,” Future Generation Computer Systems, vol. 27, no. 4, pp. 435-369, January 2011.  (1.48 MB)
Bosilca, G., A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra, A. Guermouche, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” Concurrency and Computation: Practice and Experience, November 2013.  (894.61 KB)
Tech Report
Badia, R. M., M. Beck, F. Bodin, T. Boku, F. Cappello, A. Choudhary, C. Costa, E. Deelman, N. Ferrier, K. Fujisawa, et al., A Collection of Presentations from the BDEC2 Workshop in Kobe, Japan,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-09: University of Tennessee, Knoxville, February 2019.  (58.85 MB)
Altintas, I., K. Marcus, V. Vural, S. Purawat, D. Crawl, G. Antoniu, A. Costan, O. Marcu, P. Balaprakash, R. Cao, et al., A Collection of White Papers from the BDEC2 Workshop in San Diego, CA,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-13: University of Tennessee, October 2019.  (8.25 MB)
Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, Multi-criteria checkpointing strategies: optimizing response-time versus resource utilization,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-01, February 2013.  (497.64 KB)
Bosilca, G., A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra, A. Guermouche, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” University of Tennessee Computer Science Technical Report (also LAWN 269), no. UT-CS-12-697, June 2012.  (2.76 MB)