Publications

Export 66 results:
Filters: Author is Thomas Herault  [Clear All Filters]
2018
Bouteiller, A., G. Bosilca, T. Herault, and J. Dongarra, Data Movement Interfaces to Support Dataflow Runtimes,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-03: University of Tennessee, May 2018.  (210.94 KB)
Le Fèvre, V., G. Bosilca, A. Bouteiller, T. Herault, A. Hori, Y. Robert, and J. Dongarra, Do moldable applications perform better on failure-prone HPC platforms?,” Resilience: 11th Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids, jointly published with Euro-Par 2018: Springer Verlag, 2018.  (360.72 KB)
Herault, T., Y. Robert, A. Bouteiller, D. Arnold, K. Ferreira, G. Bosilca, and J. Dongarra, Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms,” APDCM workshop (co-located with IPDPS), Vancouver, BC, Canada, IEEE, May 2018.  (899.3 KB)
2017
Seo, S., A. Amer, P. Balaji, C. Bordage, G. Bosilca, A. Brooks, P. Carns, A. Castello, D. Genet, T. Herault, et al., Argobots: A Lightweight Low-Level Threading and Tasking Framework,” IEEE Transactions on Parallel and Distributed Systems, October 2017.
Hoque, R., T. Herault, G. Bosilca, and J. Dongarra, Dynamic Task Discovery in PaRSEC- A data-flow task-based Runtime,” ScalA17, Denver, ACM, September 2017.  (1.15 MB)
Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, A Failure Detector for HPC Platforms,” The International Journal of High Performance Computing Applications, July 2017.  (1.04 MB)
2016
Herrmann, J., G. Bosilca, T. Herault, L. Marchal, Y. Robert, and J. Dongarra, Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results,” Parallel Computing, vol. 52, pp. 22-41, February 2016.  (2.06 MB)
Bosilca, G., T. Herault, and J. Dongarra, Context Identifier Allocation in Open MPI,” University of Tennessee Computer Science Technical Report, no. ICL-UT-16-01: Innovative Computing Laboratory, University of Tennessee, January 2016.  (490.89 KB)
Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, Failure Detection and Propagation in HPC Systems,” Proceedings of the The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1-27:11, November 2016.
2015
Bouteiller, A., T. Herault, G. Bosilca, P. Du, and J. Dongarra, Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures, and Accuracy,” ACM Transactions on Parallel Computing, vol. 1, issue 2, no. 10, pp. 10:1-10:28, January 2015.  (1.14 MB)
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, Composing Resilience Techniques: ABFT, Periodic, and Incremental Checkpointing,” International Journal of Networking and Computing, vol. 5, no. 1, pp. 2-15, January 2015.  (755.54 KB)
Cao, C., G. Bosilca, T. Herault, and J. Dongarra, Design for a Soft Error Resilient Dynamic Task-based Runtime,” 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, IEEE, May 2015.  (2.31 MB)
Dongarra, J., T. Herault, and Y. Robert, Fault Tolerance Techniques for High-performance Computing,” University of Tennessee Computer Science Technical Report (also LAWN 289), no. UT-EECS-15-734: University of Tennessee, May 2015.
Herault, T., A. Bouteiller, G. Bosilca, M. Gamell, K. Teranishi, M. Parashar, and J. Dongarra, Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof,” Innovative Computing Laboratory Technical Report, no. ICL-UT-15-01, April 2015.  (570.97 KB)
Herault, T., A. Bouteiller, G. Bosilca, M. Gamell, K. Teranishi, M. Parashar, and J. Dongarra, Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.  (550.96 KB)
2014
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, Assessing the Impact of ABFT and Checkpoint Composite Strategies,” 16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.  (1.02 MB)
Cao, C., T. Herault, G. Bosilca, and J. Dongarra, Design for a Soft Error Resilient Dynamic Task-based Runtime,” ICL Technical Report, no. ICL-UT-14-04: University of Tennessee, November 2014.  (2.61 MB)
Bouteiller, A., T. Herault, and G. Bosilca, A Multithreaded Communication Substrate for OpenSHMEM,” 8th International Conference on Partitioned Global Address Space Programming Models (PGAS), Eugene, OR, October 2014.  (261.66 KB)
Dongarra, J., T. Herault, and Y. Robert, Performance and Reliability Trade-offs for the Double Checkpointing Algorithm,” International Journal of Networking and Computing, vol. 4, no. 1, pp. 32-41, 2014.  (859.04 KB)
Danalis, A., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, PTG: An Abstraction for Unhindered Parallelism,” International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014.  (480.05 KB)
2013
Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, Assessing the impact of ABFT and Checkpoint composite strategies,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.  (968.47 KB)
Aupy, G., A. Benoit, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, On the Combination of Silent Error Detection and Checkpointing,” UT-CS-13-710: University of Tennessee Computer Science Technical Report, June 2013.  (1.29 MB)
Bouteiller, A., T. Herault, G. Bosilca, and J. Dongarra, Correlated Set Coordination in Fault Tolerant Message Logging Protocols,” Concurrency and Computation: Practice and Experience, vol. 25, issue 4, pp. 572-585, March 2013.  (636.68 KB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Luszczek, and J. Dongarra, Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach,” Scalable Computing and Communications: Theory and Practice: John Wiley & Sons, pp. 699-735, March 2013.  (1.01 MB)
Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, An evaluation of User-Level Failure Mitigation support in MPI,” Computing, vol. 95, issue 12, pp. 1171-1184, December 2013.  (311.23 KB)
Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI,” Concurrency and Computation: Practice and Experience, July 2013.  (3.89 MB)
Dongarra, J., M. Faverge, T. Herault, M. Jacquelin, J. Langou, and Y. Robert, Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems,” Parallel Computing, vol. 39, issue 4-5, pp. 212-232, May 2013.  (1.43 MB)
Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, Multi-criteria checkpointing strategies: optimizing response-time versus resource utilization,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-01, February 2013.  (497.64 KB)
Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization,” Euro-Par 2013, Aachen, Germany, Springer, August 2013.  (431.84 KB)
Aupy, G., A. Benoit, T. Herault, Y. Robert, and J. Dongarra, Optimal Checkpointing Period: Time vs. Energy,” University of Tennessee Computer Science Technical Report (also LAWN 281), no. ut-eecs-13-718: University of Tennessee, October 2013.  (440.13 KB)
Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, T. Herault, and J. Dongarra, PaRSEC: Exploiting Heterogeneity to Enhance Scalability,” IEEE Computing in Science and Engineering, vol. 15, issue 6, pp. 36-45, November 2013.  (2.16 MB)
Bland, W., A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, Post-failure recovery of MPI communication capability: Design and rationale,” International Journal of High Performance Computing Applications, vol. 27, issue 3, pp. 244 - 254, January 2013.  (285.77 KB)
Dongarra, J., T. Herault, and Y. Robert, Revisiting the Double Checkpointing Algorithm,” 15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium, Boston, MA, May 2013.  (591.1 KB)
Dongarra, J., T. Herault, and Y. Robert, Revisiting the Double Checkpointing Algorithm,” University of Tennessee Computer Science Technical Report (LAWN 274), no. ut-cs-13-705, January 2013.  (682.22 KB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, Scalable Dense Linear Algebra on Heterogeneous Hardware,” HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, 2013.  (760.32 KB)
Bosilca, G., A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra, A. Guermouche, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” Concurrency and Computation: Practice and Experience, November 2013.  (894.61 KB)
2012
Du, P., A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, Algorithm-Based Fault Tolerance for Dense Matrix Factorization,” Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, ACM, pp. 225-234, February 2012.  (865.79 KB)
Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI,” 18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012) (Best Paper Award), Rhodes, Greece, Springer-Verlag, August 2012.  (289.32 KB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemariner, and J. Dongarra, DAGuE: A generic distributed DAG Engine for High Performance Computing.,” Parallel Computing, vol. 38, no. 1-2: Elsevier, pp. 27-51, 00-2012.  (830.85 KB)
Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, An Evaluation of User-Level Failure Mitigation Support in MPI,” Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, Springer, September 2012.
Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, Extending the Scope of the Checkpoint-on-Failure Protocol for Forward Recovery in Standard MPI,” University of Tennessee Computer Science Technical Report, no. ut-cs-12-702, 00-2012.  (422.76 KB)
Danalis, A., A. Bouteiller, G. Bosilca, J. Dongarra, and T. Herault, From Serial Loops to Parallel Execution on Distributed Systems,” PPoPP 2012 (submitted), New Orleans, LA, February 2012.  (319.5 KB)
Dongarra, J., M. Faverge, T. Herault, J. Langou, and Y. Robert, Hierarchical QR factorization algorithms for multi-core cluster systems,” IPDPS 2012, the 26th IEEE International Parallel and Distributed Processing Symposium, Shanghai, China, IEEE Computer Society Press, May 2012.  (405.71 KB)
Bland, W., G. Bosilca, A. Bouteiller, T. Herault, and J. Dongarra, A Proposal for User-Level Failure Mitigation in the MPI-3 Standard,” University of Tennessee Electrical Engineering and Computer Science Technical Report, no. ut-cs-12-693: University of Tennessee, February 2012.  (159.46 KB)
Bosilca, G., A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra, A. Guermouche, T. Herault, Y. Robert, F. Vivien, and D. Zaidouni, Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,” University of Tennessee Computer Science Technical Report (also LAWN 269), no. UT-CS-12-697, June 2012.  (2.76 MB)
2011
Du, P., A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, Algorithm-based Fault Tolerance for Dense Matrix Factorizations,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.  (865.79 KB)
Bouteiller, A., T. Herault, G. Bosilca, and J. Dongarra, Correlated Set Coordination in Fault Tolerant Message Logging Protocols,” Proceedings of 17th International Conference, Euro-Par 2011, Part II, vol. 6853, Bordeaux, France, Springer, pp. 51-64, August 2011.  (486.68 KB)
Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, P. Lemariner, and J. Dongarra, DAGuE: A Generic Distributed DAG Engine for High Performance Computing,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1151-1158, 00-2011.  (830.85 KB)
Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemariner, H. Ltaeif, et al., Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1432-1441, May 2011.  (1.26 MB)
Dongarra, J., M. Faverge, T. Herault, J. Langou, and Y. Robert, Hierarchical QR factorization algorithms for multi-core cluster systems,” University of Tennessee Computer Science Technical Report (also Lawn 257), no. UT-CS-11-684, October 2011.  (405.71 KB)

Pages