%0 Journal Article
%J Future Generation Computer Systems
%D 2010
%T Self-Healing Network for Scalable Fault-Tolerant Runtime Environments
%A Thara Angskun
%A Graham Fagg
%A George Bosilca
%A Jelena Pjesivac–Grbovic
%A Jack Dongarra
%B Future Generation Computer Systems
%V 26
%P 479-485
%8 2010-03
%G eng

%0 Conference Proceedings
%B Proceedings of The Fifth International Symposium on Parallel and Distributed Processing and Applications (ISPA07)
%D 2007
%T Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology
%A Thara Angskun
%A George Bosilca
%A Jack Dongarra
%K ftmpi
%B Proceedings of The Fifth International Symposium on Parallel and Distributed Processing and Applications (ISPA07)
%I Springer
%C Niagara Falls, Canada
%8 2007-08
%G eng

%0 Journal Article
%J Euro-Par 2007
%D 2007
%T Decision Trees and MPI Collective Algorithm Selection Problem
%A Jelena Pjesivac–Grbovic
%A George Bosilca
%A Graham Fagg
%A Thara Angskun
%A Jack Dongarra
%K ftmpi
%B Euro-Par 2007
%I Springer
%C Rennes, France
%P 105–115
%8 2007-08
%G eng

%0 Journal Article
%J Parallel Computing (Special Edition: EuroPVM/MPI 2006)
%D 2007
%T MPI Collective Algorithm Selection and Quadtree Encoding
%A Jelena Pjesivac–Grbovic
%A George Bosilca
%A Graham Fagg
%A Thara Angskun
%A Jack Dongarra
%K ftmpi
%B Parallel Computing (Special Edition: EuroPVM/MPI 2006)
%I Elsevier
%8 2007-00
%G eng

%0 Conference Proceedings
%B The International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)
%D 2007
%T Optimal Routing in Binomial Graph Networks
%A Thara Angskun
%A George Bosilca
%A Brad Vander Zanden
%A Jack Dongarra
%K ftmpi
%B The International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)
%I IEEE Computer Society
%C Adelaide, Australia
%8 2007-12
%G eng

%0 Journal Article
%J Cluster Computing
%D 2007
%T Performance Analysis of MPI Collective Operations
%A Jelena Pjesivac–Grbovic
%A Thara Angskun
%A George Bosilca
%A Graham Fagg
%A Edgar Gabriel
%A Jack Dongarra
%K ftmpi
%B Cluster Computing
%I Springer Netherlands
%V 10
%P 127-143
%8 2007-06
%G eng

%0 Conference Proceedings
%B Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)
%D 2007
%T Reliability Analysis of Self-Healing Network using Discrete-Event Simulation
%A Thara Angskun
%A George Bosilca
%A Graham Fagg
%A Jelena Pjesivac–Grbovic
%A Jack Dongarra
%K ftmpi
%B Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)
%I IEEE Computer Society
%P 437-444
%8 2007-05
%G eng

%0 Conference Proceedings
%B 2nd International Workshop On Reliability in Decentralized Distributed Systems (RDDS 2007)
%D 2007
%T Self-Healing in Binomial Graph Networks
%A Thara Angskun
%A George Bosilca
%A Jack Dongarra
%B 2nd International Workshop On Reliability in Decentralized Distributed Systems (RDDS 2007)
%C Vilamoura, Algarve, Portugal
%8 2007-11
%G eng

%0 Journal Article
%J 2006 Euro PVM/MPI (submitted)
%D 2006
%T Flexible collective communication tuning architecture applied to Open MPI
%A Graham Fagg
%A Jelena Pjesivac–Grbovic
%A George Bosilca
%A Thara Angskun
%A Jack Dongarra
%K ftmpi
%B 2006 Euro PVM/MPI (submitted)
%C Bonn, Germany
%8 2006-01
%G eng

%0 Generic
%D 2006
%T MPI Collective Algorithm Selection and Quadtree Encoding
%A Jelena Pjesivac–Grbovic
%A Graham Fagg
%A Thara Angskun
%A George Bosilca
%A Jack Dongarra
%K ftmpi
%B ICL Technical Report
%8 2006-00
%G eng

%0 Journal Article
%J Lecture Notes in Computer Science
%D 2006
%T MPI Collective Algorithm Selection and Quadtree Encoding
%A Jelena Pjesivac–Grbovic
%A Graham Fagg
%A Thara Angskun
%A George Bosilca
%A Jack Dongarra
%K ftmpi
%B Lecture Notes in Computer Science
%I Springer Berlin / Heidelberg
%V 4192
%P 40-48
%8 2006-09
%G eng

%0 Journal Article
%J 2006 Euro PVM/MPI
%D 2006
%T Scalable Fault Tolerant Protocol for Parallel Runtime Environments
%A Thara Angskun
%A Graham Fagg
%A George Bosilca
%A Jelena Pjesivac–Grbovic
%A Jack Dongarra
%K ftmpi
%B 2006 Euro PVM/MPI
%C Bonn, Germany
%8 2006-00
%G eng

%0 Conference Proceedings
%B DAPSYS 2006, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems
%D 2006
%T Self-Healing Network for Scalable Fault Tolerant Runtime Environments
%A Thara Angskun
%A Graham Fagg
%A George Bosilca
%A Jelena Pjesivac–Grbovic
%A Jack Dongarra
%B DAPSYS 2006, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems
%C Innsbruck, Austria
%8 2006-01
%G eng

%0 Conference Proceedings
%B Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (to appear)
%D 2005
%T Fault Tolerant High Performance Computing by a Coding Approach
%A Zizhong Chen
%A Graham Fagg
%A Edgar Gabriel
%A Julien Langou
%A Thara Angskun
%A George Bosilca
%A Jack Dongarra
%K ftmpi
%K grads
%K lacsi
%K sans
%B Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (to appear)
%C Chicago, Illinois
%8 2005-01
%G eng

%0 Conference Proceedings
%B 4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS '05)
%D 2005
%T Performance Analysis of MPI Collective Operations
%A Jelena Pjesivac–Grbovic
%A Thara Angskun
%A George Bosilca
%A Graham Fagg
%A Edgar Gabriel
%A Jack Dongarra
%K ftmpi
%B 4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS '05)
%C Denver, Colorado
%8 2005-04
%G eng

%0 Journal Article
%J Cluster Computing Journal (to appear)
%D 2005
%T Performance Analysis of MPI Collective Operations
%A Jelena Pjesivac–Grbovic
%A Thara Angskun
%A George Bosilca
%A Graham Fagg
%A Edgar Gabriel
%A Jack Dongarra
%K ftmpi
%B Cluster Computing Journal (to appear)
%8 2005-01
%G eng

%0 Conference Proceedings
%B Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI
%D 2005
%T Scalable Fault Tolerant MPI: Extending the Recovery Algorithm
%A Graham Fagg
%A Thara Angskun
%A George Bosilca
%A Jelena Pjesivac–Grbovic
%A Jack Dongarra
%E Beniamino Di Martino
%K ftmpi
%B Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI
%I Springer-Verlag Berlin
%C Sorrento (Naples), Italy
%V 3666
%P 67
%8 2005-09
%G eng

%0 Conference Proceedings
%B Proceedings of ISC2004 (to appear)
%D 2004
%T Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems
%A Graham Fagg
%A Edgar Gabriel
%A George Bosilca
%A Thara Angskun
%A Zizhong Chen
%A Jelena Pjesivac–Grbovic
%A Kevin London
%A Jack Dongarra
%K ftmpi
%K lacsi
%B Proceedings of ISC2004 (to appear)
%C Heidelberg, Germany
%8 2004-06
%G eng

%0 Journal Article
%J International Journal for High Performance Applications and Supercomputing (to appear)
%D 2004
%T Process Fault-Tolerance: Semantics, Design and Applications for High Performance Computing
%A Graham Fagg
%A Edgar Gabriel
%A Zizhong Chen
%A Thara Angskun
%A George Bosilca
%A Jelena Pjesivac–Grbovic
%A Jack Dongarra
%K ftmpi
%K lacsi
%B International Journal for High Performance Applications and Supercomputing (to appear)
%8 2004-04
%G eng

%0 Conference Proceedings
%B Los Alamos Computer Science Institute (LACSI) Symposium 2003 (presented)
%D 2003
%T Fault Tolerant Communication Library and Applications for High Performance Computing
%A Graham Fagg
%A Edgar Gabriel
%A Zizhong Chen
%A Thara Angskun
%A George Bosilca
%A Antonin Bukovsky
%A Jack Dongarra
%K ftmpi
%K lacsi
%B Los Alamos Computer Science Institute (LACSI) Symposium 2003 (presented)
%C Santa Fe, NM
%8 2003-10
%G eng

%0 Conference Proceedings
%B 17th Annual ACM International Conference on Supercomputing (ICS'03) International Workshop on Grid Computing and e-Science
%D 2003
%T A Fault-Tolerant Communication Library for Grid Environments
%A Edgar Gabriel
%A Graham Fagg
%A Antonin Bukovsky
%A Thara Angskun
%A Jack Dongarra
%K ftmpi
%K lacsi
%B 17th Annual ACM International Conference on Supercomputing (ICS'03) International Workshop on Grid Computing and e-Science
%C San Francisco
%8 2003-06
%G eng