Publications
Export 135 results:
Filters: Author is George Bosilca [Clear All Filters]
Process Fault-Tolerance: Semantics, Design and Applications for High Performance Computing,”
International Journal for High Performance Applications and Supercomputing (to appear), April 2004.
(186.9 KB)
“
A Proposal for User-Level Failure Mitigation in the MPI-3 Standard,”
University of Tennessee Electrical Engineering and Computer Science Technical Report, no. ut-cs-12-693: University of Tennessee, February 2012.
(159.46 KB)
“
PTG: An Abstraction for Unhindered Parallelism,”
International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), New Orleans, LA, IEEE Press, November 2014.
(480.05 KB)
“
Reasons for a Pessimistic or Optimistic Message Logging Protocol in MPI Uncoordinated Failure Recovery,”
CLUSTER '09, New Orleans, IEEE, August 2009.
DOI: 10.1109/CLUSTR.2009.5289157
(191.36 KB)
“
Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,”
SIAM SISC (to appear), May 2007.
(241.36 KB)
“
Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,”
University of Tennessee Computer Science Department Technical Report, UT-CS-04-538, 00 2005.
(241.36 KB)
“
Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,”
ICL Technical Report, no. ICL-UT-04-04, January 2004.
(241.36 KB)
“
Redesigning the Message Logging Model for High Performance,”
International Supercomputer Conference (ISC 2008), Dresden, Germany, January 2008.
(622.1 KB)
“
Redesigning the Message Logging Model for High Performance,”
Concurrency and Computation: Practice and Experience (online version), June 2010.
(438.42 KB)
“
Reliability Analysis of Self-Healing Network using Discrete-Event Simulation,”
Proceedings of Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07): IEEE Computer Society, pp. 437-444, May 2007.
“Retrospect: Deterministic Relay of MPI Applications for Interactive Distributed Debugging,”
Accepted for Euro PVM/MPI 2007: Springer, September 2007.
“Runtime Level Failure Detection and Propagation in HPC Systems,”
European MPI Users' Group Meeting (EuroMPI '19), Zürich, Switzerland, ACM, September 2019.
DOI: 10.1145/3343211.3343225
(1.11 MB)
“
On Scalability for MPI Runtime Systems,”
University of Tennessee Computer Science Technical Report, no. ICL-UT-11-05, Knoxville, TN, May 2011.
(898.76 KB)
“
On Scalability for MPI Runtime Systems,”
International Conference on Cluster Computing (CLUSTER), Austin, TX, USA, IEEEE, pp. 187-195, September 2011.
(898.76 KB)
“
Scalable Dense Linear Algebra on Heterogeneous Hardware,”
HPC: Transition Towards Exascale Processing, in the series Advances in Parallel Computing, 2013.
(760.32 KB)
“
Scalable Fault Tolerant MPI: Extending the Recovery Algorithm,”
Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, vol. 3666, Sorrento (Naples) , Italy, Springer-Verlag Berlin, pp. 67, September 2005.
(144.86 KB)
“
Scalable Fault Tolerant Protocol for Parallel Runtime Environments,”
2006 Euro PVM/MPI, no. ICL-UT-06-12, Bonn, Germany, 00 2006.
(149.07 KB)
“
Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure,”
Proceedings of Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, vol. 6960, Santorini, Greece, Springer, pp. 342-344, September 2011.
(115.75 KB)
“
SCOP3: A Rough Guide to Scientific Computing On the PlayStation 3,”
University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-595, 00 2007.
(1.74 MB)
“
Self Adapting Numerical Software SANS Effort,”
IBM Journal of Research and Development, vol. 50, no. 2/3, pp. 223-238, January 2006.
(357.53 KB)
“
Self-Healing in Binomial Graph Networks,”
2nd International Workshop On Reliability in Decentralized Distributed Systems (RDDS 2007), Vilamoura, Algarve, Portugal, November 2007.
(322.39 KB)
“
Self-Healing Network for Scalable Fault Tolerant Runtime Environments,”
DAPSYS 2006, 6th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Innsbruck, Austria, January 2006.
(162.83 KB)
“
Self-Healing Network for Scalable Fault-Tolerant Runtime Environments,”
Future Generation Computer Systems, vol. 26, no. 3, pp. 479-485, March 2010.
(1.54 MB)
“
A Survey of MPI Usage in the US Exascale Computing Project,”
Concurrency Computation: Practice and Experience, September 2018.
DOI: 10.1002/cpe.4851
(359.54 KB)
“
Surviving Errors with OpenSHMEM,”
OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, Baltimore, MD, USA, Springer International Publishing, pp. 66–81, 2016.
“System Software for Many-Core and Multi-core Architecture,”
Advanced Software Technologies for Post-Peta Scale Computing: The Japanese Post-Peta CREST Research Project, Singapore, Springer Singapore, pp. 59–75, 2019.
DOI: 10.1007/978-981-13-1924-2_4
“Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes,”
23rd International Heterogeneity in Computing Workshop, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
(807.33 KB)
“
Task-Based Programming for Seismic Imaging: Preliminary Results,”
2014 IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.
(625.86 KB)
“
Tensor Contraction on Distributed Hybrid Architectures using a Task-Based Runtime System,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-13: University of Tennessee, December 2018.
(326.11 KB)
“
UCX: An Open Source Framework for HPC Network APIs and Beyond,”
2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, Santa Clara, CA, USA, IEEE, pp. 40-43, 2015.
DOI: 10.1109/HOTI.2015.13
“A Unified HPC Environment for Hybrid Manycore/GPU Distributed Systems,”
IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.
“Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,”
Concurrency and Computation: Practice and Experience, November 2013.
DOI: 10.1002/cpe.3173
(894.61 KB)
“
Unified Model for Assessing Checkpointing Protocols at Extreme-Scale,”
University of Tennessee Computer Science Technical Report (also LAWN 269), no. UT-CS-12-697, June 2012.
(2.76 MB)
“
Using Software-Based Performance Counters to Expose Low-Level Open MPI Performance Information,”
EuroMPI, Chicago, IL, ACM, September 2017.
DOI: 10.1145/3127024.3127039
(745.58 KB)
“
Utilizing Dataflow-based Execution for Coupled Cluster Methods,”
2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-02, Madrid, Spain, IEEE, September 2014.
(260.23 KB)
“
Pages
- « first
- ‹ previous
- 1
- 2
- 3