ICL Research Profile

CAARES

Overview

The Cross-layer Application-Aware Resilience at Extreme Scale (CAARES) project, a collaborative effort between ICL, Rutgers University, and Stony Brook University, aims to provide a theoretical foundation for multi-level fault management and a clear understanding of the obstacles that stand in the way of generic, efficient fault management at scale. This effort is vital for large-scale science: as extreme-scale computational power enables new and important discoveries across all science domains, the current understanding of fault rates points to a grim future in which failures are not the exception but the norm.

By studying fault tolerance techniques in combination rather than in isolation, CAARES seizes the opportunity to identify three things: moldable techniques at the frontier of known approaches, compositions of methodologies that inherit the benefits of their components without their drawbacks, and techniques able to bridge the gap between fault tolerance ergonomics and efficiency.

In Collaboration With

  1. Rutgers University
  2. Stony Brook University

Sponsored by

  1. National Science Foundation