ICL Research Profile

SMURFS

Overview

The Simulation and Modeling for Understanding Resilience and Faults at Scale (SMURFS) project seeks to develop a predictive understanding of the complex interactions among a given application, a given real or hypothetical hardware and software environment, and a given fault-tolerance (FT) strategy at extreme scale. SMURFS has two facets: medium- and fine-grained predictive capabilities, and coarse-grained FT-strategy selection. Accordingly, ICL plans to design, develop, and validate new analytical and system component models that use semi-detailed software and hardware specifications to predict application performance in terms of time-to-solution and energy consumption. In addition, through a comprehensive set of studies spanning application benchmarks, proxy applications, and full applications under several different FT strategies, ICL will gain valuable insight into application behavior at scale.