%0 Conference Paper
%B 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021)
%D 2021
%T Distributed-Memory Multi-GPU Block-Sparse Tensor Contraction for Electronic Structure
%A Thomas Herault
%A Yves Robert
%A George Bosilca
%A Robert Harrison
%A Cannada Lewis
%A Edward Valeev
%A Jack Dongarra
%K block-sparse matrix multiplication
%K distributed-memory
%K Electronic structure
%K multi-GPU node
%K parsec
%K tensor contraction
%X Many domains of scientific simulation (chemistry, condensed matter physics, data science) increasingly eschew dense tensors for block-sparse tensors, sometimes with additional structure (recursive hierarchy, rank sparsity, etc.). Distributed-memory parallel computation with block-sparse tensorial data is paramount to minimize the time-tosolution (e.g., to study dynamical problems or for real-time analysis) and to accommodate problems of realistic size that are too large to fit into the host/device memory of a single node equipped with accelerators. Unfortunately, computation with such irregular data structures is a poor match to the dominant imperative, bulk-synchronous parallel programming model. In this paper, we focus on the critical element of block-sparse tensor algebra, namely binary tensor contraction, and report on an efficient and scalable implementation using the task-focused PaRSEC runtime. High performance of the block-sparse tensor contraction on the Summit supercomputer is demonstrated for synthetic data as well as for real data involved in electronic structure simulations of unprecedented size.
%B 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021)
%I IEEE
%C Portland, OR
%8 2021-05
%G eng
%U https://hal.inria.fr/hal-02970659/document
%0 Conference Paper
%B 2020 IEEE/ACM 5th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)
%D 2020
%T The Template Task Graph (TTG) - An Emerging Practical Dataflow Programming Paradigm for Scientific Simulation at Extreme Scale
%A George Bosilca
%A Robert Harrison
%A Thomas Herault
%A Mohammad Mahdi Javanmard
%A Poornima Nookala
%A Edward Valeev
%K dag
%K dataflow
%K exascale
%K graph
%K High-performance computing
%K workflow
%X We describe TESSE, an emerging general-purpose, open-source software ecosystem that attacks the twin challenges of programmer productivity and portable performance for advanced scientific applications on modern high-performance computers. TESSE builds upon and extends the ParsecDAG/-dataflow runtime with a new Domain Specific Languages (DSL) and new integration capabilities. Motivating this work is our belief that such a dataflow model, perhaps with applications composed in domain specific languages, can overcome many of the challenges faced by a wide variety of irregular applications that are poorly served by current programming and execution models. Two such applications from many-body physics and applied mathematics are briefly explored. This paper focuses upon the Template Task Graph (TTG), which is TESSE's main C++ Api that provides a powerful work/data-flow programming model. Algorithms on spatial trees, block-sparse tensors, and wave fronts are used to illustrate the API and associated concepts, as well as to compare with related approaches.
%B 2020 IEEE/ACM 5th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)
%I IEEE
%8 2020-11
%G eng
%R https://doi.org/10.1109/ESPM251964.2020.00011
%0 Generic
%D 2018
%T Tensor Contraction on Distributed Hybrid Architectures using a Task-Based Runtime System
%A George Bosilca
%A Damien Genet
%A Robert Harrison
%A Thomas Herault
%A Mohammad Mahdi Javanmard
%A Chong Peng
%A Edward Valeev
%X The needs for predictive simulation of electronic structure in chemistry and materials science calls for fast/reduced-scaling formulations of quantum n-body methods that replace the traditional dense tensors with element-, block-, rank-, and block-rank-sparse (data-sparse) tensors. The resulting, highly irregular data structures are a poor match to imperative, bulk-synchronous parallel programming style due to the dynamic nature of the problem and to the lack of clear domain decomposition to guarantee a fair load-balance. TESSE runtime and the associated programming model aim to support performance-portable composition of applications involving irregular and dynamically changing data. In this paper we report an implementation of irregular dense tensor contraction in a paradigmatic electronic structure application based on the TESSE extension of PaRSEC, a distributed hybrid task runtime system, and analyze the resulting performance on a distributed memory cluster of multi-GPU nodes. Unprecedented strong scaling and promising efficiency indicate a viable future for task-based programming of complete production-quality reduced scaling models of electronic structure.
%B Innovative Computing Laboratory Technical Report
%I University of Tennessee
%8 2018-12
%G eng