%0 Conference Paper
%B 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021)
%D 2021
%T Distributed-Memory Multi-GPU Block-Sparse Tensor Contraction for Electronic Structure
%A Thomas Herault
%A Yves Robert
%A George Bosilca
%A Robert Harrison
%A Cannada Lewis
%A Edward Valeev
%A Jack Dongarra
%K block-sparse matrix multiplication
%K distributed-memory
%K Electronic structure
%K multi-GPU node
%K parsec
%K tensor contraction
%X Many domains of scientific simulation (chemistry, condensed matter physics, data science) increasingly eschew dense tensors for block-sparse tensors, sometimes with additional structure (recursive hierarchy, rank sparsity, etc.). Distributed-memory parallel computation with block-sparse tensorial data is paramount to minimize the time-to-solution (e.g., to study dynamical problems or for real-time analysis) and to accommodate problems of realistic size that are too large to fit into the host/device memory of a single node equipped with accelerators. Unfortunately, computation with such irregular data structures is a poor match to the dominant imperative, bulk-synchronous parallel programming model. In this paper, we focus on the critical element of block-sparse tensor algebra, namely binary tensor contraction, and report on an efficient and scalable implementation using the task-focused PaRSEC runtime. High performance of the block-sparse tensor contraction on the Summit supercomputer is demonstrated for synthetic data as well as for real data involved in electronic structure simulations of unprecedented size.
%I IEEE
%C Portland, OR
%8 2021-05
%G eng
%U https://hal.inria.fr/hal-02970659/document
%0 Book Section
%B Rare Earth Elements and Actinides: Progress in Computational Science Applications
%D 2021
%T An Introduction to High Performance Computing and Its Intersection with Advances in Modeling Rare Earth Elements and Actinides
%A Deborah A. Penchoff
%A Edward Valeev
%A Heike Jagode
%A Piotr Luszczek
%A Anthony Danalis
%A George Bosilca
%A Robert J. Harrison
%A Jack Dongarra
%A Theresa L. Windus
%K actinide
%K Computational modeling
%K HPC
%K REE
%X Computationally driven solutions in nuclear and radiochemistry depend heavily on efficient modeling of Rare Earth Elements (REEs) and actinides. Accurate modeling of REEs and actinides faces challenges stemming from the limitations of an imbalanced hardware-software ecosystem and the resulting inefficient use of High Performance Computing (HPC). This chapter provides a historical perspective on the evolution of HPC hardware, its intersection with domain sciences, the importance of benchmarks for performance, and an overview of challenges and advances in modeling REEs and actinides. The chapter is intended as an introduction for researchers at the intersection of scientific computing, software development for HPC, and applied computational modeling of REEs and actinides. It is structured in five sections. First, the Introduction includes subsections on the Importance of REEs and Actinides (1.1), Hardware, Software, and the HPC Ecosystem (1.2), and Electronic Structure Modeling of REEs and Actinides (1.3). Second, the section on High Performance Computing covers the TOP500 (2.1), HPC Performance (2.2), HPC Benchmarks: Processing, Bandwidth, and Latency (2.3), and HPC Benchmarks and their Relationship to Chemical Modeling (2.4). Third, the section on Software Challenges and Advances covers NWChem/NWChemEx (3.1), MADNESS (3.2), and MPQC (3.3). The fourth section provides a short overview of Artificial Intelligence in HPC applications relevant to nuclear and radiochemistry. The fifth section presents A Protocol to Evaluate Complexation Preferences in Separations of REEs and Actinides through Computational Modeling.
%I American Chemical Society
%C Washington, DC
%V 1388
%P 3-53
%8 2021-10
%@ ISBN13: 9780841298255 eISBN: 9780841298248
%G eng
%U https://pubs.acs.org/doi/10.1021/bk-2021-1388.ch001
%& 1
%R 10.1021/bk-2021-1388.ch001
%0 Conference Paper
%B 2020 IEEE/ACM 5th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)
%D 2020
%T The Template Task Graph (TTG) - An Emerging Practical Dataflow Programming Paradigm for Scientific Simulation at Extreme Scale
%A George Bosilca
%A Robert Harrison
%A Thomas Herault
%A Mohammad Mahdi Javanmard
%A Poornima Nookala
%A Edward Valeev
%K dag
%K dataflow
%K exascale
%K graph
%K High-performance computing
%K workflow
%X We describe TESSE, an emerging general-purpose, open-source software ecosystem that attacks the twin challenges of programmer productivity and portable performance for advanced scientific applications on modern high-performance computers. TESSE builds upon and extends the PaRSEC DAG/dataflow runtime with new Domain Specific Languages (DSLs) and new integration capabilities. Motivating this work is our belief that such a dataflow model, perhaps with applications composed in domain specific languages, can overcome many of the challenges faced by a wide variety of irregular applications that are poorly served by current programming and execution models. Two such applications, from many-body physics and applied mathematics, are briefly explored. This paper focuses on the Template Task Graph (TTG), TESSE's main C++ API, which provides a powerful work/data-flow programming model. Algorithms on spatial trees, block-sparse tensors, and wavefronts are used to illustrate the API and associated concepts, as well as to compare with related approaches.
%I IEEE
%8 2020-11
%G eng
%R 10.1109/ESPM251964.2020.00011
%0 Generic
%D 2018
%T Tensor Contraction on Distributed Hybrid Architectures using a Task-Based Runtime System
%A George Bosilca
%A Damien Genet
%A Robert Harrison
%A Thomas Herault
%A Mohammad Mahdi Javanmard
%A Chong Peng
%A Edward Valeev
%X The need for predictive simulation of electronic structure in chemistry and materials science calls for fast/reduced-scaling formulations of quantum n-body methods that replace the traditional dense tensors with element-, block-, rank-, and block-rank-sparse (data-sparse) tensors. The resulting highly irregular data structures are a poor match to the imperative, bulk-synchronous parallel programming style, due to the dynamic nature of the problem and the lack of a clear domain decomposition that guarantees fair load balance. The TESSE runtime and the associated programming model aim to support performance-portable composition of applications involving irregular and dynamically changing data. In this paper, we report an implementation of irregular dense tensor contraction in a paradigmatic electronic structure application based on the TESSE extension of PaRSEC, a distributed hybrid task runtime system, and analyze the resulting performance on a distributed-memory cluster of multi-GPU nodes. Unprecedented strong scaling and promising efficiency indicate a viable future for task-based programming of complete, production-quality reduced-scaling models of electronic structure.
%B Innovative Computing Laboratory Technical Report
%I University of Tennessee
%8 2018-12
%G eng