%0 Conference Paper
%B 17th IEEE High Performance Extreme Computing Conference (HPEC '13)
%D 2013
%T Standards for Graph Algorithm Primitives
%A Tim Mattson
%A David Bader
%A Jon Berry
%A Aydin Buluc
%A Jack Dongarra
%A Christos Faloutsos
%A John Feo
%A John Gilbert
%A Joseph Gonzalez
%A Bruce Hendrickson
%A Jeremy Kepner
%A Charles Leiserson
%A Andrew Lumsdaine
%A David Padua
%A Steve W. Poole
%A Steve Reinhardt
%A Mike Stonebraker
%A Steve Wallach
%A Andrew Yoo
%K algorithms
%K graphs
%K linear algebra
%K software standards
%X It is our view that the state of the art in constructing a large collection of graph algorithms in terms of linear algebraic operations is mature enough to support the emergence of a standard set of primitive building blocks. This paper is a position paper defining the problem and announcing our intention to launch an open effort to define this standard.
%I IEEE
%C Waltham, MA
%8 2013-09
%G eng
%R 10.1109/HPEC.2013.6670338

%0 Book Section
%B Distributed and Parallel Systems
%D 2007
%T A New Approach to MPI Collective Communication Implementations
%A Torsten Hoefler
%A Jeffrey M. Squyres
%A Graham Fagg
%A George Bosilca
%A Wolfgang Rehm
%A Andrew Lumsdaine
%K Automatic Selection
%K Collective Operation
%K Framework
%K Message Passing (MPI)
%K Open MPI
%X Recent research into the optimization of collective MPI operations has resulted in a wide variety of algorithms and corresponding implementations, each typically only applicable in a relatively narrow scope: on a specific architecture, on a specific network, with a specific number of processes, with a specific data size and/or data-type – or any combination of these (or other) factors. This situation presents an enormous challenge to portable MPI implementations which are expected to provide optimized collective operation performance on all platforms.
Many portable implementations have attempted to provide a token number of algorithms that are intended to realize good performance on most systems. However, many platform configurations are still left without well-tuned collective operations. This paper presents a proposal for a framework that will allow a wide variety of collective algorithm implementations and a flexible, multi-tiered selection process for choosing which implementation to use when an application invokes an MPI collective function.
%I Springer US
%P 45-54
%@ 978-0-387-69857-1
%G eng
%R 10.1007/978-0-387-69858-8_5

%0 Journal Article
%J HeteroPar 2006
%D 2006
%T A High-Performance, Heterogeneous MPI
%A Richard L. Graham
%A Galen M. Shipman
%A Brian Barrett
%A Ralph Castain
%A George Bosilca
%A Andrew Lumsdaine
%B HeteroPar 2006
%C Barcelona, Spain
%8 2006-09
%G eng

%0 Journal Article
%J ACM Transactions on Mathematical Software
%D 2002
%T An Updated Set of Basic Linear Algebra Subprograms (BLAS)
%A Susan Blackford
%A James Demmel
%A Jack Dongarra
%A Iain Duff
%A Sven Hammarling
%A Greg Henry
%A Michael Heroux
%A Linda Kaufman
%A Andrew Lumsdaine
%A Antoine Petitet
%A Roldan Pozo
%A Karin Remington
%A Clint Whaley
%B ACM Transactions on Mathematical Software
%V 28
%P 135-151
%8 2002-12
%G eng
%R 10.1145/567806.567807

%0 Journal Article
%J (an update), submitted to ACM TOMS
%D 2001
%T Basic Linear Algebra Subprograms (BLAS)
%A Susan Blackford
%A James Demmel
%A Jack Dongarra
%A Iain Duff
%A Sven Hammarling
%A Greg Henry
%A Michael Heroux
%A Linda Kaufman
%A Andrew Lumsdaine
%A Antoine Petitet
%A Roldan Pozo
%A Karin Remington
%A Clint Whaley
%B (an update), submitted to ACM TOMS
%8 2001-02
%G eng