High-performance Matrix-matrix Multiplications of Very Small Matrices

Title: High-performance Matrix-matrix Multiplications of Very Small Matrices
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, J. Falcou, and J. Dongarra
Conference Name: 22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16)
Date Published: 08-2016
Publisher: Springer International Publishing
Conference Location: Grenoble, France
Abstract: General dense matrix-matrix multiplication (GEMM) is fundamental to obtaining high performance in many scientific computing applications. GEMMs for small matrices (of sizes less than 32), however, are not sufficiently optimized in existing libraries. In this paper we consider the case of many small GEMMs on either CPU or GPU architectures. This case often occurs in applications such as big data analytics, machine learning, high-order finite element methods (FEM), and others. The GEMMs are grouped together in a single batched routine. We present algorithms and optimization techniques specialized for these cases that obtain performance within 90% of the optimal. We show that these results outperform currently available state-of-the-art implementations and vendor-tuned math libraries.
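
To make the idea of grouping many small GEMMs into a single batched routine concrete, here is a minimal reference sketch in C. It is not the paper's optimized kernel: the function name `batched_gemm_small`, its argument order, the column-major layout, and the assumption that every matrix in the batch has the same dimensions are all illustrative choices made for this example. The triple loop is a naive baseline and reproduces none of the specialized techniques the abstract refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical batched GEMM (illustrative name and signature):
 *   C[i] = alpha * A[i] * B[i] + beta * C[i]   for i = 0 .. batch_count-1
 * Assumptions: column-major storage, and every problem in the batch
 * shares the same sizes (A[i] is m x k, B[i] is k x n, C[i] is m x n). */
void batched_gemm_small(int m, int n, int k,
                        double alpha,
                        const double *const *A, int lda,
                        const double *const *B, int ldb,
                        double beta,
                        double *const *C, int ldc,
                        int batch_count)
{
    assert(m >= 0 && n >= 0 && k >= 0 && batch_count >= 0);
    assert(lda >= m && ldb >= k && ldc >= m);

    /* One call walks the whole batch; each iteration is an independent
     * small GEMM, so the outer loop is the natural unit of parallelism. */
    for (int i = 0; i < batch_count; ++i) {
        const double *a = A[i];
        const double *b = B[i];
        double *c = C[i];

        for (int col = 0; col < n; ++col) {
            for (int row = 0; row < m; ++row) {
                double acc = 0.0;
                for (int p = 0; p < k; ++p) {
                    acc += a[row + (size_t)p * lda] * b[p + (size_t)col * ldb];
                }
                size_t idx = row + (size_t)col * ldc;
                c[idx] = alpha * acc + beta * c[idx];
            }
        }
    }
}
```

A caller would fill three arrays of pointers, one entry per small matrix, and make a single call for the whole batch. Taking the entire batch in one call is what gives an implementation room to amortize per-call overhead and distribute the independent products across cores or GPU thread blocks.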