The objective of the Data-driven Autotuning for Runtime Execution (DARE) project is to provide application-level performance tuning capabilities to the end user. DARE grew out of the persistent difficulty of tuning the performance of the PLASMA and MAGMA linear algebra libraries, which motivated a software architecture that combines three components: hardware analysis, kernel modeling, and workload simulation.
In DARE, the hardware analysis block builds a detailed model of the hardware: its computational resources (CPU cores, GPU accelerators, Xeon Phi coprocessors) and its memory system (host memories, device memories, multiple levels of cache). The kernel modeling block builds accurate performance models for the computational kernels involved in the workload, as a function of granularity, place of execution, induced memory traffic, and similar factors. The workload simulation block, relying on the information provided by the other two blocks, rapidly simulates a large number of runs in order to find the best execution conditions. The ultimate objective of DARE is to arrange the blocks in a continuous refinement loop that can serve as a framework for optimizing applications beyond the field of dense linear algebra.
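The way the three blocks could fit together can be illustrated with a small Python sketch. It is not DARE's actual interface: every name in it (`Device`, `hardware_analysis`, `kernel_time`, `simulate`, `autotune`) is hypothetical, the device figures are made-up placeholders rather than measurements, and the kernel model is a deliberately crude estimate that takes the larger of compute time and memory time and adds a fixed per-call overhead. The workload is assumed to be a tiled matrix multiplication whose tasks run one after another on a single device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical hardware description; all figures are placeholders."""
    name: str
    peak_gflops: float        # assumed peak compute rate, GFLOP/s
    bandwidth_gbs: float      # assumed memory bandwidth, GB/s
    launch_overhead_s: float  # assumed fixed cost per kernel call, seconds


def hardware_analysis():
    """Stand-in for the hardware analysis block: returns invented devices."""
    return [
        Device("cpu_core", 20.0, 10.0, 1e-6),
        Device("gpu", 1000.0, 200.0, 2e-5),
    ]


def kernel_time(device, tile):
    """Stand-in for the kernel modeling block.

    Estimates the time of one tile-by-tile matrix-multiply update in double
    precision as overhead + max(compute time, memory time).
    """
    flops = 2.0 * tile ** 3
    bytes_moved = 3 * 8.0 * tile ** 2  # three tiles of 8-byte values
    compute_s = flops / (device.peak_gflops * 1e9)
    memory_s = bytes_moved / (device.bandwidth_gbs * 1e9)
    return device.launch_overhead_s + max(compute_s, memory_s)


def simulate(n, tile, device):
    """Stand-in for the workload simulation block.

    Predicts the total time of an n-by-n tiled matrix multiply, assuming the
    tasks execute serially on one device (no overlap, no scheduling).
    """
    tiles_per_dim = -(-n // tile)  # ceiling division
    tasks = tiles_per_dim ** 3
    return tasks * kernel_time(device, tile)


def autotune(n, tile_sizes):
    """Simulate every (tile size, device) pair and return the fastest.

    Returns a tuple (predicted_seconds, tile, device_name).
    """
    devices = hardware_analysis()
    return min(
        (simulate(n, tile, dev), tile, dev.name)
        for tile in tile_sizes
        for dev in devices
    )


if __name__ == "__main__":
    seconds, tile, device_name = autotune(4096, [64, 128, 256, 512])
    print(f"best: tile={tile} on {device_name}, predicted {seconds:.4f} s")
```

Because each simulated run is only a few arithmetic operations, the search can afford to evaluate every candidate configuration exhaustively. A refinement loop of the kind described above would additionally feed measured run times back into the hardware and kernel models; that step is omitted here.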