Karthik Murthy, Maunam, The Beginnings of a Communication-Avoiding Compiler

Slides

Over the next several years, the speed of computation is expected to increase annually by ~60%, whereas increases in bandwidth (~25%) and reductions in latency (~15%) lag. As a result, the cost of data movement will increasingly become a bottleneck for applications. To address this challenge, we need algorithms and software that minimize communication. The Bebop group at Berkeley has been developing a new class of algorithms called the ".5D family of communication avoidance algorithms" for various operations including Matrix Multiplication, Gaussian Elimination, Nbody simulation, and stencil computations, to name just a few. In this presentation, we introduce a communication-avoiding compiler, which understands important patterns of computation (stencils, matrix operations, ...) and generates communication-avoiding code. Apart from "horizontal" communication avoidance, the compiler will generate code to efficiently manage memory hierarchy (reduce communication on processor, i.e. "vertical" communication avoidance) by employing transformations such as temporal skewing and threaded wavefronts. The compiler manipulates sets of data items, processors, iterations, and the bi-directional mappings between such sets as integer tuples specified using Presburger arithmetic. The compiler generates parallel communication-avoiding code by employing sophisticated computation and data partitioning to these tuples.