|
Compilation for Explicitly Managed Memory Hierarchies
Stanford University
Timothy J. Knight
Ji Young Park
Manman Ren
Mike Houston
Mattan Erez
Kayvon Fatahalian
Alex Aiken
William J. Dally
Pat Hanrahan
This paper presents the design and implementation of an optimizing compiler for architectures with software-managed memory hierarchies. We model programs as hierarchies of bulk operations with explicit parallelism. Bulk operations are either bulk data transfers or kernels (bulk computations) on bulk data. Our focus here is on the backend compiler phases; we begin with a relatively lowlevel intermediate representation close to the target class of machines. In this representation all parallelism is explicit, every bulk operation is assigned to a speci c level of the memory hierarchy, and all sizes of data sets are known.We then illustrate the important optimizations that should be applied to such programs to produce ef cient code for hierarchical memory machines. In this paper we do not consider the problem of mapping a source-level language to the intermediate representation (but see [16] for an example). Any performance-oriented compiler must address the issues we consider regardless of programming model; a compiler for Pro Agramming model further removed from the target class of machines simply faces a superset of the issues we address.
We make the following contributions:
- We present a general model for machines with explicitly managed memory hierarchies (Section 2). This model illustrates the level of abstraction at which our compiler works and provides the conceptual framework in which we develop our optimizations.
- We present an intermediate representation (IR) expressing programs as hierarchies of bulk operations. An IR that represents operations at multiple scales enables optimizations to be applied uniformly across different levels of the machine: the same optimizations apply to coarse-grained computation near the root and ner-grained computation at the leaves.
- We also give an operational semantics for IR programs executing on an abstract machine, which both makes clear our notions of hierarchical programs and bulk operations and provides the basis for understanding the correctness of our optimizations (Section 3).
- We describe a number of optimizations that are key to acheiving high performance on our target class of machines (Section 4). We discuss some important details of compiling for the Cell architecture (Section 5) and give experimental results with a Cellbased compiler showing the effectiveness of our optimizations for utilizing execution and bandwidth resources (Section 6).
http://theory.stanford.edu/~aiken/publications/new/hierarchy.pdf |
|