Hierarchical Place Trees: Modeling Memory Hierarchy for Task Parallelism, Yonghong Yan

Modern computer systems feature multiple homogeneous or heterogeneous computing units with deep memory hierarchies, and they expect a high degree of thread-level parallelism from software. Exploiting data locality is critical to achieving scalable parallelism, but it adds a significant dimension of complexity to the performance optimization of parallel programs. This is especially true for programming models in which locality is implicit and opaque to programmers. In this talk, we introduce the hierarchical place tree (HPT) model as a portable abstraction of the hardware memory system for task parallelism. The HPT model supports co-allocation of data and computation at multiple levels of a memory hierarchy. It can be viewed as a generalization of concepts from the Sequoia and X10 programming models, yielding capabilities that neither model supports. Compared to Sequoia, HPT supports three kinds of data movement in a memory hierarchy (implicit access, explicit in-out parameters, and explicit asynchronous transfer) rather than only explicit data transfer between adjacent levels, and it supports dynamic task scheduling rather than static task assignment. Compared to X10, HPT provides a hierarchical notion of places for both computation and data mapping. Our prototype implementation of the HPT model uses the Habanero-Java (HJ) compiler and runtime system. Preliminary results on general-purpose multicore processors and GPU accelerators indicate that the HPT model can be a promising portable abstraction for future multicore processors. We expect further performance improvements from the current work-in-progress implementation based on the Habanero-C compiler and runtime.
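To make the tree-of-places idea a little more concrete, here is a minimal Python sketch. It is not the HJ or Habanero-C implementation, and every name in it (`Place`, `Worker`, `spawn`, `next_task`) is hypothetical. It assumes, as a simplification, that workers are bound to leaf places and that a task mapped to a place may run on any worker in that place's subtree, with each worker trying its innermost place first. Data co-allocation and the three data-movement modes are not modeled.

```python
from collections import deque


class Place:
    """A node in a hypothetical place tree (e.g. main memory -> L2 -> core)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.tasks = deque()  # tasks currently mapped to this place
        if parent is not None:
            parent.children.append(self)

    def spawn(self, task):
        """Map a task to this place (assumed semantics, for illustration only)."""
        self.tasks.append(task)


class Worker:
    """A worker bound to a leaf place; it may run tasks mapped to any ancestor."""

    def __init__(self, leaf):
        self.leaf = leaf

    def path_to_root(self):
        # Walk from the worker's leaf place up to the root of the tree.
        place = self.leaf
        while place is not None:
            yield place
            place = place.parent

    def next_task(self):
        # Dynamic scheduling: check the innermost (closest) place first,
        # then progressively outer places; return None if nothing is runnable.
        for place in self.path_to_root():
            if place.tasks:
                return place.tasks.popleft()
        return None


# A small two-level tree: shared memory -> two L2 places -> three cores.
root = Place("mem")
l2a, l2b = Place("L2-a", root), Place("L2-b", root)
core0, core1 = Place("core0", l2a), Place("core1", l2a)
core2 = Place("core2", l2b)
w0, w1, w2 = Worker(core0), Worker(core1), Worker(core2)

l2a.spawn("t_shared_L2a")  # may run only on core0 or core1
root.spawn("t_global")     # may run on any core
core2.spawn("t_core2")     # may run only on core2
```

In this sketch the `core2` worker can pick up tasks mapped to `core2`, `L2-b`, or the root, but never the task mapped to `L2-a`. This is the sense in which a place constrains where computation runs while still leaving the choice of worker to the runtime. A real runtime would likely also need synchronization and load balancing, which are omitted here.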