Modern computing platforms are becoming increasingly heterogeneous in order to improve energy efficiency. This trend is seen across a diverse spectrum of platforms, from small-scale embedded SoCs to large-scale supercomputers. However, programming these heterogeneous platforms poses a serious challenge for application developers, especially for domain experts who have no experience in writing parallel programs.
We have designed a software flow for converting a high-level textual graph representation (CnC) into parallel C code written in the Habanero-C language. Programs written using CnC have a clear separation between the application description, the implementation of each application component and the abstraction of the hardware platform, which makes CnC a very good programming model for domain experts. Our contributions include extending the textual representation to increase programmability through code generation, extending the Habanero-C runtime system to support work-stealing across heterogeneous computing devices and introducing task affinity for these heterogeneous components so that users can fine-tune the runtime scheduling decisions. We demonstrate a working example from the medical domain run on a prototype heterogeneous platform that includes CPUs, GPUs and FPGAs. We show that, when using accelerators (GPUs and FPGAs) together with CPUs, our model offers up to 17.72X speedup and uses an estimated 0.52X of the power used by CPUs alone.
Our ongoing work involves further extending the CnC model to provide an intermediate graph representation to which both domain experts and compilers can map. This new model (CDSC-GR) permits better analysis of the dependences between steps and data, leading to opportunities for optimization and tuning, while still targeting heterogeneous platforms through appropriate code generation.