There is a long history of introducing specialized loop constructs to simplify parallel programming, but the exploration of such constructs for heterogeneous processors such as GPUs is still at an early stage. Many applications spend the majority of their execution time in loops that are data-parallel in nature. We start with a survey of past and current approaches to expressing data-parallel loops. We then describe our work in progress on extending Habanero-C at the language, compiler, and runtime levels to support hybrid execution of portable data-parallel loop constructs on modern heterogeneous processors.