Dynamic lightweight task parallelism has been identified as one of the prerequisites for success in improving productivity and performance on future many-core processors. In dynamic task parallelism, computations are created dynamically and the runtime scheduler is responsible for scheduling the computations across the cores. The sets of task graphs that can be supported by a dynamic scheduler depend on the underlying task primitives in the parallel programming model, with various classes of fork-join structures used most often in practice. However, many researchers have advocated the benefits of more general task graph structures, and have shown that the use of these more general structures can lead to improved performance.
We propose an extension to task parallelism called Data-Driven Futures (DDFs) that can be used to create arbitrary task graph structures. Unlike a normal task that starts execution upon creation, a data-driven task specifies its input constraints in an await clause. A DDF can be viewed as a container with a state that obeys the single-assignment rule. The runtime scheduler will then ensure that a task is only scheduled when all the DDFs in its await clause become available (full). Furthermore, Data-Driven Futures support non-blocking semantics, which prevents resource under-utilization caused by blocking. DDFs also provide race-freedom and determinacy properties which allow users to analyze and debug their programs with precision.
We compare four scheduling algorithms of the Concurrent Collection model to the data-driven scheduling using DDFs, and include performance evaluations of these five algorithms on a variety of benchmark programs and multicore platforms. Our results show that the Data-Driven scheduler is the best approach, both from the viewpoints of memory efficiency and scalable parallelism.