Manycore architectures -- hundreds to thousands of cores per processor -- are seen by many as a natural evolution of multicore processors. Taking advantage of this massive parallelism in practice requires a productive parallel programming model and an efficient runtime for thread execution and coordination. Dynamic task parallelism, championed recently by several parallel programming languages, promises to be an effective approach to parallel programming because, unlike data-parallel and SPMD programming models, it allows concurrent tasks to be dynamically created and joined at any time during execution. A critical prerequisite for an efficient task-parallel runtime is a scalable synchronization mechanism that supports task coordination at different levels of granularity.
This talk presents a study of synchronization issues in a task-parallel runtime on manycore architectures and offers alternatives to low-level hardware synchronization primitives. We have implemented a high-level synchronization construct called phasers on a Cyclops64 manycore processor. Phasers support both localized and group synchronization of tasks by allowing threads to register with and deregister from groups of synchronizing tasks. They provide a general unification of point-to-point and collective synchronization behind easy-to-use interfaces. We have experimented with several approaches to phaser implementation, using software, hardware, or a combination of both, to explore their portability, usability, and performance.
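
To make the register/deregister idea concrete, the following is a minimal software-only sketch in Python. It is not the Cyclops64 implementation discussed in the talk: the class name ToyPhaser, its methods, and the demo function are hypothetical, and only the collective signal-and-wait mode is modeled (point-to-point signal-only and wait-only modes are omitted). It shows a barrier-like construct whose group of participants shrinks as tasks deregister.

```python
import threading


class ToyPhaser:
    """Hypothetical toy phaser: a barrier whose participant set can change."""

    def __init__(self):
        self._cond = threading.Condition()
        self._registered = 0  # parties currently registered
        self._arrived = 0     # parties that have arrived in the current phase
        self.phase = 0        # number of completed phases

    def register(self):
        with self._cond:
            self._registered += 1

    def _advance_if_complete(self):
        # Caller must hold self._cond. Advance once every registered
        # party has arrived in the current phase.
        if self._registered > 0 and self._arrived == self._registered:
            self.phase += 1
            self._arrived = 0
            self._cond.notify_all()

    def next(self):
        """Signal arrival, then wait for all registered parties."""
        with self._cond:
            my_phase = self.phase
            self._arrived += 1
            self._advance_if_complete()
            while self.phase == my_phase:
                self._cond.wait()

    def deregister(self):
        """Leave the group; this may release parties that are waiting."""
        with self._cond:
            self._registered -= 1
            self._advance_if_complete()


def run_demo(num_workers=3):
    """Worker i takes part in i + 1 phases, then deregisters."""
    phaser = ToyPhaser()
    log, log_lock = [], threading.Lock()

    def worker(wid):
        for step in range(wid + 1):
            with log_lock:
                log.append((step, wid))  # "work" done in this phase
            phaser.next()
        phaser.deregister()

    # Register every worker before any thread starts, so that no phase
    # can complete before the whole group is known.
    for _ in range(num_workers):
        phaser.register()
    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return phaser.phase, log


if __name__ == "__main__":
    print(run_demo())
```

In this sketch, workers that deregister early do not block the remaining ones, which is the property that separates a phaser from a fixed-size barrier. A single condition variable is used for brevity; it would be a contention point at manycore scale, which is the kind of cost the software and hardware variants mentioned above aim to address.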