It is now widely recognized that increased levels of parallelism is a necessary condition for improved application performance on multicore computers. Unfortunately, as the number of cores increases, the memory-per-core ratio is expected to further decrease, making per-core memory efficiency of parallel programs an even more important concern in future systems. For many parallel applications, the memory requirements can be significantly larger than for their sequential counterparts and, more importantly, their memory utilization depends critically on the schedule used when running them.
To address the problem of finding memory-efficient parallel schedules, we propose bounded memory scheduling (BMS). BMS is a scheduling approach for parallel programs expressed as dynamic task graphs, in which a user-specified upper bound is imposed on the program's peak memory. It tailors the set of allowable schedules to ensure that they can be executed within a given peak memory bound. The BMS algorithm is based on the inspector-executor model and guarantees that the memory bound is respected, or throws an error during the inspector phase without running the computation if no feasible schedule can be found. Through evaluation on seven benchmarks, we show that BMS enables a wide range of trade-offs between parallelism and memory consumption. In most cases, it is able to take advantage of all the available parallelism in hardware with only a small increase in the memory consumption relative to a sequential schedule.