Xu Liu, A new approach for performance analysis of OpenMP programs

Slides

The number of hardware threads is growing with each new generation of multicore chips. Programmers must make effective use of these threads to fully exploit emerging parallel platforms. OpenMP is a popular directive-based programming model that helps programmers exploit thread-level parallelism. In this paper, we describe the design and implementation of a novel performance tool for OpenMP. Our tool distinguishes itself from existing OpenMP performance tools in two principal ways. First, we develop a measurement methodology that attributes blame for work and inefficiency back to program contexts. We show how to integrate prior work on measurement methodologies that employ directed and undirected blame shifting and extend the approach to support dynamic thread-level parallelism in both time shared and dedicated environments. Second, we develop a novel deferred context resolution method that supports online attribution of performance metrics to full calling contexts within an OpenMP program execution. This approach enables us to collect compact call path profiles for OpenMP program executions without the need for traces. Support for our approach is an integral part of an emerging standard performance tool application programming interface for OpenMP. We demonstrate the effectiveness of our approach by applying our tool to analyze four well-known application benchmarks that cover the spectrum of OpenMP features. In case studies with these benchmarks, we apply insights gleaned from our tool to improve the implementations, leading to performance improvements ranging from 8%–82%.