Tiling is a critical data locality transformation for performance optimization. A number of recent efforts have focused on advanced techniques for generating parametrically tiled code for perfect and imperfect loop nests, but the problem of selecting effective tile sizes remains a major challenge. In this work, we develop a novel model-driven approach to guide rectangular tile size selection for both sequential and parallel programs. Our approach employs analytical models to restrict empirical search to a subspace of the full search space.
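For readers unfamiliar with the transformation, the following is a minimal sketch of what parametric rectangular tiling looks like on a perfect loop nest. It is an illustration only, not the code generator or the tile size model described above: the matrix-multiply kernel, the function names `matmul_ref` and `matmul_tiled`, and the tile size parameters `ti`, `tk`, and `tj` are all assumptions made for this example. The tile sizes are runtime arguments, which is what makes the code "parametrically" tiled and lets an empirical search try different values without recompiling.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes; clips partial tiles at the matrix boundary. */
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Untiled reference: C += A * B for row-major n x n matrices. */
void matmul_ref(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Rectangular tiling of the same nest. The tile sizes ti, tk, tj are
 * hypothetical runtime parameters, so one compiled binary can be run
 * with many candidate tile sizes. */
void matmul_tiled(size_t n, size_t ti, size_t tk, size_t tj,
                  const double *A, const double *B, double *C)
{
    assert(ti > 0 && tk > 0 && tj > 0);

    /* Outer loops step from tile to tile. */
    for (size_t ii = 0; ii < n; ii += ti)
        for (size_t kk = 0; kk < n; kk += tk)
            for (size_t jj = 0; jj < n; jj += tj) {
                size_t i_end = min_sz(ii + ti, n);
                size_t k_end = min_sz(kk + tk, n);
                size_t j_end = min_sz(jj + tj, n);

                /* Inner loops iterate over the points of one tile. */
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++)
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
}
```

Both functions compute the same result; only the iteration order differs. Each combination of `ti`, `tk`, and `tj` is one point in the full search space, and tile sizes need not divide `n` evenly because the inner loop bounds are clipped.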
Applications rarely achieve the peak performance of the hardware they run on. Programmers tune their applications to work around limits imposed by memory bandwidth, the cache hierarchy, and processor configuration. High-performance libraries such as BLAS, ATLAS, Intel MKL, and IBM ESSL offer an efficient way to mitigate this problem, but only for a restricted set of loop kernels. We introduce our preliminary work on dynamic pattern recognition, in which we try to map user-written code to existing library routines as a means of improving performance.