Anna Youssefi, Exploring the Potential for Accelerating Sparse Matrix-Vector Multiplication on a Processing-in-Memory Architecture

As memory access delays have come to dominate performance over the last few decades, researchers have begun exploring Processing-in-Memory (PIM) technology, which offers higher memory bandwidth, shorter memory latency, and lower power consumption. In this study, we investigate whether PIM can boost performance for sparse matrix-vector multiplication (SPMV). In the best case SPMV is bandwidth-bound, but factors related to matrix structure and representation also limit performance. We analyze SPMV on both the AMD Opteron processor and a PIM architecture design developed at Sandia National Laboratories, exploring the performance limiters for each and the degree to which they can be ameliorated by data and code transformations. Over a range of sparse matrices, SPMV on the PIM outperformed the Opteron by a factor of 1.82. On the PIM, computational kernel and data structure transformations improved performance by almost 40% over conventional implementations using compressed sparse row (CSR) format.
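
For readers unfamiliar with the baseline, the following is a minimal sketch of a conventional SPMV kernel over compressed sparse row storage. It is illustrative only and is not the implementation evaluated in this study. The names `spmv_csr`, `row_ptr`, `col_idx`, and `values` are assumptions chosen for the example.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A * x for a matrix A stored in compressed sparse row form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i,
    col_idx[k] is the column of the k-th nonzero, values[k] its value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # The access x[col_idx[k]] is indirect: its location depends
            # on the matrix structure, not on the loop index alone.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

Each nonzero is read once and contributes a single multiply-add, which is why the kernel tends to be limited by memory traffic. The indirect access to `x` through `col_idx` is one way in which matrix structure and representation affect performance.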