Zekai Jacob Gao, On the Top-k Entries of a Large Correlation Matrix

Correlations are widely used in state-of-the-art machine learning techniques. In the big data era, datasets with millions of points and thousands of dimensions have become common, making it costly to compute the correlation matrix from all the data points. In this talk, I will show how our sampling-based model can efficiently estimate the correlation matrix and, in particular, its top-k entries when sparsity is expected.
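
As a rough illustration of the general idea only, and not the model presented in the talk, the sketch below estimates the correlation matrix from a uniform random subsample of the data points and then reads off the k off-diagonal entries of largest magnitude. The function name `topk_correlations_from_sample` and its parameters (`sample_size`, `seed`) are hypothetical, and plain uniform row sampling is an assumption made for the example.

```python
import numpy as np


def topk_correlations_from_sample(X, k, sample_size, seed=0):
    """Estimate the k largest-magnitude off-diagonal correlations.

    X is an (n_points, n_dims) array. Only `sample_size` rows, drawn
    uniformly without replacement, are used to compute the estimate.
    This is a naive baseline, not the method described in the talk.
    """
    if k <= 0:
        return []

    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = min(sample_size, n)

    # Uniform subsample of the data points (rows).
    rows = rng.choice(n, size=m, replace=False)

    # d x d Pearson correlation matrix estimated from the subsample;
    # rowvar=False treats columns as variables.
    C = np.corrcoef(X[rows], rowvar=False)

    # Keep each unordered pair (i, j) with i < j exactly once.
    iu, ju = np.triu_indices(d, k=1)
    vals = C[iu, ju]
    k = min(k, vals.size)

    # Select the k entries with the largest absolute value, then sort
    # those k in descending order of magnitude.
    top = np.argpartition(-np.abs(vals), k - 1)[:k]
    top = top[np.argsort(-np.abs(vals[top]))]

    return [(int(iu[t]), int(ju[t]), float(vals[t])) for t in top]
```

Calling `topk_correlations_from_sample(X, k=10, sample_size=1000)` returns a list of `(i, j, correlation)` tuples. The cost of the correlation step then scales with the sample size rather than the full number of points, at the price of sampling error in each entry. That trade-off is most tolerable when the matrix is sparse and the few large entries stand well clear of the noise.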