Efficient columnwise correlation

Efficient ways to compute Pearson's correlation between columns of two matrices in numpy and other scientific computing languages.

See http://stackoverflow.com/questions/19401078/efficient-columnwise-correlation-coefficient-calculation-with-numpy for the initial discussion.

The numpy version is used in https://github.com/ikizhvatov/pysca.
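As an illustration of the computation being timed, here is a minimal numpy sketch of columnwise Pearson correlation, along the lines of the einsum approach discussed in the Stack Overflow thread. The function name `columnwise_corrcoef` and the shapes `(n, t)` and `(n, m)` are assumptions made for this example; the code in the repository and in pysca may differ in detail.

```python
import numpy as np


def columnwise_corrcoef(O, P):
    """Pearson correlation of every column of O with every column of P.

    O has shape (n, t) and P has shape (n, m); the result has shape (t, m).
    (Hypothetical helper for illustration, not the repository's function.)
    """
    # Center each column by subtracting its mean.
    DO = O - O.mean(axis=0)
    DP = P - P.mean(axis=0)

    # Cross-products of centered columns:
    # entry (t, m) is the sum over n of DO[n, t] * DP[n, m].
    cov = np.einsum("nt,nm->tm", DO, DP)

    # Per-column sums of squares of the centered data.
    ssO = np.einsum("nt,nt->t", DO, DO)
    ssP = np.einsum("nm,nm->m", DP, DP)

    # Normalize each cross-product by the product of the column norms.
    return cov / np.sqrt(np.outer(ssO, ssP))


rng = np.random.RandomState(0)
O = rng.standard_normal((100, 4))
P = rng.standard_normal((100, 3))
R = columnwise_corrcoef(O, P)
print(R.shape)  # (4, 3)
```

The result should agree with the off-diagonal block of `np.corrcoef(O, P, rowvar=False)`, but it avoids computing the correlations of the columns of each matrix with themselves.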

Timings

Laptop: i7-5650U 2.2 GHz (dual-core), 8 GB 1600 MHz DDR3, 512 GB PCIe SSD, Mac OS 10.13.3

Desktop: i7-4790K 4.0 GHz (quad-core), 32 GB 1333 MHz DDR3, 250 GB SATA SSD, Ubuntu 16.04 x64

Turbo Boost was left on for both machines.

| Version | Laptop, s | Desktop, s | Ratio |
|---|---|---|---|
| numpy 1.14.2 | 1.63 | 0.82 | 2.0 |
| julia 0.6.2 | 1.75 | 0.74 | 2.3 |
| R 3.4.3 | 33 | 26.6 | 1.2 |
| MATLAB R2017a | 1.85 | 1.08 | 1.7 |

Python timings are for Anaconda Python 3.6.4; they are similar for Python 2.7 and for the default Python 2.7 with numpy on Mac OS. The optimize option of einsum gives an almost 10-fold speedup, bringing numpy on par with julia and MATLAB.
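The effect of the optimize option can be checked with a small sketch like the following. The array sizes and variable names are arbitrary assumptions chosen for this example, not those of the benchmark script, and the measured speedup will depend on the machine and on the BLAS library numpy is linked against.

```python
import timeit

import numpy as np

rng = np.random.RandomState(0)
DO = rng.standard_normal((1000, 200))  # illustrative sizes, not the benchmark's
DP = rng.standard_normal((1000, 100))

# Same contraction with and without the optimize option.
plain = np.einsum("nt,nm->tm", DO, DP)
optimized = np.einsum("nt,nm->tm", DO, DP, optimize=True)

# Both variants should give the same values up to floating-point rounding.
print(np.allclose(plain, optimized))

# Rough timing comparison over a few repetitions.
t_plain = timeit.timeit(
    lambda: np.einsum("nt,nm->tm", DO, DP), number=5)
t_opt = timeit.timeit(
    lambda: np.einsum("nt,nm->tm", DO, DP, optimize=True), number=5)
print("plain: %.4f s, optimize=True: %.4f s" % (t_plain, t_opt))
```

The optimize argument requires numpy 1.12 or later.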

R timings are slower in 3.4.x than in 3.3.3 (36 s vs 26 s), despite the performance improvements previewed for R 3.4.0: http://blog.revolutionanalytics.com/2017/02/preview-r-340.html.

Running the timings

Required for python: numpy

Required for R: Hmisc

Required for MATLAB: Statistics and Machine Learning Toolbox

python columnwise_corrcoef_perf.py

julia columnwise_corrcoef_perf.jl

Rscript columnwise_corrcoef_perf.r

/Applications/MATLAB_R2017a.app/bin/matlab -nojvm -nodisplay -nosplash -r "columnwise_corrcoef_perf; exit;"

The MATLAB command above is for Mac OS; adjust the path to the matlab executable for your platform.

Notes