# CovarianceEstimation.jl

Lightweight robust covariance estimation in Julia, i.e. if you have a data matrix `X` of size `n×p` corresponding to `n` observations with `p` features, this package will help you obtain a `p×p` estimator of the covariance matrix associated with this data.
Note: if you are interested in covariance estimation in the context of a linear regression, consider for now the package CovarianceMatrices.jl, which focuses on that case.
## Quick start

```julia
using CovarianceEstimation

X = randn(5, 7)

S_uncorrected = cov(SimpleCovariance(), X)
S_corrected   = cov(SimpleCovariance(corrected=true), X)

# using linear shrinkage with different targets
LSE = LinearShrinkage
# - Ledoit-Wolf target + shrinkage
method = LSE(ConstantCorrelation())
S_ledoitwolf = cov(method, X)
# - Chen target + shrinkage (using the more verbose call)
method = LSE(target=DiagonalCommonVariance(), shrinkage=:rblw)
S_chen_rblw = cov(method, X)
method = LSE(target=DiagonalCommonVariance(), shrinkage=:oas)
S_chen_oas = cov(method, X)

# a pre-defined shrinkage intensity can be given as well
method = LinearShrinkage(DiagonalUnitVariance(), 0.5)
S_05 = cov(method, X)
```
## Currently supported algorithms

In this section, `X` is the data matrix of size `n × p`, `S` is the sample covariance matrix with `S = κ (Xc' * Xc)` where `κ` is either `1/n` (uncorrected) or `1/(n-1)` (corrected) and `Xc` is the centred data matrix (see docs).
- `SimpleCovariance`: basic corrected or uncorrected sample covariance (implemented in `StatsBase.jl`).

Time complexity: `O(p²n)` with a low constant.
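As a quick check of the formula above, the corrected sample covariance `S = (Xc' * Xc)/(n-1)` can be formed directly with the standard library (a minimal sketch using only `Statistics`):

```julia
using Statistics

# Corrected sample covariance S = (Xc' * Xc) / (n - 1),
# where Xc is the column-centred data matrix.
X  = randn(5, 7)
n  = size(X, 1)
Xc = X .- mean(X, dims=1)    # centre each feature (column)
S  = (Xc' * Xc) / (n - 1)    # κ = 1/(n-1), the corrected scaling

# matches the standard library's corrected estimator
@assert S ≈ cov(X)
```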
### Sample covariance based methods

These methods build an estimator of the covariance derived from `S`. They are implemented using the abstract covariance estimation interface from `StatsBase.jl`.
- `LinearShrinkage`: James-Stein type estimator of the form `(1-λ)S + λF` where `F` is a target and `λ ∈ [0,1]` a shrinkage intensity.
  - common targets are implemented following the taxonomy given in [1] along with Ledoit-Wolf optimal shrinkage intensities [2].
  - in the case of the `DiagonalCommonVariance` target, a Rao-Blackwellised Ledoit-Wolf shrinkage (`:rblw`) and an Oracle-Approximating shrinkage (`:oas`) are also supported (see [3]).
  - Note: `S` is symmetric positive semi-definite, so if `F` is symmetric positive definite and `λ` is non-zero, the estimator obtained after shrinkage is also symmetric positive definite. For the diagonal targets `DiagonalUnitVariance`, `DiagonalCommonVariance` and `DiagonalUnequalVariance` the target is necessarily SPD.
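The shrinkage form above can be sketched with only `LinearAlgebra` and `Statistics`; the `DiagonalCommonVariance`-style target `F = v*I` and the fixed `λ` below are hand-rolled for illustration (the package computes data-driven optimal intensities):

```julia
using LinearAlgebra, Statistics

# (1-λ)S + λF with a diagonal common-variance target F = v*I,
# where v is the mean of the sample variances; λ fixed by hand here.
X = randn(10, 30)                  # p > n, so S is rank-deficient
S = cov(X)
v = mean(diag(S))
F = v * I(size(S, 1))
λ = 0.3
S_shrunk = (1 - λ) * S + λ * F

# F is SPD and λ > 0, so the shrunk matrix is SPD even though S is singular
@assert eigmin(Symmetric(Matrix(S_shrunk))) > 0
```

This is exactly the SPD-preservation argument from the note above: shrinking a singular `S` towards an SPD target yields a well-conditioned, invertible estimator.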
- `AnalyticalNonlinearShrinkage`: estimator of the form `MΛM'` where `M` and `Λ` are matrices derived from the eigendecomposition of `S` [4].
- `WoodburyEstimator`: estimator of the form `σ²I + U*Λ*U'`, where `I` is the identity matrix, `U` is a low-rank semi-orthogonal matrix, and `Λ` is diagonal. This form is well suited to very high-dimensional problems [6].
- `BiweightMidcovariance`: robust estimator, described further in the documentation [5].
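The low-rank-plus-scaled-identity structure assumed by `WoodburyEstimator` can be built directly (an illustrative construction; the sizes `p`, `k` and the spike values are arbitrary):

```julia
using LinearAlgebra

# Build Σ = σ²I + U*Λ*U' with U semi-orthogonal (U'U = I) and Λ diagonal.
p, k = 6, 2
U  = Matrix(qr(randn(p, k)).Q)   # p×k with orthonormal columns
Λ  = Diagonal([3.0, 1.5])
σ2 = 0.5
Σ  = σ2 * I(p) + U * Λ * U'

# Σ has k "spiked" eigenvalues σ² + Λᵢᵢ on top of a bulk at σ²
@assert U'U ≈ I(k)
@assert sort(eigvals(Symmetric(Σ)))[end-1:end] ≈ sort(σ2 .+ diag(Λ))
```

Such "spiked" covariance models [6] keep only `O(pk)` parameters, which is why this form scales to very high-dimensional problems.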
Time complexity:

- Linear shrinkage: `O(p²n)` with a low constant (main cost is forming `S`)
- Nonlinear shrinkage:
  - if `p < n`: `O(p²n + n²)` with a moderate constant (main cost is forming `S` and manipulating a matrix of `n²` elements)
  - if `p > n`: `O(p³)` with a low constant (main cost is computing the eigendecomposition of `S`)
- Biweight midcovariance: `O(p²n)`
## Other estimators (coming)

These are estimators that may be implemented in the future, see also this review paper.
- Sparsity based estimators for either the covariance or the precision
- Rank based approaches
- POET
- HAC
- ...
For HAC (and other estimators of the covariance of regression model coefficients) you can currently use the CovarianceMatrices.jl package.
## Comparison to existing libraries

Rough benchmarks are run over random matrices of various sizes (`40×20`, `20×40`, `400×200`, `200×400`). These benchmarks should (as usual) be taken with a pinch of salt, but essentially a significant speedup should be expected for a standard problem.
- Sklearn (implements `DiagonalCommonVariance` target with `oas` and `lw` shrinkage): average speedup `5x`
- Corpcor (implements `DiagonalUnequalVariance` target with `ss` shrinkage): average speedup `22x`
- Ledoit-Wolf 1 (implements `ConstantCorrelation` target with `lw` shrinkage; we used Octave for the comparison): average speedup `12x`
- Ledoit-Wolf 2 (implements `AnalyticalNonlinearShrinkage`): average speedup `25x`
- Astropy (implements `BiweightMidcovariance`): average speedup `3x`
## References
- [1] J. Schäfer and K. Strimmer, A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics, Statistical Applications in Genetics and Molecular Biology, 2005.
- [2] O. Ledoit and M. Wolf, Honey, I Shrunk the Sample Covariance Matrix, The Journal of Portfolio Management, 2004.
- [3] Y. Chen, A. Wiesel, Y. C. Eldar, and A. O. Hero, Shrinkage Algorithms for MMSE Covariance Estimation, IEEE Transactions on Signal Processing, 2010.
- [4] O. Ledoit and M. Wolf, Analytical Nonlinear Shrinkage of Large-Dimensional Covariance Matrices, Working Paper, 2018.
- [5] T. C. Beers, K. Flynn and K. Gebhardt, Measures of Location and Scale for Velocities in Clusters of Galaxies – A Robust Approach, The Astronomical Journal, 1990.
- [6] D. L. Donoho, M. Gavish and I. M. Johnstone, Optimal Shrinkage of Eigenvalues in the Spiked Covariance Model, Annals of Statistics, 2018.