vqr: Fast Nonlinear Vector Quantile Regression

This package provides the first scalable implementation of Vector Quantile Regression (VQR), ready for large real-world datasets. In addition, it provides a powerful extension which makes VQR non-linear in the covariates, via a learnable transformation. The package is easy to use via a familiar sklearn-style API.

Refer to our paper [1] for further details about nonlinear VQR, and please cite our work if you use this package:

@article{rosenberg2022fast,
  title={Fast Nonlinear Vector Quantile Regression},
  author={Rosenberg, Aviv A and Vedula, Sanketh and Romano, Yaniv and Bronstein, Alex M},
  journal={arXiv preprint arXiv:2205.14977},
  year={2022}
}

Brief background and intuition

Quantile regression [2] (QR) is a well-known method which estimates a conditional quantile of a target variable $\text{Y}$, given covariates $\mathbf{X}$. Since a distribution can be exactly specified in terms of its quantile function, estimating all conditional quantiles recovers the full conditional distribution. A major limitation of QR is that it deals with a scalar-valued target variable, while many important applications require estimation of vector-valued responses.
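As a rough illustration of scalar quantile estimation, the sketch below minimizes the pinball (check) loss, which is the standard objective in quantile regression, over a grid of candidate values. It covers only the simplest case with no covariates, it does not use this package, and the names `pinball_loss`, `candidates` and `q_hat` are chosen here for illustration.

```python
import numpy as np


def pinball_loss(y, q, tau):
    """Average pinball (check) loss of predicting q as the tau-th quantile of y."""
    diff = y - q
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))


rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
tau = 0.9

# Minimize the loss over a grid of candidate quantile values (spacing 0.01).
candidates = np.linspace(-3.0, 3.0, 601)
losses = np.array([pinball_loss(y, q, tau) for q in candidates])
q_hat = candidates[np.argmin(losses)]

# The minimizer should agree with the empirical quantile up to the grid spacing.
q_emp = np.quantile(y, tau)
print(f"{q_hat=:.3f}, {q_emp=:.3f}")
```

Repeating this for many values of `tau` traces out the quantile function, which is the sense in which estimating all quantiles recovers the distribution.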

Vector quantiles extend the notion of quantiles to high-dimensional variables [3]. Vector quantile regression (VQR) is the estimation of the conditional vector quantile function $Q_{\mathbf{Y}|\mathbf{X}}$ from samples drawn from $P_{(\mathbf{X},\mathbf{Y})}$, where $\mathbf{Y}$ is a $d$-dimensional target variable and $\mathbf{X}$ are $k$-dimensional covariates ("features").
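For intuition, the two properties that characterize the conditional vector quantile function can be written as below. This is an informal sketch following the optimal-transport formulation of Carlier et al. (2016), listed in the references, and is not taken from this package's documentation. The symbol $\mathbf{U}$ (a vector of quantile levels) is introduced here for illustration.

$$
\mathbf{Y} = Q_{\mathbf{Y}|\mathbf{X}}(\mathbf{U}; \mathbf{X}), \qquad \mathbf{U} \mid \mathbf{X} \sim \mathcal{U}\left([0,1]^d\right),
$$

$$
(\mathbf{u} - \mathbf{u}')^{\top}\left(Q_{\mathbf{Y}|\mathbf{X}}(\mathbf{u}; \mathbf{x}) - Q_{\mathbf{Y}|\mathbf{X}}(\mathbf{u}'; \mathbf{x})\right) \ge 0 \quad \text{for all } \mathbf{u}, \mathbf{u}' \in [0,1]^d .
$$

The first line says that pushing uniform quantile levels through $Q_{\mathbf{Y}|\mathbf{X}}$ reproduces the conditional distribution of $\mathbf{Y}$. The second is a monotonicity condition, which for $d=1$ reduces to the familiar requirement that a scalar quantile function is nondecreasing in the quantile level.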

VQR is a highly general approach, as it allows for assumption-free estimation of the conditional vector quantile function, which is a fundamental quantity that fully represents the distribution of $\mathbf{Y}|\mathbf{X}$. Thus, VQR is applicable to any statistical inference task, i.e., it can be used to estimate any quantity corresponding to a distribution.

Below is an illustration of vector quantiles of a $d=2$-dimensional star-shaped distribution, where $T=50$ quantile levels were estimated in each dimension.

[Figure: fig1A]

Results and Comparisons

Non-linear VQR

Nonlinear VQR (NL-VQR) outperforms linear VQR and Conditional VAE (C-VAE) [4] on challenging distribution estimation tasks. The metric shown is KDE-L1 distribution distance (lower is better). Comparisons on two synthetic datasets are shown below.

Conditional banana: In this dataset, both the mean of the distribution and its shape change as a nonlinear function of the covariates $\text{X}$.

[Figure: cond-banana]

Rotating stars: Features a nonlinear relationship between the covariates and the quantile function (a rotation matrix), where the conditional mean remains the same for any $\text{X}$, while only the tail ("lowest" and "highest") quantiles change.

[Figure: stars-updated]
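To give a feel for the KDE-L1 metric mentioned above, here is a minimal NumPy sketch of one way such a distance could be computed: fit a Gaussian kernel density estimate to each sample set, evaluate both on a shared grid, and sum the absolute differences. The bandwidth, grid, and function names are assumptions made for this illustration. The exact procedure used to produce the reported results may differ.

```python
import numpy as np


def gaussian_kde_on_grid(samples, grid, bandwidth):
    """Evaluate an isotropic Gaussian KDE of `samples` (n, d) at `grid` (m, d)."""
    n, d = samples.shape
    # Squared distances between every grid point and every sample: shape (m, n).
    sq_dists = ((grid[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * np.pi * bandwidth**2) ** (d / 2.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2)).sum(axis=1) / (n * norm)


def kde_l1_distance(samples_p, samples_q, grid, cell_volume, bandwidth=0.3):
    """Approximate the L1 distance between two KDEs by a Riemann sum over `grid`."""
    p = gaussian_kde_on_grid(samples_p, grid, bandwidth)
    q = gaussian_kde_on_grid(samples_q, grid, bandwidth)
    return np.abs(p - q).sum() * cell_volume


rng = np.random.default_rng(0)
axis = np.linspace(-6.0, 6.0, 61)
grid = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1).reshape(-1, 2)
cell_volume = (axis[1] - axis[0]) ** 2

a = rng.normal(size=(500, 2))
b = rng.normal(size=(500, 2))           # same distribution as `a`
c = rng.normal(loc=2.0, size=(500, 2))  # shifted distribution

dist_same = kde_l1_distance(a, b, grid, cell_volume)
dist_shifted = kde_l1_distance(a, c, grid, cell_volume)
print(f"{dist_same=:.3f}, {dist_shifted=:.3f}")
```

Two sample sets from the same distribution give a small distance, while a shifted distribution gives a larger one. Since both densities integrate to one, the value is bounded above by 2.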

Non-linear Scalar QR

The nonlinear VQR implementation in this package can also be used for scalar ($d=1$) quantile regression. It is very fast since it estimates all $T$ quantile levels simultaneously.

Synthetic glasses: A bi-modal distribution in which the distance between the modes depends on $\text{X}$. Note that there are no quantile crossings even when the two modes overlap.

<img src="https://user-images.githubusercontent.com/75639/183285484-7efdeeae-c9f1-4be2-808d-1e48fde99478.png" width="50%">
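To make the notion of a quantile crossing concrete, the sketch below checks whether a table of estimated quantiles is monotone along the quantile-level axis. It does not use this package: the quantiles come from `np.quantile` on toy bi-modal data, and the "independently estimated" case is only simulated by adding noise. The helper `has_quantile_crossing` and the covariate values are hypothetical names chosen for this example.

```python
import numpy as np


def has_quantile_crossing(quantiles):
    """Return True if any row of `quantiles` (n_points, T) decreases along the level axis."""
    return bool(np.any(np.diff(quantiles, axis=1) < 0))


rng = np.random.default_rng(0)
T = 50
levels = np.linspace(0.01, 0.99, T)
x_values = np.linspace(0.0, 2.0, 5)  # hypothetical covariate values

# Bi-modal toy data: two Gaussian modes whose distance is 2 * x.
rows = []
for x in x_values:
    signs = rng.choice([-1.0, 1.0], size=2000)
    y = signs * x + 0.3 * rng.normal(size=2000)
    rows.append(np.quantile(y, levels))  # all T levels from the same sample
quantiles = np.stack(rows)  # shape (len(x_values), T)

crossing_joint = has_quantile_crossing(quantiles)

# Simulate levels estimated separately with independent errors; these can cross.
noisy = quantiles + 0.2 * rng.normal(size=quantiles.shape)
crossing_noisy = has_quantile_crossing(noisy)
print(f"{crossing_joint=}, {crossing_noisy=}")
```

A crossing means a higher quantile level is assigned a lower value than a lower level, which cannot happen for a valid quantile function.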

Features

Installation

Simply install the vqr package via pip:

pip install vqr

To run the example notebooks, please clone this repo and install the supplied conda environment:

conda env update -f environment.yml -n vqr
conda activate vqr

Usage examples

Below is a minimal usage example for VQR, demonstrating fitting linear VQR, sampling from the conditional distribution, and calculating coverage at a specified $\alpha$.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

from vqr import VectorQuantileRegressor
from vqr.solvers.regularized_lse import RegularizedDualVQRSolver

N, d, k, T = 5000, 2, 1, 20
N_test = N // 10
seed = 42
alpha = 0.05

# Generate some data (or load from elsewhere).
X, Y = make_regression(
    n_samples=N, n_features=k, n_targets=d, noise=0.1, random_state=seed
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=N_test, shuffle=True, random_state=seed
)

# Create the VQR solver and regressor.
vqr_solver = RegularizedDualVQRSolver(
    verbose=True, epsilon=1e-2, num_epochs=1000, lr=0.9
)
vqr = VectorQuantileRegressor(n_levels=T, solver=vqr_solver)

# Fit the model on the data.
vqr.fit(X_train, Y_train)

# Marginal coverage calculation: for each test point, calculate the
# conditional quantiles given x, and check whether the corresponding y is covered
# in the alpha-contour.
cov_test = np.mean(
    [vqr.coverage(Y_test[[i]], X_test[[i]], alpha=alpha) for i in range(N_test)]
)
print(f"{cov_test=}")

# Sample from the fitted conditional distribution, given a specific x.
Y_sampled = vqr.sample(n=100, x=X_test[0])

# Calculate conditional coverage given a sample x.
cov_sampled = vqr.coverage(Y_sampled, x=X_test[0], alpha=alpha)
print(f"{cov_sampled=}")

For further examples, please refer to the example notebooks in the notebooks/ folder of this repo.

References

  1. Rosenberg, A.A., Vedula, S., Romano, Y. and Bronstein, A.M., 2022. Fast Nonlinear Vector Quantile Regression. arXiv preprint arXiv:2205.14977.

  2. Koenker, R. and Bassett Jr, G., 1978. Regression quantiles. Econometrica: Journal of the Econometric Society, pp.33-50.

  3. Carlier, G., Chernozhukov, V. and Galichon, A., 2016. Vector quantile regression: an optimal transport approach. The Annals of Statistics, 44(3), pp.1165-1192.

  4. Feldman, S., Bates, S. and Romano, Y., 2021. Calibrated multiple-output quantile regression with representation learning. arXiv preprint arXiv:2110.00816.