Awesome
Conformal Approach To Gaussian Process Surrogate Evaluation With Coverage Guarantees
What is done in this repo
đź”— Requirements
Python 3.7+
OpenTURNS is a C++ library made, hence one can need to install gcc to be able to run the library
Ubuntu:
$ sudo apt update
$ sudo apt install build-essential
OSX:
$ brew install gcc
Windows: Install MinGW (a Windows distribution of gcc) or Microsoft’s Visual C
Install the required packages:
- Via
pip
:
$ pip install -r requirements.txt
- Via conda:
$ conda install -f environment.yml
đź› Installation
Clone the repo and run the following command in the conformalized_gp directory to install the code
$ pip install .
⚡️ Quickstart
Here is a @quickstart to use the Jackknife+GP method on any regression dataset. Here, the goal is the compare visually the results given by the standard Jackknife+ method, the Credibility Intervals and our methodology. The notebook from which this quickstart is inspired can be found here
We first start to import the necessary packages
import matplotlib.pyplot as plt
import numpy as np
import scipy
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import train_test_split
from mapie.conformity_scores.residual_conformity_scores import GPConformityScore
from mapie.regression import MapieRegressor
BLUE = np.array([[26, 54, 105]]) / 255
ORANGE = np.array([[223, 84, 49]]) / 255
YELLOW = np.array([[242, 188, 64]]) / 255
- In this example, we are going to work on an analytical function of our imagination which have some good visual behavior :
$$g(x) = 3x\sin(x) - 2x\cos(x) + \frac{x^3}{40} - \frac{x^2}{2} - 10x$$
def g(x):
return (3 * x * np.sin(x) - 2 * x * np.cos(x) + ( x ** 3) / 40 - .5 * x ** 2 - 10 * x)
x_mesh = np.linspace(-40, 60, 5000)
plt.plot(x_mesh, g(x_mesh))
plt.xlabel("$x$")
plt.ylabel("$g(x)$")
- Then we split our data into train and test and train au sickit-learn
GaussianProcessRegressor
with aRBF
kernel.
X_train, X_test, y_train, y_test = train_test_split(x_mesh, g(x_mesh), test_size=.98, random_state=42)
X_train = X_train.reshape(-1, 1)
X_test = X_test.reshape(-1, 1)
gp = GaussianProcessRegressor(normalize_y=True)
gp.fit(X_train, y_train)
- We then define and train the two conformal methods (J+ and J+GP):
mapie_j_plus_gp = MapieRegressor(
estimator=gp,
cv=-1,
method="plus",
conformity_score=GPConformityScore(),
model_has_std=True,
random_state=42
)
mapie_j_plus = MapieRegressor(
estimator=gp,
cv=-1,
method="plus",
conformity_score=None,
model_has_std=False,
random_state=42
)
mapie_j_plus_gp.fit(X_train, y_train)
mapie_j_plus.fit(X_train, y_train)
- Finally, we predict and compute prediction intervals with a confidence level of 90% on the test set and plot the prediction intervals of the three methods
ALPHA = .1
_, y_pss_j_plus_gp = mapie_j_plus_gp.predict(x_mesh.reshape(-1, 1), alpha=ALPHA)
_, y_pss_j_plus = mapie_j_plus.predict(x_mesh.reshape(-1, 1), alpha=ALPHA)
y_mean, y_std = gp.predict(x_mesh.reshape(-1, 1), return_std=True)
q_alpha_min = scipy.stats.norm.ppf(ALPHA / 2)
q_alpha_max = scipy.stats.norm.ppf(1 - ALPHA / 2)
f, ax = plt.subplots(1, 1, figsize=(20, 10))
ax.scatter(X_train, y_train, c=BLUE)
ax.plot(x_mesh, g(x_mesh), c=BLUE)
ax.plot(x_mesh, y_mean, c=YELLOW)
ax.fill_between(
x_mesh,
y_mean + y_std * q_alpha_min,
y_mean + y_std * q_alpha_max,
alpha=0.3,
color=YELLOW,
label=r"$\pm$ 1 std. dev.",
)
ax.fill_between(
x_mesh,
y_pss_j_plus_gp[:, 0, 0],
y_pss_j_plus_gp[:, 1, 0],
alpha=.6,
color=ORANGE,
label=r"$\pm$ 1 std. dev.",
)
ax.fill_between(
x_mesh,
y_pss_j_plus[:, 0, 0],
y_pss_j_plus[:, 1, 0],
alpha=.3,
color="g",
label=r"$\pm$ 1 std. dev.",
)
ax.legend(
[
"Training Points",
"True function", "Mean of posterior GP",
"Posterior GP Credibility Interval",
"Prediction Interval J+GP",
"Prediction Interval J+",
]
)
ax.set_xlabel("$x$")
ax.set_ylabel("$g(x)$")
🔌 Plug OpenTURNS GP into MAPIE
If you wish to use our code with an OpenTURNS model, we have implemented a simple wrapper around the model so that it can be used very easily:
from wrappers import GpOTtoSklearnStd
nu = 5/2 # Hyperparameter of the Matérn Kernel
noise = None # Standard deviation of the nugget effect. If None, no nugget effect is applied.
gp_estimator = GpOTtoSklearnStd(scale=1, amplitude=1, nu=nu, noise=None)
This estimator is now fully compatible with MAPIE as it comes with it .fit
and .predict
methods.