Open in Dev Containers Open in GitHub Codespaces

👖 Conformal Tights

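Conformal Tights is a Python package that exports a scikit-learn meta-estimator for conformal prediction of quantiles and intervals, and a Darts forecaster for conformally calibrated probabilistic forecasting.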

Features

  1. 🍬 Sklearn meta-estimator: add conformal prediction of quantiles and intervals to any scikit-learn regressor
  2. 🔮 Darts forecaster: add conformally calibrated probabilistic forecasting to any scikit-learn regressor
  3. 🌡️ Conformally calibrated: accurate quantiles, and intervals with reliable coverage
  4. 🚦 Coherent quantiles: quantiles increase monotonically instead of crossing each other
  5. 👖 Tight quantiles: selects the lowest dispersion that provides the desired coverage
  6. 🎁 Data efficient: requires only a small number of calibration examples to fit
  7. 🐼 Pandas support: optionally predict on DataFrames and receive DataFrame output
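
To make "conformally calibrated" more concrete, here is a minimal sketch of plain split conformal prediction in pure Python. It is a generic textbook technique and not necessarily the method Conformal Tights implements internally; the function name `split_conformal_interval` and the toy numbers are hypothetical.

```python
import math


def split_conformal_interval(cal_true, cal_pred, new_pred, coverage=0.9):
    """Return a (lower, upper) interval around new_pred via split conformal prediction."""
    # Nonconformity scores: absolute residuals on held-out calibration examples.
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank. Simplification: the rank is capped at n,
    # whereas the strict procedure would return an infinite interval in that case.
    rank = min(n, math.ceil((n + 1) * coverage))
    radius = scores[rank - 1]
    return new_pred - radius, new_pred + radius


# Hypothetical calibration targets and point predictions (not from any real dataset).
cal_true = [10.0, 12.0, 9.0, 15.0, 11.0, 13.0, 8.0, 14.0, 10.5]
cal_pred = [11.0, 11.5, 10.0, 13.0, 11.0, 12.0, 9.5, 14.5, 10.0]
lower, upper = split_conformal_interval(cal_true, cal_pred, new_pred=12.0, coverage=0.8)
```

The interval radius is simply a high-ranked calibration residual, which is why a modest number of calibration examples can already give usable coverage.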

Using

Quick links

  1. Installing
  2. Predicting quantiles
  3. Predicting intervals
  4. Forecasting time series

Installing

pip install conformal-tights

Predicting quantiles

Conformal Tights exports a meta-estimator called ConformalCoherentQuantileRegressor that you can use to equip any scikit-learn regressor with a predict_quantiles method that predicts conformally calibrated quantiles. Example usage:

from conformal_tights import ConformalCoherentQuantileRegressor
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Fetch dataset and split in train and test
X, y = fetch_openml("ames_housing", version=1, return_X_y=True, as_frame=True, parser="auto")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)

# Create a regressor, equip it with conformal prediction, and fit on the train set
my_regressor = XGBRegressor(objective="reg:absoluteerror")
conformal_predictor = ConformalCoherentQuantileRegressor(estimator=my_regressor)
conformal_predictor.fit(X_train, y_train)

# Predict with the underlying regressor
ŷ_test = conformal_predictor.predict(X_test)

# Predict quantiles with the conformal predictor
ŷ_test_quantiles = conformal_predictor.predict_quantiles(
    X_test, quantiles=(0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975)
)

When the input data is a pandas DataFrame, the output is also a pandas DataFrame. For example, printing the head of ŷ_test_quantiles yields:

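| house_id | 0.025 | 0.05 | 0.1 | 0.5 | 0.9 | 0.95 | 0.975 |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 1357 | 114743.7 | 120917.9 | 131752.6 | 156708.2 | 175907.8 | 187996.1 | 205443.4 |
| 2367 | 67382.7 | 80191.7 | 86871.8 | 105807.1 | 118465.3 | 127581.2 | 142419.1 |
| 2822 | 119068.0 | 131864.8 | 138541.6 | 159447.7 | 179227.2 | 197337.0 | 214134.1 |
| 2126 | 93885.8 | 100040.7 | 111345.5 | 134292.7 | 150557.1 | 164595.8 | 182524.1 |
| 1544 | 68959.8 | 81648.8 | 88364.1 | 108298.3 | 122329.6 | 132421.1 | 147225.6 |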
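
As a quick sanity check of the "coherent quantiles" property, the sketch below verifies that the predicted values in the output above never decrease as the quantile level increases. The helper `is_coherent` is a hypothetical name introduced for illustration; the numbers are copied from the printed output.

```python
# Predicted quantile values per house, copied from the printed output above.
# Columns correspond to quantile levels 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975.
predicted = {
    1357: (114743.7, 120917.9, 131752.6, 156708.2, 175907.8, 187996.1, 205443.4),
    2367: (67382.7, 80191.7, 86871.8, 105807.1, 118465.3, 127581.2, 142419.1),
    2822: (119068.0, 131864.8, 138541.6, 159447.7, 179227.2, 197337.0, 214134.1),
    2126: (93885.8, 100040.7, 111345.5, 134292.7, 150557.1, 164595.8, 182524.1),
    1544: (68959.8, 81648.8, 88364.1, 108298.3, 122329.6, 132421.1, 147225.6),
}


def is_coherent(row):
    """Return True if values never decrease from one quantile level to the next."""
    return all(low <= high for low, high in zip(row, row[1:]))


coherent = {house_id: is_coherent(row) for house_id, row in predicted.items()}
```

A crossing pair such as `(3.0, 2.0)` would make `is_coherent` return `False`.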

Let's visualize the predicted quantiles on the test set:

<img src="https://github.com/superlinear-ai/conformal-tights/assets/4543654/2726d108-ee84-47d0-83d9-7e911b123f0c"> <details> <summary>Expand to see the code that generated the graph above</summary>
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

%config InlineBackend.figure_format = "retina"
plt.rc("font", family="DejaVu Sans", size=10)
plt.figure(figsize=(8, 4.5))
idx = ŷ_test_quantiles[0.5].sample(50, random_state=42).sort_values().index
x = list(range(1, len(idx) + 1))
x_ticks = [1, *list(range(5, len(idx) + 1, 5))]
for j in range(3):
    coverage = round(100 * (ŷ_test_quantiles.columns[-(j + 1)] - ŷ_test_quantiles.columns[j]))
    plt.bar(
        x,
        ŷ_test_quantiles.loc[idx].iloc[:, -(j + 1)] - ŷ_test_quantiles.loc[idx].iloc[:, j],
        bottom=ŷ_test_quantiles.loc[idx].iloc[:, j],
        color=["#b3d9ff", "#86bfff", "#4da6ff"][j],
        label=f"{coverage}% Prediction interval",
    )
plt.plot(
    x,
    y_test.loc[idx],
    "s",
    label="Actual (test)",
    markeredgecolor="#e74c3c",
    markeredgewidth=1.414,
    markerfacecolor="none",
    markersize=4,
)
plt.plot(x, ŷ_test.loc[idx], "s", color="blue", label="Predicted (test)", markersize=2)
plt.xlabel("House")
plt.xticks(x_ticks, x_ticks)
plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f"${x/1000:,.0f}k"))
plt.gca().tick_params(axis="both", labelsize=10)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
plt.grid(False)
plt.grid(axis="y")
plt.legend(loc="upper left", title="House price", title_fontproperties={"weight": "bold"})
plt.tight_layout()
</details>

Predicting intervals

In addition to quantile prediction, you can use predict_interval to predict conformally calibrated prediction intervals. Compared to quantiles, these focus on reliable coverage over quantile accuracy. Example usage:

# Predict an interval for each example with the conformal predictor
ŷ_test_interval = conformal_predictor.predict_interval(X_test, coverage=0.95)

# Measure the coverage of the prediction intervals on the test set
coverage = ((ŷ_test_interval.iloc[:, 0] <= y_test) & (y_test <= ŷ_test_interval.iloc[:, 1])).mean()
print(f"{coverage:.1%}")  # 96.6%

When the input data is a pandas DataFrame, the output is also a pandas DataFrame. For example, printing the head of ŷ_test_interval yields:

| house_id | 0.025 | 0.975 |
|---:|---:|---:|
| 1357 | 107202.8 | 206290.4 |
| 2367 | 66665.1 | 146004.8 |
| 2822 | 115591.8 | 220314.8 |
| 2126 | 85288.1 | 183037.8 |
| 1544 | 67889.9 | 150646.2 |
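
To illustrate how intervals differ from quantiles, the sketch below compares the 95% interval bounds above with the 2.5% and 97.5% quantile predictions printed in the previous section. It only describes these five sample rows and says nothing general about the library; the helper `width` is a hypothetical name.

```python
# 2.5% and 97.5% quantile predictions, copied from the quantile output earlier in this README.
quantile_bounds = {
    1357: (114743.7, 205443.4),
    2367: (67382.7, 142419.1),
    2822: (119068.0, 214134.1),
    2126: (93885.8, 182524.1),
    1544: (68959.8, 147225.6),
}
# 95% prediction interval bounds, copied from the interval output above.
interval_bounds = {
    1357: (107202.8, 206290.4),
    2367: (66665.1, 146004.8),
    2822: (115591.8, 220314.8),
    2126: (85288.1, 183037.8),
    1544: (67889.9, 150646.2),
}


def width(bounds):
    """Distance between the upper and lower bound."""
    return bounds[1] - bounds[0]


# How much wider each interval is than the spread between the matching quantiles.
extra_width = {
    house_id: round(width(interval_bounds[house_id]) - width(quantile_bounds[house_id]), 1)
    for house_id in interval_bounds
}
```

For these five houses every interval is wider than the corresponding quantile spread, which is consistent with intervals prioritizing coverage.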

Forecasting time series

Conformal Tights also exports a Darts forecaster called DartsForecaster that uses a ConformalCoherentQuantileRegressor to make conformally calibrated probabilistic time series forecasts. To demonstrate its usage, let's begin by loading a time series dataset:

from darts.datasets import ElectricityConsumptionZurichDataset

# Load a forecasting dataset
ts = ElectricityConsumptionZurichDataset().load()
ts = ts.resample("h")

# Split the dataset into covariates X and target y
X = ts.drop_columns(["Value_NE5", "Value_NE7"])
y = ts["Value_NE5"]  # NE5 = Household energy consumption

# Add categorical covariates to X
X = X.add_holidays(country_code="CH")
X = X.add_datetime_attribute("month")
X = X.add_datetime_attribute("dayofweek")
X = X.add_datetime_attribute("hour")
X_categoricals = ["holidays", "month", "dayofweek", "hour"]

Printing the tail of the covariates time series X.pd_dataframe() yields:

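| Timestamp | Hr [%Hr] | RainDur [min] | StrGlo [W/m2] | T [°C] | WD [°] | WVs [m/s] | WVv [m/s] | p [hPa] | holidays | month | dayofweek | hour |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 2022‑08‑30 20h | 70.2 | 0.0 | 0.0 | 19.9 | 290.2 | 1.7 | 1.5 | 968.5 | 0.0 | 7.0 | 1.0 | 20.0 |
| 2022‑08‑30 21h | 70.1 | 0.0 | 0.0 | 19.5 | 239.2 | 1.0 | 0.7 | 968.1 | 0.0 | 7.0 | 1.0 | 21.0 |
| 2022‑08‑30 22h | 71.3 | 0.0 | 0.0 | 19.5 | 28.9 | 1.5 | 1.3 | 967.9 | 0.0 | 7.0 | 1.0 | 22.0 |
| 2022‑08‑30 23h | 80.4 | 0.0 | 0.0 | 18.9 | 24.3 | 1.6 | 1.1 | 967.9 | 0.0 | 7.0 | 1.0 | 23.0 |
| 2022‑08‑31 00h | 81.6 | 1.0 | 0.0 | 18.7 | 293.5 | 0.9 | 0.3 | 967.8 | 0.0 | 7.0 | 2.0 | 0.0 |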
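
The `month`, `dayofweek`, and `hour` columns above can be reproduced with the standard library, as sketched below. The zero-based month (August shown as 7.0) is inferred from the printed output rather than taken from the Darts documentation, and `datetime_attributes` is a hypothetical helper.

```python
from datetime import datetime


def datetime_attributes(timestamp):
    """Derive the calendar covariates shown in the output above from a timestamp."""
    return {
        "month": timestamp.month - 1,  # Assumption: zero-based, so August -> 7
        "dayofweek": timestamp.weekday(),  # Monday -> 0, ..., Sunday -> 6
        "hour": timestamp.hour,
    }


first_row = datetime_attributes(datetime(2022, 8, 30, 20))
last_row = datetime_attributes(datetime(2022, 8, 31, 0))
```

Both results match the corresponding rows of the output once the floats are read as integers.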

We can now equip a scikit-learn regressor with conformal prediction using ConformalCoherentQuantileRegressor as before, and then equip that conformal predictor with probabilistic time series forecasting using DartsForecaster:

from conformal_tights import DartsForecaster, ConformalCoherentQuantileRegressor
from pandas import Timestamp
from xgboost import XGBRegressor

# Split the dataset into train and test
test_cutoff = Timestamp("2022-06-01")
y_train, y_test = y.split_after(test_cutoff)
X_train, X_test = X.split_after(test_cutoff)

# Now let's:
# 1. Create an sklearn regressor of our choosing, in this case `XGBRegressor`
# 2. Add conformal quantile prediction to the regressor with `ConformalCoherentQuantileRegressor`
# 3. Add probabilistic forecasting to the conformal predictor with `DartsForecaster`
my_regressor = XGBRegressor()
conformal_predictor = ConformalCoherentQuantileRegressor(estimator=my_regressor)
forecaster = DartsForecaster(
    model=conformal_predictor,
    lags=5 * 24,  # Add the last 5 days of the target to the prediction features
    lags_future_covariates=[0],  # Add the current timestamp's covariates to the prediction features
    categorical_future_covariates=X_categoricals,  # Convert these covariates to pd.Categorical
)

# Fit the forecaster
forecaster.fit(y_train, future_covariates=X_train)

# Make a probabilistic forecast 5 days into the future by predicting a set of conformally calibrated
# quantiles at each time step and drawing 500 samples from them
quantiles = (0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975)
forecast = forecaster.predict(
    n=5 * 24, future_covariates=X_test, num_samples=500, quantiles=quantiles
)
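
The `lags` and `lags_future_covariates` arguments in the code above turn the time series into a tabular regression problem. The sketch below shows that idea in pure Python under simplifying assumptions; it is not Darts' actual implementation, and `make_lagged_rows` and the toy series are hypothetical.

```python
def make_lagged_rows(target, covariates, lags):
    """Pair each step's features (past targets + current covariates) with its target."""
    rows = []
    for t in range(lags, len(target)):
        past_targets = target[t - lags : t]  # the `lags` most recent target values
        current_covariates = covariates[t]  # covariates at the predicted step (lag 0)
        features = list(past_targets) + list(current_covariates)
        rows.append((features, target[t]))
    return rows


# Hypothetical toy series: six target values and one covariate per time step.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
covariates = [[0], [1], [2], [3], [4], [5]]
rows = make_lagged_rows(target, covariates, lags=3)
```

Each resulting row can be fed to an ordinary regressor, which is what lets a scikit-learn estimator act as a forecaster.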

Printing the head of the forecast quantiles time series forecast.quantiles_df(quantiles=quantiles) yields:

| Timestamp | Value_NE5_0.025 | Value_NE5_0.05 | Value_NE5_0.1 | Value_NE5_0.25 | Value_NE5_0.5 | Value_NE5_0.75 | Value_NE5_0.9 | Value_NE5_0.95 | Value_NE5_0.975 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 2022‑06‑01 01h | 19165.2 | 19268.3 | 19435.7 | 19663.0 | 19861.7 | 20062.2 | 20237.9 | 20337.7 | 20453.2 |
| 2022‑06‑01 02h | 19004.0 | 19099.0 | 19226.3 | 19453.7 | 19710.7 | 19966.1 | 20170.1 | 20272.8 | 20366.9 |
| 2022‑06‑01 03h | 19372.6 | 19493.0 | 19679.4 | 20027.6 | 20324.6 | 20546.3 | 20773.2 | 20910.3 | 21014.1 |
| 2022‑06‑01 04h | 21936.2 | 22105.6 | 22436.0 | 22917.5 | 23308.6 | 23604.8 | 23871.0 | 24121.7 | 24351.5 |
| 2022‑06‑01 05h | 25040.5 | 25330.5 | 25531.1 | 25910.4 | 26439.4 | 26903.2 | 27287.4 | 27493.9 | 27633.9 |
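
One simple way to draw samples from a set of predicted quantiles is to interpolate the inverse CDF linearly between them. The sketch below does this for the first forecast row above. It is an illustration under that assumption and not necessarily how `DartsForecaster` draws its samples; `sample_from_quantiles` is a hypothetical helper.

```python
import bisect
import random


def sample_from_quantiles(levels, values, num_samples, seed=0):
    """Draw samples by linearly interpolating between predicted quantiles."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        # Simplification: only sample between the lowest and highest quantile level,
        # so the tails beyond them are ignored.
        u = rng.uniform(levels[0], levels[-1])
        # Locate the pair of neighbouring quantile levels that brackets u.
        hi = min(max(bisect.bisect_right(levels, u), 1), len(levels) - 1)
        lo = hi - 1
        frac = (u - levels[lo]) / (levels[hi] - levels[lo])
        samples.append(values[lo] + frac * (values[hi] - values[lo]))
    return samples


# Quantile levels and the forecast values for 2022-06-01 01h, copied from the output above.
levels = (0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975)
values = (19165.2, 19268.3, 19435.7, 19663.0, 19861.7, 20062.2, 20237.9, 20337.7, 20453.2)
samples = sample_from_quantiles(levels, values, num_samples=500)
```

Because the quantiles are coherent, the interpolated inverse CDF is non-decreasing, so every sample falls between the lowest and highest predicted quantile.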

Let's visualize the forecast and its prediction interval on the test set:

<img src="https://github.com/superlinear-ai/conformal-tights/assets/4543654/8c3c256f-0732-49c7-94f2-e42213e85e4b"> <details> <summary>Expand to see the code that generated the graph above</summary>
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

%config InlineBackend.figure_format = "retina"
plt.rc("font", family="DejaVu Sans", size=10)
plt.figure(figsize=(8, 4.5))
y_train[-2 * 24 :].plot(label="Actual (train)")
y_test[: len(forecast)].plot(label="Actual (test)")
forecast.plot(label="Forecast with\n90% Prediction interval", low_quantile=0.05, high_quantile=0.95)
plt.gca().set_xlabel("")
plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f"{x/1000:,.0f} MWh"))
plt.gca().tick_params(axis="both", labelsize=10)
plt.legend(loc="upper right", title="Energy consumption", title_fontproperties={"weight": "bold"})
plt.tight_layout()
</details>

Contributing

<details> <summary>Prerequisites</summary> <details> <summary>1. Set up Git to use SSH</summary>
  1. Generate an SSH key and add the SSH key to your GitHub account.

  2. Configure SSH to automatically load your SSH keys:

    cat << EOF >> ~/.ssh/config
    
    Host *
      AddKeysToAgent yes
      IgnoreUnknown UseKeychain
      UseKeychain yes
      ForwardAgent yes
    EOF
    
</details> <details> <summary>2. Install Docker</summary>
  1. Install Docker Desktop.
</details> <details> <summary>3. Install VS Code or PyCharm</summary>
  1. Install VS Code and VS Code's Dev Containers extension. Alternatively, install PyCharm.
  2. Optional: install a Nerd Font such as FiraCode Nerd Font and configure VS Code or configure PyCharm to use it.
</details> </details> <details open> <summary>Development environments</summary>

The following development environments are supported:

  1. ⭐️ GitHub Codespaces: click on Code and select Create codespace to start a Dev Container with GitHub Codespaces.
  2. ⭐️ Dev Container (with container volume): click on Open in Dev Containers to clone this repository in a container volume and create a Dev Container with VS Code.
  3. Dev Container: clone this repository, open it with VS Code, and run <kbd>Ctrl/⌘</kbd> + <kbd>⇧</kbd> + <kbd>P</kbd> → Dev Containers: Reopen in Container.
  4. PyCharm: clone this repository, open it with PyCharm, and configure Docker Compose as a remote interpreter with the dev service.
  5. Terminal: clone this repository, open it with your terminal, and run docker compose up --detach dev to start a Dev Container in the background, and then run docker compose exec dev zsh to open a shell prompt in the Dev Container.
</details> <details> <summary>Developing</summary> </details>