Conformal prediction
Conformal prediction (CP) methodologies are applied to uncertainty quantification in distribution-free, data-agnostic settings, specifically to regression problems involving both exchangeable and time-series data. This work was carried out as part of an MSc thesis (UB, 2023).
Related media are available in the author's corresponding repository, which deploys a GitHub page where the thesis and the presentation are publicly accessible.
Exchangeable data
Toy problem
Before dealing with more complex datasets, a toy problem is proposed, following this Kaggle discussion.
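The toy problem from the linked discussion is not reproduced here. As a stand-in, the following minimal sketch shows split conformal prediction on hypothetical synthetic data (a noisy line), with absolute residuals as nonconformity scores. The function name `split_conformal_interval`, the data-generating process, and the least-squares predictor are all assumptions made for illustration, not the thesis implementation.

```python
import numpy as np


def split_conformal_interval(y_cal, pred_cal, pred_test, alpha=0.1):
    """Return (lower, upper) split-conformal bounds around pred_test."""
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = np.abs(np.asarray(y_cal) - np.asarray(pred_cal))
    n = scores.size
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration points the interval is unbounded.
    q = np.inf if k > n else np.sort(scores)[k - 1]
    pred_test = np.asarray(pred_test)
    return pred_test - q, pred_test + q


# Hypothetical toy data: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=1500)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Disjoint splits: fit on the first, calibrate on the second, test on the third.
x_fit, x_cal, x_test = x[:500], x[500:1000], x[1000:]
y_fit, y_cal, y_test = y[:500], y[500:1000], y[1000:]

# Any point predictor works; a least-squares line keeps the sketch small.
slope, intercept = np.polyfit(x_fit, y_fit, deg=1)


def predict(v):
    return slope * v + intercept


lower, upper = split_conformal_interval(
    y_cal, predict(x_cal), predict(x_test), alpha=0.1
)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"Empirical coverage at alpha=0.1: {coverage:.3f}")
```

Under exchangeability, the empirical coverage should land close to the nominal level of 1 - alpha. The interval width is constant here because the score ignores the inputs.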
Regression problem
The same dataset as in mapie's CQR tutorial is proposed: the sklearn built-in California Housing dataset.
It was chosen for being simple and reproducible; in particular, no feature engineering is needed. It comprises 20,640 samples, each described by 8 features (latitude and longitude are listed together below) plus a label variable:
- The median income in block group
- The median house age in block group
- The average number of rooms per household
- The average number of bedrooms per household
- The block group population
- The average number of household members
- The location (latitude & longitude) of the block group
- The label variable: the median house price for a given block group.
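For this dataset, the conformalization step of CQR (conformalized quantile regression) can be sketched as follows. The sketch assumes lower and upper quantile predictions are already available. In practice they would come from a quantile regressor fitted on California Housing, which can be loaded with `sklearn.datasets.fetch_california_housing`. Here hypothetical stand-in arrays, deliberately too narrow, are used so the block runs without a download. The names `cqr_calibrate` and `cqr_interval` are illustrative, not part of mapie.

```python
import numpy as np


def cqr_calibrate(y_cal, lo_cal, hi_cal, alpha=0.1):
    """Conformal correction for quantile-regression intervals."""
    y_cal, lo_cal, hi_cal = map(np.asarray, (y_cal, lo_cal, hi_cal))
    # CQR score: positive when y falls outside [lo, hi], negative when inside.
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = scores.size
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]


def cqr_interval(lo_test, hi_test, correction):
    """Widen (or shrink, if correction < 0) the quantile band."""
    return np.asarray(lo_test) - correction, np.asarray(hi_test) + correction


# Hypothetical stand-in for a fitted quantile model: the true noise has
# unit standard deviation, but the predicted band is only +/- 0.5 wide.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=2000)
y = x + rng.normal(size=x.size)
lo_pred, hi_pred = x - 0.5, x + 0.5

# First half calibrates the correction, second half evaluates it.
correction = cqr_calibrate(y[:1000], lo_pred[:1000], hi_pred[:1000], alpha=0.1)
lower, upper = cqr_interval(lo_pred[1000:], hi_pred[1000:], correction)

raw_cov = np.mean((y[1000:] >= lo_pred[1000:]) & (y[1000:] <= hi_pred[1000:]))
cqr_cov = np.mean((y[1000:] >= lower) & (y[1000:] <= upper))
print(f"Coverage before correction: {raw_cov:.3f}")
print(f"Coverage after correction:  {cqr_cov:.3f}")
```

The correction is a single scalar added to both sides of the band, so any adaptivity in interval width comes from the underlying quantile predictions rather than from the conformal step.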
Non-exchangeable data
Time series problem
The same dataset as in mapie's time series tutorial was chosen: the Victoria electricity demand dataset, used in the book “Forecasting: Principles and Practice” [1].
It contains a total of 1340 samples and poses an electricity demand forecasting problem that not only features daily and weekly seasonality but is also affected by temperature. Thus, apart from the electricity demand lagged up to 7 days (and other time features), temperature will be used as an exogenous variable.
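The lagged-demand design described above can be sketched as follows. This is an illustrative construction, not the thesis code: the helper name `make_lagged_features` is hypothetical, one lag step is assumed to equal one row of the series, and the temperature at the target time is assumed to be known when predicting. Other time features are omitted.

```python
import numpy as np


def make_lagged_features(demand, temperature, n_lags=7):
    """Build (X, y) where X holds n_lags past demands plus temperature."""
    demand = np.asarray(demand, dtype=float)
    temperature = np.asarray(temperature, dtype=float)
    if demand.ndim != 1 or demand.shape != temperature.shape:
        raise ValueError("demand and temperature must be 1-D and equal length")
    n = demand.size
    if n <= n_lags:
        raise ValueError("series must be longer than n_lags")
    # Column j holds the demand lagged by j + 1 steps, aligned with y.
    lags = np.column_stack(
        [demand[n_lags - lag : n - lag] for lag in range(1, n_lags + 1)]
    )
    # Exogenous variable: temperature at the target time step (assumption).
    X = np.column_stack([lags, temperature[n_lags:]])
    y = demand[n_lags:]
    return X, y


# Tiny hypothetical series to make the alignment visible.
demand = np.arange(10.0)
temperature = np.arange(10.0) + 100.0
X, y = make_lagged_features(demand, temperature, n_lags=3)
print(X.shape, y.shape)
print(X[0], y[0])
```

The first `n_lags` rows are dropped because their lags are undefined. Rows stay in temporal order, which matters for time-series splits since shuffling would leak future information into the past.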
[1] Hyndman, R.J. and Athanasopoulos, G. Forecasting: Principles and Practice. OTexts, 2014. ISBN: 9780987507105.