Deep Batch Active Learning for Regression

This repository contains code accompanying our paper "A Framework and Benchmark for Deep Batch Active Learning for Regression". It can be used for the following purposes:

If you use this code for research purposes, please cite our paper.

Implemented methods

This repository contains an efficient implementation of our framework for building BMDAL algorithms for NN regression, which includes

Versions

License

This source code is licensed under the Apache 2.0 license. However, the implementation of the acs-rf-hyper kernel transformation in bmdal/features.py is adapted from the source code at https://github.com/rpinsler/active-bayesian-coresets, which comes with its own (non-commercial) license. Please respect this license when using the acs-rf-hyper transformation directly from bmdal/features.py or indirectly through the interface provided at bmdal/algorithms.py.

Installation

This code has been tested with Python 3.9.2 but may be compatible with versions down to Python 3.6.

Through pip

For running our NN and the active learning methods, a pip installation is sufficient. The library can be installed via

pip3 install bmdal_reg

When using our benchmarking code through a pip installation, the paths where experiment data and plots are saved can be modified by changing the corresponding path variables of bmdal_reg.custom_paths.CustomPaths before running the benchmark.
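As a rough illustration of overriding path variables on a class before anything reads them, here is a minimal sketch. It deliberately does not import bmdal_reg: the CustomPaths class below is a stand-in, and the attribute names data_path, results_path and plots_path as well as the helper redirect_paths are hypothetical. Check bmdal_reg/custom_paths.py for the actual variable names before adapting it.

```python
from pathlib import Path


class CustomPaths:
    """Stand-in for bmdal_reg.custom_paths.CustomPaths.

    The attribute names below are hypothetical placeholders,
    not the library's confirmed variables.
    """
    data_path = "data"
    results_path = "results"
    plots_path = "plots"


def redirect_paths(paths_cls, base_dir):
    """Point every string class attribute ending in '_path' at a subfolder of base_dir."""
    base = Path(base_dir)
    # Collect the new values first, then assign them, so the class
    # namespace is not modified while it is being iterated.
    changed = {
        name: str(base / Path(value).name)
        for name, value in vars(paths_cls).items()
        if name.endswith("_path") and isinstance(value, str)
    }
    for name, new_value in changed.items():
        setattr(paths_cls, name, new_value)
    return changed


# Run this before starting the benchmark so that later reads see the new folders.
redirected = redirect_paths(CustomPaths, "/tmp/bmdal_experiments")
for name, value in sorted(redirected.items()):
    print(name, "->", value)
```

With the real library, the same idea would amount to importing CustomPaths from bmdal_reg.custom_paths and assigning to its path variables directly, assuming they are plain class attributes as the paragraph above suggests.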

Manually

For certain purposes, especially trying new methods and running the benchmark, it might be helpful or necessary to modify the code. In that case, the code can be installed manually by cloning the GitHub repository and following the instructions below:

The following packages (available through pip) need to be installed:

If you want to install PyTorch with GPU support, please follow the instructions on the PyTorch website. The following command installs the versions of the libraries we used for running the benchmark; note, however, that these versions now come with security warnings:

pip3 install -r requirements_original.txt

Alternatively, the following command installs current versions of the packages:

pip3 install torch numpy dill psutil matplotlib seaborn pandas openml mat4py scipy
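To confirm that the environment is ready after either command, a small check along the following lines can help. This is only a sketch: the package list is copied from the pip command above, the minimum Python version is taken from the compatibility note in this README, and the helper name missing_packages is made up for illustration. It only tests whether each package can be located, not whether its version matches the one used for the benchmark.

```python
import importlib.util
import sys

# Package names taken from the pip command above; for these packages the
# import name is the same as the pip name.
REQUIRED = [
    "torch", "numpy", "dill", "psutil", "matplotlib",
    "seaborn", "pandas", "openml", "mat4py", "scipy",
]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if sys.version_info < (3, 6):
    print("Warning: this Python version is older than 3.6.")

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages were found.")
```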

Clone the repository (or download the files from the repository) and change to its folder:

git clone git@github.com:dholzmueller/bmdal_reg.git
cd bmdal_reg

Then, copy the file bmdal_reg/custom_paths.py.default to bmdal_reg/custom_paths.py via

cp bmdal_reg/custom_paths.py.default bmdal_reg/custom_paths.py

and, if you want to, adjust the paths in custom_paths.py to specify the folders in which you want to save data and results.

Downloading data

If you want to use the benchmark data sets, you need to download and preprocess them. We do not provide preprocessed versions of the data sets to avoid copyright issues, but you can download and preprocess the data sets using

python3 download_data.py

Note that this may take a while, depending on your download speed. The preprocessing is mostly fast, but for the (large) methane data set, it took around five minutes and 25 GB of RAM for us. If you cannot download or process the data due to limited RAM, please contact the main developer (see below).

Usage

Depending on your use case, some of the following introductory Jupyter notebooks may be helpful:

Besides these notebooks, you can also take a look at the code directly. The more important parts of our code are documented with docstrings.

Code structure

The code is structured as follows:

Updates to the second version of the benchmark

Updates to the third version of the benchmark

Contributors

If you would like to contribute to the code or would be interested in additional features, please contact David Holzmüller.