Bayesian Batch Active Learning as Sparse Subset Approximation

This repository contains the code to reproduce the experiments carried out in Bayesian Batch Active Learning as Sparse Subset Approximation.

This code was authored by Robert Pinsler and Jonathan Gordon.

Dependencies and Data Requirements

This code requires the following:

To run the regression experiments, please download the UCI regression datasets and place them into ./data.
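
A minimal setup sketch (the exact download source is left to the user; the dataset names are those listed in the Usage section below):

    # Create the data directory and place the downloaded UCI regression
    # datasets (yacht, boston, energy, power, protein, year) inside it.
    mkdir -p ./data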

GPU Requirements

Usage

This code base supports active learning on standard vision datasets (classification) and on UCI datasets (regression). The following experiments are provided (see section 7 of the paper):

  1. Active learning for regression: run the following command

    ./scripts/run_active_regression.sh DATASET ACQ CORESET

    where

    • DATASET may be one of {yacht, boston, energy, power, year} (determines the dataset to be used).
    • ACQ may be one of {BALD, Entropy, ACS} (determines the acquisition function to be used).
    • CORESET may be one of {Argmax, Random, Best, FW} (determines the querying strategy to be used).

    For example, to run the proposed method on the boston dataset, please run:

    ./scripts/run_active_regression.sh boston ACS FW

    This will automatically generate an experimental directory with the appropriate name, and place results from 40 seeds in the directory. Hyper-parameters for the experiments can all be found in the main body of the paper.
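
    The script runs a single (DATASET, ACQ, CORESET) configuration per invocation; to reproduce several settings in one go, it can be wrapped in a simple sweep. A minimal sketch, assuming only the argument values listed above (not every combination is necessarily meaningful):

    #!/bin/bash
    # Sweep acquisition functions and querying strategies on one dataset.
    DATASET=boston
    for ACQ in BALD Entropy ACS; do
        for CORESET in Argmax Random Best FW; do
            ./scripts/run_active_regression.sh "$DATASET" "$ACQ" "$CORESET"
        done
    done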

  2. Active learning for regression (with projections; recommended for large datasets, e.g., year and power): run the following command

    ./scripts/run_active_regression_projections.sh DATASET NUM_PROJECTIONS

    where

    • DATASET may be one of {yacht, boston, power, protein, year} (determines the dataset to be used).
    • NUM_PROJECTIONS is an integer (determines the number of projection samples used in the estimate).

    For example, to run the proposed method on the year dataset, please run:

    ./scripts/run_active_regression_projections.sh year 10

    This will automatically generate an experimental directory with the appropriate name, and place results from 40 seeds in the directory. Hyper-parameters for the experiments can all be found in the main body of the paper.
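
    Since NUM_PROJECTIONS controls how many samples are used in the estimate, it can be useful to compare several values. A minimal sketch (the projection counts below are illustrative, not taken from the paper):

    #!/bin/bash
    # Compare different numbers of projections on a large dataset.
    for NUM_PROJECTIONS in 10 50 100; do
        ./scripts/run_active_regression_projections.sh year "$NUM_PROJECTIONS"
    done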

  3. Active learning for classification (using standard active learning methods): run the following command

    ./scripts/run_active_torchvision.sh ACQ CORESET DATASET

    where

    • ACQ may be one of {BALD, Entropy} (determines the acquisition function to be used).
    • CORESET may be one of {Argmax, Random, Best} (determines the querying strategy to be used).
    • DATASET may be one of {cifar10, svhn, fashion_mnist} (determines the dataset to be used).

    For example, to run greedy BALD on CIFAR10, run the following command:

    ./scripts/run_active_torchvision.sh BALD Argmax cifar10

    This will automatically generate an experimental directory with an appropriate name, and place results from 5 runs in the directory.
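
    To obtain results for one baseline across all three vision datasets, the script can be looped in the same way. A minimal sketch, assuming only the argument values listed above:

    #!/bin/bash
    # Run one baseline (greedy BALD) on each vision dataset.
    ACQ=BALD
    CORESET=Argmax
    for DATASET in cifar10 svhn fashion_mnist; do
        ./scripts/run_active_torchvision.sh "$ACQ" "$CORESET" "$DATASET"
    done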

  4. Active learning for classification (using projections as in section 5 of the paper): run the following command

    ./scripts/run_active_torchvision_projections.sh CORESET DATASET

    where

    • CORESET may be one of {Argmax, Random, Best, FW} (determines the querying strategy to be used).
    • DATASET may be one of {cifar10, svhn, fashion_mnist} (determines the dataset to be used).

    For example, to run the proposed method on the CIFAR10 dataset, please run:

    ./scripts/run_active_torchvision_projections.sh FW cifar10

    This will automatically generate an experimental directory with an appropriate name, and place results from 5 runs in the directory.
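
    To compare the proposed FW strategy against the baseline querying strategies on a single dataset, the script can be looped over CORESET. A minimal sketch, assuming only the argument values listed above:

    #!/bin/bash
    # Compare querying strategies under the projection-based method.
    DATASET=cifar10
    for CORESET in Argmax Random Best FW; do
        ./scripts/run_active_torchvision_projections.sh "$CORESET" "$DATASET"
    done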

Plotting

Code to generate the active learning curves exhibited in the paper is also provided. To plot learning curves, please run the command

python3 ./scripts/enjoy_learning_curves.py --load_dir=LOAD_DIR --metric=METRIC --eval_at=EVAL_AT --format=FORMAT

where

• LOAD_DIR determines the directory from which the experimental results are loaded.
• METRIC determines the performance metric to be plotted.
• EVAL_AT determines the points at which the metric is evaluated.
• FORMAT determines the output format of the generated figure.

Running this script will automatically generate a figure of the learning curve in the directory specified by LOAD_DIR.
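
For example, to plot results from the regression experiments above (the directory name and argument values here are illustrative, not the script's exact accepted options):

    python3 ./scripts/enjoy_learning_curves.py --load_dir=./experiments/boston_ACS_FW --metric=rmse --eval_at=25 --format=png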

Citation

If you use this code, please cite our paper:

@article{pinsler2019bayesian,
  title={Bayesian Batch Active Learning as Sparse Subset Approximation},
  author={Pinsler, Robert and Gordon, Jonathan and Nalisnick, Eric and Hern{\'a}ndez-Lobato, Jos{\'e} Miguel},
  journal={arXiv preprint arXiv:1908.02144},
  year={2019}
}

General Structure of the Repository