Home

Awesome

Contrastive Active Learning (CAL)

<!-- ## ⚠️WORK IN PROGRESS⚠️ -->

Active Learning by Acquiring Contrastive Examples
Katerina Margatina, Giorgos Vernikos, Loic Barrault, Nikolaos Aletras
Empirical Methods in Natural Language Processing (EMNLP) 2021.


Quick Overview

In our paper, we propose a new acquisition function for active learning namely CAL: Contrastive Active Learning. This repository contains code for running active learning with our proposed acquisition function, CAL, and other baselines.

<p align="center"> <img src="cal.png" width="500"> </p>

Acquisition functions

Specifically, there is code for running active learning with the following acquisition functions: CAL, Entropy, Least Confidence, BALD, BatchBALD, ALPS, BADGE, BertKM and Random sampling.

Tasks & Datasets

We evaluate the aforementioned AL algorithms in 4 Natural Lagnuage Processing (NLP) tasks and 7 datasets.

Models

So far we have used only BERT-BASE, but the code can support any other model (e.g. from HuggingFace) with minimal changes.


Installation

This project is implemented with Python 3, PyTorch 1.9.0 and transformers 3.1.0.

Create Environment (Optional): Ideally, you should create a conda environment for the project.

conda create -n cal python=3.7
conda activate cal

Also install the required torch package(*):

conda install pytorch==1.9.0 torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
<!-- conda install pytorch==1.9.0 torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia ford -->

Finally install the rest of the requirements:

pip install -r requirements.txt

(*) Please check here for information on how to install properly the required pytorch version for your machine (cuda). This is important! Do not copy paste the above line without first checking which cuda version is supported by your machine. You can run nvcc --version in your terminal to check it.


Download data

In order to download the datasets we used run the following script:

bash get_data.sh

DBPedia is too large so dowload it manually from here.


Organization

The repository is organizes as follows:


Usage

The main script to run any AL experiment is run_al.py.

Example:

python run_al.py --dataset_name sst-2 --acquisition cal

Acknowledgements

We would like to thank the community for releasing their code! This repository contains code from HuggingFace, ALPS, and BatchBALD repositories.


Reference

Please feel free to cite our paper if you use our code or proposed algorithm.:blush:

@inproceedings{margatina-etal-2021-active,
    title = "Active Learning by Acquiring Contrastive Examples",
    author = {Margatina, Katerina  and
      Vernikos, Giorgos  and
      Barrault, Lo{\"\i}c  and
      Aletras, Nikolaos},
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.51",
    pages = "650--663",
    abstract = "Common acquisition functions for active learning use either uncertainty or diversity sampling, aiming to select difficult and diverse data points from the pool of unlabeled data, respectively. In this work, leveraging the best of both worlds, we propose an acquisition function that opts for selecting contrastive examples, i.e. data points that are similar in the model feature space and yet the model outputs maximally different predictive likelihoods. We compare our approach, CAL (Contrastive Active Learning), with a diverse set of acquisition functions in four natural language understanding tasks and seven datasets. Our experiments show that CAL performs consistently better or equal than the best performing baseline across all tasks, on both in-domain and out-of-domain data. We also conduct an extensive ablation study of our method and we further analyze all actively acquired datasets showing that CAL achieves a better trade-off between uncertainty and diversity compared to other strategies.",
}

Contact

Please feel free to raise an issue or contact me in case you require any help setting up the repo!:blush: