<p> <img align="left" width="110" height="120" src="weasel.jpg"> </p> <div align="center">WeaSEL: Weakly Supervised End-to-end Learning
<a href="https://pytorch.org/get-started/locally/"><img alt="Python" src="https://img.shields.io/badge/-Python 3.7--3.9-blue?style=for-the-badge&logo=python&logoColor=white"></a> <a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/-PyTorch 1.7+-ee4c2c?style=for-the-badge&logo=pytorch&logoColor=white"></a> <a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/-Lightning-792ee5?style=for-the-badge&logo=pytorchlightning&logoColor=white"></a> <a href="https://hydra.cc/"><img alt="Config: hydra" src="https://img.shields.io/badge/config-hydra-89b8cd?style=for-the-badge&labelColor=gray"></a>
This is a PyTorch-Lightning-based framework, based on our End-to-End Weak Supervision paper (NeurIPS 2021), that allows you to train your favorite neural network for weakly-supervised classification<sup>1</sup>
</div>

- only with multiple labeling functions (LFs)<sup>2</sup>, i.e. without any labeled training data!
- in an end-to-end manner, i.e. you directly train and evaluate your neural net (the end-model from here on); there is no need to train a separate label model first, as in Snorkel & co.,
- with better test-set performance and improved robustness against correlated or inaccurate LFs than prior methods like Snorkel.
<sup>1</sup> This includes learning from crowdsourced labels or annotations! <br> <sup>2</sup> LFs are labeling heuristics that output noisy labels for (subsets of) the training data (e.g. crowdworkers or keyword detectors).
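For intuition, here is what a couple of toy LFs for a binary spam task could look like (a generic sketch: the function names and the convention of -1 for "abstain" are illustrative, not part of Weasel's API):

```python
# Two toy labeling functions (LFs) for a binary spam task.
# Illustrative convention: 1 = spam, 0 = not spam, -1 = abstain.
ABSTAIN, NOT_SPAM, SPAM = -1, 0, 1

def lf_contains_free(text: str) -> int:
    # Keyword heuristic: "free" is often a spam signal.
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_short_message(text: str) -> int:
    # Heuristic: very short messages tend to be benign.
    return NOT_SPAM if len(text.split()) < 5 else ABSTAIN

# Applying all LFs to the unlabeled training set yields a noisy
# (n_samples x n_LFs) matrix of votes -- the only supervision WeaSEL needs.
texts = ["Get a FREE phone now!!!", "See you at noon"]
votes = [[lf(t) for lf in (lf_contains_free, lf_short_message)] for t in texts]
print(votes)  # [[1, -1], [-1, 0]]
```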
If you use this code, please consider citing our work
<details><p>
<summary><b> Credits</b></summary>

- End-to-End Weak Supervision
  Salva Rühling Cachay, Benedikt Boecking, and Artur Dubrawski.
  Advances in Neural Information Processing Systems (NeurIPS), 2021.
  arXiv:2107.02233v3

- The following template was extremely useful as a source of inspiration and for getting started with the PL+Hydra implementation: ashleve/lightning-hydra-template

- Weasel image credits go to Rohan Chang for this Unsplash-licensed image
</p></details>
## Getting Started
This library assumes familiarity with (multi-source) weak supervision. If that's not the case, you may want to first learn the basics, e.g. from these overview slides from Stanford or this Snorkel tutorial.
That being said, have a look at our examples and the notebooks therein, which show you how to use Weasel for your own dataset, LF set, or end-model. E.g.:
- A high-level starter tutorial, with little code, many explanations, and Snorkel included as a baseline (so that, if you are familiar with Snorkel, you can see the similarities and differences to Weasel).

- See how the whole WeaSEL pipeline works, with all details, necessary steps, and definitions for a new dataset & custom end-model. This notebook will probably teach you the most about WeaSEL and how to apply it to your own problem.

- A realistic ML experiment script with everything that's part of an ML pipeline, including logging to Weights & Biases, arbitrary callbacks, and eventually retrieving your fully trained end-model.
## Reproducibility
Please have a look at the research code branch, which operates on pure PyTorch.
## Installation
<details> <summary><b>1. New environment </b>(recommended, but optional)</summary>

    conda create --name weasel python=3.9
    conda activate weasel
</details>
<details>
<summary><b> 2a: From source</b></summary>

    python -m pip install git+https://github.com/autonlab/weasel#egg=weasel[all]
</details>
<details>
<summary><b> 2b: From source, <a href="https://huggingface.co/transformers/installation.html#editable-install">editable install</a></b></summary>

    git clone https://github.com/autonlab/weasel.git
    cd weasel
    pip install -e .[all]
</details>
<details><p>
<summary><b>Minimal dependencies</b></summary>
Minimal dependencies, in particular without Hydra, can be installed with

    python -m pip install git+https://github.com/autonlab/weasel

The needed environment corresponds to `conda env create -f env_gpu_minimal.yml`.

If you choose this variant, you won't be able to run some of the examples: you may want to have a look at this notebook, which walks you through how to use Weasel without Hydra as the config manager.
</p></details>

Note: Weasel is under active development, some uncovered edge cases might exist, and any feedback is very welcome!
## Apply WeaSEL to your own problem

### Configuration with Hydra
Optional: this template config will help you get started with your own application; an analogous config is used in this tutorial script that you may want to check out.
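For orientation, a Hydra-driven entry point usually boils down to a decorated main function like the minimal sketch below; the config directory, config name, and keys are placeholders here, so use the template config above for the actual structure:

```python
import hydra
from omegaconf import DictConfig, OmegaConf

# "configs" and "config" are placeholders -- point them at your own
# config directory and the template config linked above.
@hydra.main(config_path="configs", config_name="config")
def main(cfg: DictConfig) -> None:
    # Inspect the fully composed (and command-line-overridable) config.
    print(OmegaConf.to_yaml(cfg))
    # Objects declared with a _target_ key in the config can then be built
    # via hydra.utils.instantiate(cfg.<key>), e.g. your end-model.

if __name__ == "__main__":
    main()
```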
### Pre-defined or custom downstream models & Baselines
Please have a look at the detailed instructions in this Readme.
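As a rough sketch only (the exact interface Weasel expects from an end-model, e.g. the base class and required arguments, is spelled out in that Readme), a custom downstream model is at its core a standard PyTorch module mapping features to class logits:

```python
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    """Minimal feed-forward end-model: feature vectors -> class logits."""

    def __init__(self, in_dim: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # raw logits of shape (batch, n_classes)
```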
### Using your own dataset and/or labeling heuristics
Please have a look at the detailed instructions in this Readme.
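Regardless of whether your weak labels come from heuristics or crowdworkers, the raw ingredients boil down to an (n × m) matrix of LF votes on the unlabeled training set plus the matching features, ideally with a small labeled split for evaluation. Below is a hedged sketch of these inputs with random placeholder data (array names and the -1 abstain encoding are illustrative; the exact data interface is documented in that Readme):

```python
import numpy as np

n_train, n_lfs, n_feats = 1000, 10, 50

# Votes of m LFs on n unlabeled training points
# (illustrative encoding: -1 = abstain, 0..C-1 = class votes).
L_train = np.random.randint(-1, 2, size=(n_train, n_lfs))

# Features of the same n training points, consumed by the end-model.
X_train = np.random.randn(n_train, n_feats).astype(np.float32)

# A small labeled split, used only for evaluation -- not for training.
X_test = np.random.randn(200, n_feats).astype(np.float32)
y_test = np.random.randint(0, 2, size=200)
```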
## Citation
    @article{cachay2021endtoend,
      author={R{\"u}hling Cachay, Salva and Boecking, Benedikt and Dubrawski, Artur},
      journal={Advances in Neural Information Processing Systems},
      title={End-to-End Weak Supervision},
      year={2021}
    }