Home

<img src="assets/babble_logo.png" width="150"/>

A Python implementation of Babble Labble, a framework for creating training data via natural language explanations.
Presented at NIPS 2017 (demo) and ACL 2018 (paper).

Getting Started

About Babble Labble

The main idea behind Babble Labble is that when annotators label training sets, there are reasons behind each label. With Babble Labble, we collect those reasons as natural language explanations, which are then converted by a semantic parser into labeling functions: executable functions that can be used to automatically label additional data. When many such labeling functions are combined, training sets of sufficient size and quality can be generated to train classifiers with reasonable performance, despite using only a small number of user inputs (e.g., tens of explanations instead of thousands of individual labels).
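To make this concrete, here is a minimal sketch of the kind of labeling function an explanation might compile into. The explanation, the `candidate` object, and its `text_between` attribute are all hypothetical illustrations, not the actual Babble Labble API:

# Explanation (hypothetical): "Label TRUE because the word 'married'
# appears between person X and person Y."
def lf_married_between(candidate):
    # `candidate.text_between` is assumed to hold the text between
    # the two person mentions in the sentence.
    return 1 if "married" in candidate.text_between else 0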

In the larger picture, we envision systems like Babble Labble serving as higher-level "supervision compilers" for the Software 2.0 systems of the future. Babble Labble is just one of many projects exploring how weak supervision sources can be used to train machine learning systems; you can find links to related papers, repositories, and blog posts on the Snorkel landing page.

Disclaimer

The code in this repository is very much research code: a proof of concept. There are many ways it could be improved, optimized, or made more user-friendly. Unfortunately, we do not have the manpower to provide ongoing support and have no plans to publish further updates. However, the individual components of the framework are readily available in other applications with better ongoing support.

There's nothing special about our particular implementation of this pipeline; the power is in the combination of tools that allow high-level inputs to be converted into weak supervision resources, and a way to use those resources to ultimately train a model. Since the interfaces between the components are all simply labels (a label matrix between the semantic parser/filter bank and label aggregator, and a set of training labels from the label aggregator to the discriminative model), the framework is fairly modular.
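As an illustration of how thin those interfaces are, the following sketch shows the only data that flows between the components. The dimensions are made up and a naive majority vote stands in for the actual label aggregator:

import numpy as np

num_examples, num_lfs = 1000, 30
# The semantic parser + filter bank yield one label per (example, labeling function) pair.
label_matrix = np.random.randint(-1, 2, size=(num_examples, num_lfs))  # values in {-1, 0, 1}
# The label aggregator collapses each row into a single training label
# (here, a simple majority vote stands in for the real aggregator).
training_labels = (label_matrix.sum(axis=1) > 0).astype(int)
# Any discriminative model can then be trained on (features, training_labels).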

<!-- For example, the semantic parser could be replaced with some other model that can handle even higher-level concepts, such as a pre-trained QA model to which users provide questions related to their relation of interest (e.g., answering "who has a child with X?" should help with answering "who is married to X?"). -->

References

@inproceedings{hancock2018babble,
  title={Training Classifiers with Natural Language Explanations},
  author={Hancock, Braden and Varma, Paroma and Wang, Stephanie and Bringmann, Martin and Liang, Percy and R{\'e}, Christopher},
  booktitle={Association for Computational Linguistics (ACL)},
  year={2018},
}

Hancock, B., Varma, P., Wang, S., Bringmann, M., Liang, P. and Ré, C. Training Classifiers with Natural Language Explanations. ACL 2018.

Setup

There are two ways to set up Babble Labble: with Docker (Option A) or with a local installation (Option B).

The first step for both options is the same:
[0] Read the Disclaimer

Steps 4 & 5 are identical as well.

Option A: Docker

[1] Install Docker (instructions)

[2] Pull docker image:

docker pull bhancock8/babble

[3] Run the docker container:

docker run --rm -i -p 8080:8080 -t bhancock8/babble /bin/bash

Skip to Step 4.

Option B: Local

[1] Install Anaconda 3.6 (instructions)

[2] Clone the repository:

git clone https://github.com/HazyResearch/babble.git
cd babble

[3] Set up environment:

conda env create -f environment.yml
source activate babble
source add_to_path.sh

Continue to Step 4.

Options A & B

[4] Run unit tests:

nosetests

If the tests run successfully, you will see an "OK" printed at the end.
If you chose Option B, the first time you run this may take extra time to install a language model for spaCy.
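If the automatic download does not kick in, you can fetch the model ahead of time; this assumes the default English model is the one the tests need:

python -m spacy download en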

[5] Run the tutorial:

If you'd like to try out the tutorials, continue on to the Tutorial README.
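Assuming the tutorials are Jupyter notebooks (the port exposed by the Docker command above suggests a notebook server), you could launch Jupyter from the repository root; the exact invocation may differ from what the Tutorial README describes:

jupyter notebook --ip=0.0.0.0 --port=8080 --no-browser --allow-root    # inside the Docker container
jupyter notebook                                                       # local install (Option B)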