
<img src="https://github.com/feedly/transfer-nlp/blob/v0.1/data/images/TransferNLP_Logo.jpg" width="1000">

Welcome to the Transfer NLP library, a framework built on top of PyTorch to promote reproducible experimentation and transfer learning in NLP.

You can get an overview of the high-level API in this Colab notebook, which shows how to use the framework on several examples. All deep-learning examples in these notebooks embed in-cell Tensorboard training monitoring!

For an example of fine-tuning a pre-trained model, we provide a short executable tutorial on `BertClassifier` fine-tuning in this Colab notebook.

## Set up your environment

```bash
mkvirtualenv transfernlp
workon transfernlp

git clone https://github.com/feedly/transfer-nlp.git
cd transfer-nlp
pip install -r requirements.txt
```

To use Transfer NLP as a library:

```bash
# to install the experiment builder only
pip install transfernlp
# to install Transfer NLP with PyTorch and Transfer Learning in NLP support
pip install transfernlp[torch]
```

or

```bash
pip install git+https://github.com/feedly/transfer-nlp.git
```

to get the latest state before new releases.

To use Transfer NLP with associated examples:

```bash
git clone https://github.com/feedly/transfer-nlp.git
cd transfer-nlp
pip install -r requirements.txt
```

## Documentation

API documentation and an overview of the library can be found here.

## Reproducible Experiment Manager

The core of the library is an experiment builder: you define the objects your experiment needs, and the configuration loader builds them for you. To make research reproducible and ablation studies easy, the library enforces the use of configuration files for experiments. Since people have different tastes for what constitutes a good experiment file, the library accepts experiments defined in several formats, such as YAML files or plain Python dictionaries.

In Transfer NLP, an experiment config file contains all the information needed to entirely define the experiment: the names of the components your experiment will use, along with the hyperparameters you want. Transfer NLP makes use of the Inversion of Control pattern, which lets you declare any class / method / function you need; the `ExperimentConfig` class builds a dictionary and instantiates your objects accordingly.

To use your own classes inside Transfer NLP, you need to register them with the `@register_plugin` decorator. Instead of maintaining a separate registry for each kind of component (models, data loaders, vectorizers, optimizers, ...), a single registry is used, so every component can be customized in the same way.

If you use Transfer NLP only as a dev dependency, you might prefer to use it declaratively and call `register_plugin()` on the objects you want to use at experiment-running time.
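For instance, here is a minimal sketch of both registration styles. The `MyDataLoader` class and its parameters are hypothetical (mirroring the YAML example below), and we assume `register_plugin` is importable from `transfer_nlp.plugins.config`:

```python
import torch.nn

from transfer_nlp.plugins.config import register_plugin

# Style 1: register your own component at definition time with the decorator.
# MyDataLoader and its parameters are hypothetical, mirroring the YAML example below.
@register_plugin
class MyDataLoader:
    def __init__(self, data_parameter: str, data_vectorizer):
        self.data_parameter = data_parameter
        self.data_vectorizer = data_vectorizer

# Style 2: register an existing class declaratively, right before running the
# experiment, e.g. a stock PyTorch loss.
register_plugin(torch.nn.CrossEntropyLoss)
```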

Here is an example of how you can define an experiment in a YAML file:

```yaml
data_loader:
  _name: MyDataLoader
  data_parameter: foo
  data_vectorizer:
    _name: MyVectorizer
    vectorizer_parameter: bar

model:
  _name: MyModel
  model_hyper_param: 100
  data: $data_loader

trainer:
  _name: MyTrainer
  model: $model
  data: $data_loader
  loss:
    _name: PyTorchLoss
  tensorboard_logs: $HOME/path/to/tensorboard/logs
  metrics:
    accuracy:
      _name: Accuracy
```

Any object can be defined through a class, method or function, given a `_name` parameter followed by its own parameters. Experiments are then loaded and instantiated with `ExperimentConfig(experiment=experiment_path_or_dict)`.
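As a sketch of how loading might look for the YAML file above (the `experiment.yaml` path is illustrative, and we assume that `$`-prefixed values such as `$HOME` are filled in from keyword arguments passed to `ExperimentConfig`):

```python
from pathlib import Path

from transfer_nlp.plugins.config import ExperimentConfig

# Build every object declared in the YAML file. References like $model and
# $data_loader resolve to the other configured objects; we assume $HOME is
# substituted from the keyword argument below.
experiment = ExperimentConfig(experiment='experiment.yaml', HOME=str(Path.home()))

# Instantiated objects are then available by name.
trainer = experiment['trainer']
trainer.train()  # hypothetical method on the MyTrainer class from the YAML example
```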

Some considerations:

- You can have a look at the tests for examples of experiment settings the config loader can build.
- We also provide runnable experiments in `experiments/`.

## Transfer Learning in NLP: flexible PyTorch Trainers

For deep learning experiments, we provide a `BaseIgniteTrainer` in `transfer_nlp.plugins.trainers`. This basic trainer takes a model and some data as input and runs a whole training pipeline. We make use of the PyTorch-Ignite library to monitor events during training (logging metrics, manipulating learning rates, checkpointing models, etc.). Tensorboard logs are also included as an option: specify a `tensorboard_logs` path parameter in the config file, then run `tensorboard --logdir=path/to/logs` in a terminal, and you can monitor your experiment while it's training! Tensorboard comes with very nice utilities for tracking the norms of your model weights, histograms, distributions, embedding visualizations, etc., so we really recommend using it.

<img src="https://github.com/feedly/transfer-nlp/blob/v0.1/data/images/tensorboard.png" width="1000">

We provide a `SingleTaskTrainer` class which you can use in any supervised setting dealing with a single task. We are working on a `MultiTaskTrainer` class for multi-task settings, and a `SingleTaskFineTuner` for fine-tuning large models.

## Use cases

Here are a few use cases for Transfer NLP:

### Slack integration

While experimenting with your own models / data, training might take some time. To get notified when your training finishes or crashes, you can use the simple knockknock library by the folks at HuggingFace, which adds a simple decorator to your training function to notify you via Slack, email, etc.
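As an example, here is a minimal sketch using knockknock's Slack notifier; the webhook URL, channel name, and training entry point are all placeholders:

```python
from knockknock import slack_sender

# Placeholder webhook URL: create one for your workspace in the Slack API console.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

@slack_sender(webhook_url=SLACK_WEBHOOK_URL, channel="#experiments")
def run_experiment(experiment_path: str):
    # Hypothetical entry point: build the configured objects and launch training.
    from transfer_nlp.plugins.config import ExperimentConfig
    experiment = ExperimentConfig(experiment=experiment_path)
    experiment['trainer'].train()  # hypothetical trainer method

# You get a Slack notification when this returns or raises.
run_experiment('experiment.yaml')
```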


## Acknowledgment

The library was inspired by the reading of <cite>"Natural Language Processing with PyTorch"</cite> by Delip Rao and Brian McMahan. The experiments in `experiments/`, the `Vocabulary` building block, and the embeddings nearest-neighbors utilities are taken or adapted from the code provided in the book.