# Awesome ContinualMT
## Quick Links
- Introduction
- Dataset
- Architecture
- Installation
- Preparing and Preprocessing
- Domain Incremental Training
- Extending for New Approaches
- Reference
## Introduction
We introduce ContinualMT, an extensible continual learning framework for neural machine translation (NMT), built to support research on continual learning (CL) for NMT.
Our repository provides PyTorch implementations of a suite of state-of-the-art (SoTA) methods under a unified training and evaluation protocol. Currently, the supported methods include:
- Widely used baselines for continual learning:
  - Seq-FT: Sequential fine-tuning over a sequence of domains, with no mechanism to address forgetting or transfer.
  - ONE: Fine-tuning the pretrained NMT model individually for each domain.
  - Adapter-ONE: Adding adapters to the pretrained NMT model and fine-tuning them individually for each domain.
  - KD: Naive knowledge distillation.
  - EWC: Overcoming catastrophic forgetting in neural networks, Kirkpatrick et al., PNAS 2017.
- Recently proposed methods for continual learning in NMT:
  - Dynamic-KD: Continual learning for neural machine translation, Cao et al., NAACL 2021.
  - PTE: Pruning-then-expanding model for domain adaptation of neural machine translation, Gu et al., NAACL 2021.
  - F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation, Wu et al., NAACL 2024.
We are actively working on implementing other methods and adding them to this framework!
## Dataset
Currently, our framework focuses on multi-stage domain-incremental training of NMT systems. Within our framework, you can use machine translation data from various domains for domain-incremental training. We provide a representative multi-domain machine translation dataset, the OPUS multi-domain dataset, which comprises German-English parallel data across five domains: Medical, Law, IT, Koran, and Subtitles. The dataset can be found here.
## Architecture
Our implementation is built upon fairseq, with the following modifications:
- `./approaches`: code for the supported continual learning approaches
- `./cl_scripts`: bash scripts for continual training
- `./cl_scripts_slurm`: Slurm scripts for continual training
- `./lcheckpoints`: all training checkpoints are saved in this folder
- `./logs`: training logs
- `./pretrained_models`: folder for pretrained NMT models
- `./task_sequence`: reference task sequences for the OPUS multi-domain MT data
## Installation
First, build the environment from the provided YAML file:

```bash
conda env create --name CLMT --file CLMT.yaml
```

Then install fairseq (`pip install --editable .`), Moses, and fastBPE.
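If Moses and fastBPE are not already available, a minimal sketch of fetching and building them is shown below; the clone locations and paths are assumptions, so adjust them to your setup:

```bash
# Install fairseq in editable mode from the repository root
pip install --editable .

# Moses tokenization scripts
git clone https://github.com/moses-smt/mosesdecoder.git

# fastBPE (compile command taken from the fastBPE README)
git clone https://github.com/glample/fastBPE.git
cd fastBPE
g++ -std=c++11 -pthread -O3 fastBPE/main.cc -IfastBPE -o fast
cd ..
```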
## Preparing and Preprocessing
### Pre-trained Model
Download the pre-trained WMT19 German-English model from fairseq, along with the dictionaries and the bpecodes.
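For example, the single-model WMT19 De-En checkpoint can be fetched from the fairseq model zoo as sketched below; the target folder is only a suggestion, and the archive should contain the model, the dictionaries, and the `bpecodes` file (see the fairseq translation README if the link changes):

```bash
# Download and unpack the fairseq WMT19 De-En single-model checkpoint
mkdir -p pretrained_models && cd pretrained_models
wget https://dl.fbaipublicfiles.com/fairseq/models/wmt19.de-en.joined-dict.single_model.tar.gz
tar -xzvf wmt19.de-en.joined-dict.single_model.tar.gz
cd ..
```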
### Data
First, navigate to the data folder with `cd ./examples/translation`. Make sure you have set the paths to the Moses scripts, fastBPE, the model dictionaries, and the BPE codes in the scripts.
For general-domain MT data, simply run the provided preprocessing script `prepare-wmt17de2en.sh`, which automatically downloads and prepares the data.
For the domain-incremental training data, download the multi-domain data and unzip it, then process each domain with the `prepare-domain-adapt.sh` script.
Finally, use the `preprocess.sh` script to produce the binary files for fairseq.
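A rough end-to-end sketch of the data preparation is shown below; the domain folder names and the way the domain is passed to `prepare-domain-adapt.sh` are assumptions, so check each script's header for its exact usage:

```bash
cd ./examples/translation

# General-domain WMT17 De-En data (downloaded and BPE-processed automatically)
bash prepare-wmt17de2en.sh

# Multi-domain OPUS data: process each domain separately
# (domain names and argument convention are assumptions)
for domain in medical law it koran subtitles; do
    bash prepare-domain-adapt.sh $domain
done

# Binarize everything for fairseq
bash preprocess.sh
```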
## Domain Incremental Training
We offer training bash scripts for all supported approaches in `./cl_scripts` and `./cl_scripts_slurm`. For more detailed information, please refer to the individual README files located in each directory.
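As a rough illustration only, a sequential fine-tuning (Seq-FT) run over the OPUS domain sequence could look like the sketch below. This is not one of the repository's scripts; the paths, domain order, and hyperparameters are assumptions, and the provided scripts should be used in practice:

```bash
# Minimal Seq-FT sketch with standard fairseq-train options (hypothetical paths/values)
PREV=./pretrained_models/wmt19.de-en.joined-dict.single_model/model.pt
for domain in medical law it koran subtitles; do
    fairseq-train data-bin/$domain \
        --arch transformer_wmt_en_de_big --share-all-embeddings \
        --finetune-from-model $PREV \
        --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
        --optimizer adam --lr 1e-4 --max-tokens 4096 --max-epoch 5 \
        --save-dir ./lcheckpoints/seq_ft/$domain
    # Continue the next domain from the best checkpoint of the current one
    PREV=./lcheckpoints/seq_ft/$domain/checkpoint_best.pt
done
```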
## Extending for New Approaches
Expanding our framework is straightforward: to integrate a new CL approach, you only need to make modifications within the `./approaches`, `./cl_scripts`, and `./cl_scripts_slurm` directories.
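As a hypothetical sketch (the file and directory names below are placeholders, not actual files in this repository), a new approach would typically be added by starting from an existing one:

```bash
# Placeholder names: adapt to the actual layout of ./approaches and ./cl_scripts
cp -r ./approaches/existing_method ./approaches/my_method          # implement the new CL logic here
cp ./cl_scripts/train_existing_method.sh ./cl_scripts/train_my_method.sh
cp ./cl_scripts_slurm/train_existing_method.sh ./cl_scripts_slurm/train_my_method.sh
# point the new scripts at the new approach and adjust its hyperparameters
```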
## Reference
If you find this repository helpful, please consider starring it and citing our work:
```bibtex
@misc{wu2024fmalloc,
      title={F-MALLOC: Feed-forward Memory Allocation for Continual Learning in Neural Machine Translation},
      author={Junhong Wu and Yuchen Liu and Chengqing Zong},
      year={2024},
      eprint={2404.04846},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```