FastGECToR

1. Introduction

A faster and simpler implementation of GECToR – Grammatical Error Correction: Tag, Not Rewrite, with AMP (automatic mixed precision) and distributed training support via DeepSpeed.

Note: To make the code faster and more readable, we removed the AllenNLP dependencies and reimplemented the related code.

2. Requirements

  1. Install PyTorch with CUDA support: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  2. Install NVIDIA Apex with its CUDA and C++ extensions
  3. Install the remaining packages: pip install -r ./requirements.txt

3. Data Processing

  1. Tokenize your data (one sentence per line, words separated by spaces)
  2. Generate edits from the parallel sentences: bash scripts/prepare_data.sh
  3. (Optional) Define your own target vocabulary (data/vocabulary/labels.txt)
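
To make the expected input format concrete, here is a minimal sketch of what "tokenized parallel sentences" and "edits" can look like. It is not the logic of scripts/prepare_data.sh: the helper names tokenize and extract_edits are hypothetical, and the alignment uses Python's standard difflib as a stand-in for whatever alignment the script actually performs.

```python
import difflib


def tokenize(line):
    """Whitespace tokenization: one sentence per line, words separated by spaces."""
    return line.strip().split()


def extract_edits(source, target):
    """Return the token-level differences between a source and a target sentence.

    Each edit is a tuple (operation, source_tokens, target_tokens), where
    operation is one of "replace", "delete" or "insert".
    Hypothetical helper for illustration only.
    """
    src, tgt = tokenize(source), tokenize(target)
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # keep only the spans that actually changed
            edits.append((op, src[i1:i2], tgt[j1:j2]))
    return edits


if __name__ == "__main__":
    print(extract_edits("She go to school yesterday", "She went to school yesterday"))
    print(extract_edits("I like apple", "I like the apple"))
```

In a tagging setup, differences like these are then mapped onto per-token labels drawn from the target vocabulary; the exact mapping is defined by the data preparation script, not by this sketch.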

4. Configuration

5. Training

bash scripts/train.sh

* Performance Tuning

6. Inference

bash scripts/predict.sh
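
For intuition about what tag-based inference produces, the sketch below applies per-token edit tags to a tokenized sentence. It assumes the basic tag types described in the GECToR paper [1] ($KEEP, $DELETE, $APPEND_x, $REPLACE_x); the function apply_tags is hypothetical and is not the code run by scripts/predict.sh, which may also handle further tag types and iterative correction.

```python
def apply_tags(tokens, tags):
    """Apply one edit tag per source token and return the corrected tokens.

    Hypothetical helper for illustration; only the four basic tag types
    are handled, and any other tag leaves the token unchanged.
    """
    if len(tokens) != len(tags):
        raise ValueError("expected exactly one tag per token")
    output = []
    for token, tag in zip(tokens, tags):
        if tag == "$DELETE":
            continue  # drop the token
        if tag.startswith("$REPLACE_"):
            output.append(tag[len("$REPLACE_"):])  # substitute the token
        elif tag.startswith("$APPEND_"):
            output.append(token)
            output.append(tag[len("$APPEND_"):])  # insert a word after the token
        else:
            output.append(token)  # $KEEP and unhandled tags keep the token
    return output


if __name__ == "__main__":
    tokens = ["She", "go", "to", "the", "the", "school"]
    tags = ["$KEEP", "$REPLACE_goes", "$KEEP", "$DELETE", "$KEEP", "$KEEP"]
    print(" ".join(apply_tags(tokens, tags)))
```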

Reference

[1] Omelianchuk, K., Atrasevych, V., Chernodub, A., & Skurzhanskyi, O. (2020). GECToR – Grammatical Error Correction: Tag, Not Rewrite. arXiv:2005.12592 [cs]. http://arxiv.org/abs/2005.12592