Home

Awesome

GenAug: Data Augmentation for Finetuning Text Generators

Code for GenAug, presented in GenAug: Data Augmentation for Finetuning Text Generators published at EMNLP 2020 DeeLIO Workshop. You can cite it as follows:

@inproceedings{feng-etal-2020-genaug,
    title = "{G}en{A}ug: Data Augmentation for Finetuning Text Generators",
    author = "Feng, Steven Y. and Gangal, Varun and Kang, Dongyeop and Mitamura, Teruko and Hovy, Eduard",
    booktitle = "Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures",
    month = nov, year = "2020", address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.deelio-1.4",
    doi = "10.18653/v1/2020.deelio-1.4", pages = "29--42",
}

Authors: <a href="https://scholar.google.ca/citations?hl=en&user=zwiszZIAAAAJ">Steven Y. Feng</a>, <a href="https://scholar.google.com/citations?user=rWZq2nQAAAAJ&hl=en">Varun Gangal</a>, <a href="https://scholar.google.com/citations?user=fMKZOjwAAAAJ&hl=en">Dongyeop Kang</a>, <a href="https://scholar.google.com/citations?user=gjsxBCkAAAAJ&hl=en">Teruko Mitamura</a>, <a href="https://scholar.google.com/citations?user=PUFxrroAAAAJ&hl=en">Eduard Hovy</a>

Talk can be found here. Slides and other resources can be found here.

Note: inquiries should be directed to stevenyfeng@gmail.com or by opening an issue here.

<img src="GenAug_chart.png" alt="drawing" width="400"/>

Required Resources

  1. Stanford POS Tagger: https://nlp.stanford.edu/software/stanford-postagger-2018-10-16.zip
  2. Stanford CoreNLP: http://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip

Pretrained/finetuned Models (on Yelp):

  1. BERT Sentiment Regressor (Finetuned on YLR reviews with star ratings): https://drive.google.com/drive/folders/1JT07ZPxmMO9my5hH3MvJf8VmAlzuuUGf?usp=sharing
  2. GPT-2 (Finetuned on 2 million Yelp reviews - for perplexity and SLOR evaluation): https://drive.google.com/drive/folders/1J3Jcw-qtdWxCYZV7LOnLjKVJZPXiFS2h?usp=sharing
  3. SMERTI-Transformer (Trained on a subset of YLR): https://drive.google.com/drive/folders/1A-jyNp5So4lmv3ZtgKmwq7be8CoFtF_B?usp=sharing

Data

Code

SMERTI Augmentation Method Code

Code for the SMERTI augmentation method can be found in the "GenAug SMERTI-Transformer" folder at this repo. This is the official repo for "SMERTI for Semantic Text Exchange" presented in Keep Calm and Switch On! Preserving Sentiment and Fluency in Semantic Text Exchange published at EMNLP-IJCNLP 2019.

Note: more details and example commands for all the code will be added at a later date.