

Linguistic Adversity


This repository is an implementation of the following work:

Li, Yitong, Trevor Cohn and Timothy Baldwin (2017). Robust Training under Linguistic Adversity. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), Valencia, Spain.

Repository Structure

In this repository, files are organized into three folders:

Details are as follows:


This folder contains the original datasets.

More sentiment analysis datasets can be found at HarvardNLP.

Please cite the original papers when you use the data.


This folder contains four different linguistic noise generators, each in its own subfolder.

Please refer to the README file of each noise generator for instructions on how to run it.


This folder contains the code for the semantic noise generator based on WordNet.
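
As a rough illustration of the idea (not the code in this folder), semantic noise can be thought of as swapping words for synonyms at some rate. The sketch below assumes a tiny hand-written synonym table, `TOY_SYNONYMS`, as a stand-in for WordNet lookups; the function name `add_semantic_noise` and the `rate` parameter are hypothetical and chosen only for this example.

```python
import random

# Hypothetical stand-in for WordNet: word -> list of synonyms.
TOY_SYNONYMS = {
    "good": ["fine", "great"],
    "movie": ["film", "picture"],
    "boring": ["dull", "tedious"],
}


def add_semantic_noise(tokens, synonyms, rate, rng):
    """Replace each token that has synonyms with probability `rate`."""
    noisy = []
    for tok in tokens:
        candidates = synonyms.get(tok.lower(), [])
        if candidates and rng.random() < rate:
            # Pick one synonym uniformly at random.
            noisy.append(rng.choice(candidates))
        else:
            # Keep the original token unchanged.
            noisy.append(tok)
    return noisy


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sentence = "a good movie but a boring ending".split()
print(" ".join(add_semantic_noise(sentence, TOY_SYNONYMS, rate=0.5, rng=rng)))
```

With `rate=0.0` the sentence is returned unchanged, and with `rate=1.0` every word that has an entry in the table is replaced.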

To run the WordNet noise generator code, you need the following dependencies:


Based on the idea of counter-fitting.



Based on the English Resource Grammar (ERG) system and ACE. Dependencies:


Based on a sentence compression method.


A convolutional neural network model for text classification tasks. The model is based on Yoon Kim's Convolutional Neural Networks for Sentence Classification and Denny Britz's implementation (https://github.com/dennybritz/cnn-text-classification-tf). Note that the code has been implemented and tested with TensorFlow r1.0 and Python 2.7, and may not run on other versions.
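
For readers unfamiliar with this kind of model, the core operation is a filter that slides over windows of consecutive word embeddings, followed by max-over-time pooling. The following is a minimal pure-Python sketch of that single step, not the TensorFlow code in this folder; the function name `conv_max_over_time` and the toy vectors are made up for illustration, and it assumes the sentence has at least as many tokens as the filter is wide.

```python
def conv_max_over_time(embeddings, filt, bias=0.0):
    """Slide one filter over a sentence and max-pool the ReLU activations.

    embeddings: list of word vectors, one per token.
    filt: list of h weight vectors, i.e. a filter spanning h consecutive words.
    Assumes len(embeddings) >= len(filt).
    """
    h = len(filt)
    activations = []
    for start in range(len(embeddings) - h + 1):
        window = embeddings[start:start + h]
        # Element-wise product of filter and window, summed to one score.
        score = bias + sum(
            w * x
            for w_row, x_row in zip(filt, window)
            for w, x in zip(w_row, x_row)
        )
        activations.append(max(0.0, score))  # ReLU
    return max(activations)  # max-over-time pooling


# Four tokens with 2-dimensional toy embeddings, and one filter of width 2.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]
print(conv_max_over_time(sentence, bigram_filter))  # prints 2.0
```

A full model applies many such filters of several widths and feeds the pooled values, with dropout, into a softmax classifier.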


To run the CNN code, you need the following dependencies:

Running the code

python train.py [parameters]
        The training dataset (default: "mr")
        Type of noise (default: "raw")
        Whether to train on the noisy data (default: False)
        Whether to test on the noisy data (default: False)
        Model L2 regularization lambda (default: 0)
        Model dropout keep probability (default: 0.5)

For example, to train the model with the following settings:

nice python train.py --dataset="mr" --is_noise_train=False --is_noise_test=False --noise_type="cf=0.5" --dropout_keep_prob=1.0
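
To make the flag syntax above concrete, here is a small sketch of how such flags could be parsed using argparse. It is an illustration only: train.py may parse its flags differently, the helper names `str2bool`, `build_parser` and `split_noise_type` are hypothetical, and reading `cf=0.5` as a noise name plus a rate is an assumption, not something stated in this README.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' style strings, as in --is_noise_train=False."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    # Flag names and defaults mirror the example command and parameter list.
    parser = argparse.ArgumentParser(description="Sketch of train.py flags")
    parser.add_argument("--dataset", default="mr")
    parser.add_argument("--noise_type", default="raw")
    parser.add_argument("--is_noise_train", type=str2bool, default=False)
    parser.add_argument("--is_noise_test", type=str2bool, default=False)
    parser.add_argument("--dropout_keep_prob", type=float, default=0.5)
    return parser


def split_noise_type(noise_type):
    """Assumed format: 'name' or 'name=rate', e.g. 'cf=0.5' -> ('cf', 0.5)."""
    name, sep, rate = noise_type.partition("=")
    return name, (float(rate) if sep else None)


args = build_parser().parse_args(
    ["--dataset=mr", "--is_noise_train=False", "--noise_type=cf=0.5",
     "--dropout_keep_prob=1.0"]
)
print(args.dataset, args.is_noise_train, split_noise_type(args.noise_type))
```

Note that argparse splits `--noise_type=cf=0.5` on the first `=` only, so the value `cf=0.5` arrives intact.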

Also refer to run_script_sample_on_subj.sh for examples of training with different noise types.
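
If you prefer to generate such runs programmatically, the sketch below builds one command line per combination of settings and prints it. It does not reproduce the contents of the sample script; the helper `build_command` is hypothetical, and the list of noise types contains only the two values that appear in this README.

```python
import itertools


def build_command(dataset, noise_type, noise_train, noise_test):
    """Return the train.py invocation for one setting as an argument list."""
    return [
        "python", "train.py",
        "--dataset=%s" % dataset,
        "--noise_type=%s" % noise_type,
        "--is_noise_train=%s" % noise_train,
        "--is_noise_test=%s" % noise_test,
    ]


# Only noise types mentioned in this README; extend as needed.
NOISE_TYPES = ["raw", "cf=0.5"]

for noise_type, noise_train in itertools.product(NOISE_TYPES, [False, True]):
    cmd = build_command("mr", noise_type, noise_train, noise_test=True)
    print(" ".join(cmd))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.call` to launch the run directly.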

Contact us

Please email us if you have any questions. All comments are welcome.