
Download the IMDB dataset.

./download_dataset.sh

Download the GloVe word embeddings (used by the model).

./download_glove.sh
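GloVe files are plain text, with one word per line followed by its vector components separated by spaces. A minimal sketch of parsing that format into a dictionary, assuming single-space-separated lines; the function name `parse_embedding_lines` is hypothetical and is not part of this repository:

```python
from typing import Dict, Iterable, List


def parse_embedding_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines ("word v1 v2 ... vn") into a dict."""
    embeddings: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        embeddings[word] = [float(v) for v in values]
    return embeddings


sample = ["the 0.1 0.2 0.3", "movie -0.5 0.0 1.25", ""]
vectors = parse_embedding_lines(sample)
print(vectors["movie"])  # [-0.5, 0.0, 1.25]
```

Assuming the counter-fitted vectors use the same word-then-values layout, the same parser would read them as well.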

Download the counter-fitted vectors (used by our attack)

./download_counterfitted_vectors.sh

Build the vocabulary and the embedding matrix.

python build_embeddings.py

This takes about a minute. The script tokenizes the dataset and saves it to a pickle file. It also computes some auxiliary files, such as the matrix of vector embeddings for the words in our dictionary. All files are saved under the aux_files directory, which this script creates.
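The vocabulary and embedding-matrix step can be pictured roughly as follows. This is a minimal sketch, not what build_embeddings.py actually does: it assumes whitespace tokenization, a vocabulary capped at the `max_words` most frequent tokens, index 0 reserved for padding, and all-zero rows for words without a pretrained vector. `build_vocab`, `build_embedding_matrix` and `max_words` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_vocab(docs: Sequence[str], max_words: int) -> Dict[str, int]:
    """Map the most frequent tokens to ids 1..max_words (id 0 is reserved for padding)."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    # most_common orders equal counts by first occurrence, so ids are deterministic.
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common(max_words))}


def build_embedding_matrix(
    vocab: Dict[str, int], pretrained: Dict[str, List[float]], dim: int
) -> Tuple[List[List[float]], List[str]]:
    """Row i holds the vector for the word with id i; unknown words keep a zero row."""
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    missing: List[str] = []
    for word, idx in vocab.items():
        if word in pretrained:
            matrix[idx] = list(pretrained[word])
        else:
            missing.append(word)  # no pretrained vector: row stays all zeros
    return matrix, missing


vocab = build_vocab(["a great movie", "a bad movie"], max_words=3)
matrix, missing = build_embedding_matrix(vocab, {"movie": [1.0, 2.0], "great": [0.5, 0.5]}, dim=2)
print(vocab)    # {'a': 1, 'movie': 2, 'great': 3}
print(missing)  # ['a']
```

Keeping word ids and matrix rows aligned is the key design point: a model's embedding layer can then look up a review encoded as a list of ids directly.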

Train the sentiment analysis model.

python train_model.py

Download the Google language model.

./download_googlm.sh

Pre-compute the distances between the embeddings of different words (required for the attack) and save the distance matrix.

python compute_dist_mat.py
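The attack substitutes words with neighbours that lie close to them in the counter-fitted embedding space, so it needs fast distance lookups. A minimal sketch, assuming Euclidean distance over a small in-memory list of vectors; `distance_matrix`, `nearest_neighbours`, `k` and `max_dist` are hypothetical names for illustration, not the interface of compute_dist_mat.py:

```python
import math
from typing import List, Sequence


def distance_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise Euclidean distances; the result is symmetric with a zero diagonal."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(vectors[i], vectors[j])
    return dist


def nearest_neighbours(dist: List[List[float]], word_id: int, k: int, max_dist: float) -> List[int]:
    """Ids of up to k other words within max_dist of word_id, closest first."""
    row = dist[word_id]
    candidates = sorted((d, j) for j, d in enumerate(row) if j != word_id and d <= max_dist)
    return [j for _, j in candidates[:k]]


vecs = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0], [0.0, 2.0]]
dm = distance_matrix(vecs)
print(dm[0][1])                                      # 5.0
print(nearest_neighbours(dm, 0, k=2, max_dist=2.0))  # [2, 3]
```

For a vocabulary of N words the matrix holds N² entries, which is presumably why it is computed once and saved rather than recomputed for every attack.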

Now we are ready to try some attacks! You can do so by running the IMDB_AttackDemo.ipynb Jupyter notebook.

Attacking the Textual Entailment Model

The model we use in our experiment is the SNLI model from Keras SNLI Model.

First, download the SNLI dataset using

bash download_snli_data.sh

Download the GloVe and counter-fitted GloVe embedding vectors

bash ./download_glove.sh
bash ./download_counterfitted_vectors.sh

Train the NLI model

python sni_rnn.py

Pre-compute the embedding distance matrix

python nli_compute_dist_matrix.py

Now you are ready to run the attack using the example code provided in the NLI_AttackDemo.ipynb Jupyter notebook.