Data preprocessing for Sentence Classification
This is a full implementation of data preprocessing for a CNN and a b-LSTM. Some of the code is based on Amit Mandelbaum's code.
With this code you can reproduce almost all of the baseline results, as well as the results presented in our paper.
Requirements
- Python (2.7)
- NumPy
- NLTK
- Pandas
Download Google's word embeddings binary file from https://code.google.com/p/word2vec/, extract it, and place it under the data/ folder.
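
Once the embeddings file is in place, it has to be read into memory before it can be used. The following is a minimal sketch of one way to do that with NumPy, assuming the standard word2vec binary layout (a text header line with the vocabulary size and vector dimension, then each word followed by a space and its float32 values). The function name `load_word2vec_bin` and the optional `vocab` filter are illustrative and are not taken from this repository.

```python
import numpy as np


def load_word2vec_bin(path, vocab=None):
    """Read vectors from a word2vec-format binary file (illustrative helper).

    If `vocab` is given, only words contained in it are kept, which keeps
    memory use low when the dataset vocabulary is small.
    """
    vectors = {}
    with open(path, "rb") as f:
        # Header line: "<vocab_size> <dim>\n"
        vocab_size, dim = (int(x) for x in f.readline().split())
        n_bytes = np.dtype(np.float32).itemsize * dim
        for _ in range(vocab_size):
            # Each entry is the word, a single space, then `dim` float32 values.
            chars = []
            while True:
                ch = f.read(1)
                if ch == b" " or ch == b"":
                    break
                if ch != b"\n":  # entries may be separated by a newline
                    chars.append(ch)
            word = b"".join(chars).decode("utf-8", "ignore")
            data = f.read(n_bytes)
            if vocab is None or word in vocab:
                vectors[word] = np.frombuffer(data, dtype=np.float32).copy()
    return vectors
```

A call might look like `load_word2vec_bin("data/GoogleNews-vectors-negative300.bin", vocab=my_vocab)`, where both the file name and `my_vocab` are assumptions; substitute the name of the file you actually extracted and the set of words in your dataset.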
Most of the datasets can be downloaded from https://github.com/harvardnlp/sent-conv-torch/tree/master/data