NLP paper implementations relevant to classification with PyTorch

The papers were implemented using Korean corpora.

Preliminary & Usage

pyenv virtualenv 3.7.7 nlp
pyenv activate nlp
pip install -r requirements.txt
python build_dataset.py
python build_vocab.py
python train.py # default training parameters
python evaluate.py # default evaluation parameters
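
The `build_vocab.py` step above produces a pickled vocabulary (`vocab.pkl` in the directory listings). As a rough illustration of what such a step can look like, here is a minimal sketch; the function names (`build_vocab`, `save_vocab`, `load_vocab`), the special tokens `<pad>` and `<unk>`, and the `min_freq` parameter are assumptions made for this example, not the repository's actual code.

```python
import pickle
from collections import Counter
from pathlib import Path


def build_vocab(tokenized_sentences, min_freq=1, specials=("<pad>", "<unk>")):
    """Map each token to an integer index, reserving the first slots for special tokens.

    Hypothetical helper: the names and defaults here are illustrative assumptions.
    """
    counts = Counter(token for sentence in tokenized_sentences for token in sentence)
    # Sort by descending frequency, then alphabetically, so the mapping is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    idx_to_token = list(specials) + [
        token for token, freq in ordered if freq >= min_freq and token not in specials
    ]
    return {token: index for index, token in enumerate(idx_to_token)}


def save_vocab(vocab, path):
    """Serialize the token-to-index mapping with pickle."""
    with Path(path).open("wb") as handle:
        pickle.dump(vocab, handle)


def load_vocab(path):
    """Load a mapping previously written by save_vocab."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

A training script could then call `load_vocab` and look tokens up with `vocab.get(token, vocab["<unk>"])` so that unseen tokens fall back to the unknown index.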

Single sentence classification (sentiment classification task)

# example: Convolutional_Neural_Networks_for_Sentence_Classification
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── nsmc.json
│   └── model
│       └── sencnn.json
├── evaluate.py
├── experiments
│   └── sencnn
│       └── epochs_5_batch_size_256_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── nsmc
│   ├── ratings_test.txt
│   ├── ratings_train.txt
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py
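
| Model \ Accuracy | Train (120,000) | Validation (30,000) | Test (50,000) | Date |
|---|---|---|---|---|
| SenCNN | 91.95% | 86.54% | 85.84% | 20/05/30 |
| CharCNN | 86.29% | 81.69% | 81.38% | 20/05/30 |
| ConvRec | 86.23% | 82.93% | 82.43% | 20/05/30 |
| VDCNN | 86.59% | 84.29% | 84.10% | 20/05/30 |
| SAN | 90.71% | 86.70% | 86.37% | 20/05/30 |
| ETRIBERT | 91.12% | 89.24% | 88.98% | 20/05/30 |
| SKTBERT | 92.20% | 89.08% | 88.96% | 20/05/30 |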
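
The train and validation sizes in the table (120,000 and 30,000) sum to 150,000, which is consistent with holding out 20% of `ratings_train.txt` as a validation set. The sketch below shows one way such a split can be done; the function name `split_train_validation`, the `validation_ratio` and `seed` parameters, and the shuffle-then-slice approach are assumptions for illustration, not necessarily what `build_dataset.py` does.

```python
import random


def split_train_validation(examples, validation_ratio=0.2, seed=42):
    """Shuffle a copy of `examples` and hold out a fraction for validation.

    Hypothetical helper: the ratio and seed are illustrative assumptions.
    """
    if not 0.0 < validation_ratio < 1.0:
        raise ValueError("validation_ratio must be between 0 and 1")
    shuffled = list(examples)
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_ratio))
    return shuffled[n_validation:], shuffled[:n_validation]


train, validation = split_train_validation(range(150_000), validation_ratio=0.2)
print(len(train), len(validation))  # 120000 30000
```

Fixing the seed means that every model in the table would be trained and validated on the same partition, which keeps the reported accuracies comparable.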

Pairwise text classification (paraphrase detection task)

# example: Siamese_recurrent_architectures_for_learning_sentence_similarity
├── build_dataset.py
├── build_vocab.py
├── conf
│   ├── dataset
│   │   └── qpair.json
│   └── model
│       └── siam.json
├── evaluate.py
├── experiments
│   └── siam
│       └── epochs_5_batch_size_64_learning_rate_0.001
├── model
│   ├── data.py
│   ├── __init__.py
│   ├── metric.py
│   ├── net.py
│   ├── ops.py
│   ├── split.py
│   └── utils.py
├── qpair
│   ├── kor_pair_test.csv
│   ├── kor_pair_train.csv
│   ├── test.txt
│   ├── train.txt
│   ├── validation.txt
│   └── vocab.pkl
├── train.py
└── utils.py
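
| Model \ Accuracy | Train (6,136) | Validation (682) | Test (758) | Date |
|---|---|---|---|---|
| Siam | 93.00% | 83.13% | 83.64% | 20/05/30 |
| SAN | 89.47% | 82.11% | 81.53% | 20/05/30 |
| Stochastic | 89.26% | 82.69% | 80.07% | 20/05/30 |
| ETRIBERT | 95.07% | 94.42% | 94.06% | 20/05/30 |
| SKTBERT | 95.43% | 92.52% | 93.93% | 20/05/30 |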
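
The tables report accuracy, the fraction of examples whose predicted label matches the gold label. A minimal sketch of that computation in plain Python follows; the function name `accuracy` is an assumption for illustration and is not taken from the repository's `metric.py`.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that equal the corresponding labels.

    Hypothetical helper written for illustration only.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for predicted, gold in zip(predictions, labels) if predicted == gold)
    return correct / len(labels)


# Toy example: three of four predictions match the gold labels.
print(f"{accuracy([1, 0, 1, 1], [1, 0, 0, 1]):.2%}")  # 75.00%
```

On a small test set each example carries visible weight: with 758 test pairs, one additional correct prediction changes the accuracy by about 0.13 percentage points.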