NLP paper implementations for classification with PyTorch
The papers were implemented using Korean corpora.
Preliminary & Usage
pyenv virtualenv 3.7.7 nlp
pyenv activate nlp
pip install -r requirements.txt
python build_dataset.py
python build_vocab.py
python train.py # default training parameters
python evaluate.py # default evaluation parameters
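`build_vocab.py` produces the `vocab.pkl` file that appears in each dataset folder. Its internals are not described here, so the following is only a minimal sketch. It assumes whitespace tokenization, hypothetical `<pad>`/`<unk>` special tokens, and hypothetical helper names (`build_vocab`, `encode`, `save_vocab`). The actual script may tokenize Korean text differently and store a different object.

```python
import pickle
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical special tokens; the repository's actual tokens may differ.
SPECIAL_TOKENS = ["<pad>", "<unk>"]


def build_vocab(sentences: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Map each token seen at least `min_freq` times to an integer index.

    Tokenization is plain whitespace splitting, an assumption for illustration.
    """
    counter = Counter(token for sentence in sentences for token in sentence.split())
    vocab = {token: idx for idx, token in enumerate(SPECIAL_TOKENS)}
    # Most frequent tokens first; ties broken alphabetically for a stable order.
    for token, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a sentence to indices, falling back to <unk> for unseen tokens."""
    unk = vocab["<unk>"]
    return [vocab.get(token, unk) for token in sentence.split()]


def save_vocab(vocab: Dict[str, int], path: str) -> None:
    """Pickle the vocabulary, e.g. to a hypothetical nsmc/vocab.pkl path."""
    with open(path, "wb") as f:
        pickle.dump(vocab, f)
```

For instance, `build_vocab(["영화 정말 좋다", "영화 별로"])` gives `영화` index 2 because it is the only token seen twice, and `encode("영화 좋다 최고", vocab)` maps the unseen `최고` to the `<unk>` index 1.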
Single sentence classification (sentiment classification task)
- Using the Naver sentiment movie corpus v1.0 (a.k.a. nsmc)
- Configuration
  - conf/model/{type}.json (e.g. type = ["sencnn", "charcnn",...])
  - conf/dataset/nsmc.json
- Structure
# example: Convolutional_Neural_Networks_for_Sentence_Classification
├── build_dataset.py
├── build_vocab.py
├── conf
│ ├── dataset
│ │ └── nsmc.json
│ └── model
│ └── sencnn.json
├── evaluate.py
├── experiments
│ └── sencnn
│ └── epochs_5_batch_size_256_learning_rate_0.001
├── model
│ ├── data.py
│ ├── __init__.py
│ ├── metric.py
│ ├── net.py
│ ├── ops.py
│ ├── split.py
│ └── utils.py
├── nsmc
│ ├── ratings_test.txt
│ ├── ratings_train.txt
│ ├── test.txt
│ ├── train.txt
│ ├── validation.txt
│ └── vocab.pkl
├── train.py
└── utils.py
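The folder names under `experiments/` (e.g. `epochs_5_batch_size_256_learning_rate_0.001`) suggest that each run is stored in a directory named after its hyperparameters. The configuration lives in `conf/model/{type}.json` and `conf/dataset/nsmc.json`. Below is a minimal sketch of both ideas, assuming the JSON files can be read as plain dictionaries. The helper names `load_config` and `experiment_dir` are hypothetical and are not part of the repository.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_config(root: Path, model_type: str, dataset: str = "nsmc") -> Dict[str, Any]:
    """Hypothetical helper: read conf/model/{model_type}.json and conf/dataset/{dataset}.json.

    The keys inside these files are not documented in this README, so both
    files are simply returned as dictionaries.
    """
    conf = root / "conf"
    with open(conf / "model" / f"{model_type}.json", encoding="utf-8") as f:
        model_cfg = json.load(f)
    with open(conf / "dataset" / f"{dataset}.json", encoding="utf-8") as f:
        dataset_cfg = json.load(f)
    return {"model": model_cfg, "dataset": dataset_cfg}


def experiment_dir(root: Path, model_type: str, epochs: int,
                   batch_size: int, learning_rate: float) -> Path:
    """Hypothetical helper: build a path that mirrors names such as
    experiments/sencnn/epochs_5_batch_size_256_learning_rate_0.001."""
    name = f"epochs_{epochs}_batch_size_{batch_size}_learning_rate_{learning_rate}"
    return root / "experiments" / model_type / name
```

With these assumptions, `experiment_dir(Path("."), "sencnn", 5, 256, 0.001)` points at the `experiments/sencnn/epochs_5_batch_size_256_learning_rate_0.001` folder shown in the tree.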
| Model \ Accuracy | Train (120,000) | Validation (30,000) | Test (50,000) | Date |
| --- | --- | --- | --- | --- |
| SenCNN | 91.95% | 86.54% | 85.84% | 20/05/30 |
| CharCNN | 86.29% | 81.69% | 81.38% | 20/05/30 |
| ConvRec | 86.23% | 82.93% | 82.43% | 20/05/30 |
| VDCNN | 86.59% | 84.29% | 84.10% | 20/05/30 |
| SAN | 90.71% | 86.70% | 86.37% | 20/05/30 |
| ETRIBERT | 91.12% | 89.24% | 88.98% | 20/05/30 |
| SKTBERT | 92.20% | 89.08% | 88.96% | 20/05/30 |
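The split sizes in the table header follow from the corpus. `ratings_train.txt` holds 150,000 reviews and `ratings_test.txt` holds 50,000, so 120,000 / 30,000 is an 80/20 train/validation split of the training file. The sketch below shows how `train.txt` and `validation.txt` could be derived. It assumes the tab-separated `id`, `document`, `label` layout of NSMC and a seeded random shuffle. The actual `build_dataset.py` may split differently, and `read_nsmc` and `train_validation_split` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, int]  # (document, label)


def read_nsmc(lines: Sequence[str]) -> List[Example]:
    """Hypothetical helper: parse rows 'id<TAB>document<TAB>label', skipping the header.

    The id column is dropped because only the text and label are needed.
    """
    examples = []
    for line in lines[1:]:  # lines[0] is the "id\tdocument\tlabel" header
        _, rest = line.rstrip("\n").split("\t", 1)
        document, label = rest.rsplit("\t", 1)  # document may be empty
        examples.append((document, int(label)))
    return examples


def train_validation_split(examples: Sequence[Example], validation_ratio: float = 0.2,
                           seed: int = 42) -> Tuple[List[Example], List[Example]]:
    """Shuffle with a fixed seed and hold out `validation_ratio` of the examples.

    The 0.2 default reproduces 120,000 / 30,000 from 150,000 reviews; the seed
    and the plain random shuffle are assumptions, not the repository's choice.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_ratio)
    return shuffled[n_validation:], shuffled[:n_validation]
```

Fixing the seed makes the split reproducible across runs, which matters when comparing validation accuracy between models as in the table.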
Pairwise text classification (paraphrase detection task)
# example: Siamese_recurrent_architectures_for_learning_sentence_similarity
├── build_dataset.py
├── build_vocab.py
├── conf
│ ├── dataset
│ │ └── qpair.json
│ └── model
│ └── siam.json
├── evaluate.py
├── experiments
│ └── siam
│ └── epochs_5_batch_size_64_learning_rate_0.001
├── model
│ ├── data.py
│ ├── __init__.py
│ ├── metric.py
│ ├── net.py
│ ├── ops.py
│ ├── split.py
│ └── utils.py
├── qpair
│ ├── kor_pair_test.csv
│ ├── kor_pair_train.csv
│ ├── test.txt
│ ├── train.txt
│ ├── validation.txt
│ └── vocab.pkl
├── train.py
└── utils.py
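The example folder name refers to Siamese recurrent architectures for learning sentence similarity (the MaLSTM model). That model encodes both sentences with a shared LSTM and scores the pair as exp(-||h_a - h_b||_1), a value in (0, 1]. Whether the `Siam` model in this repository uses exactly this scoring is an assumption. The sketch below shows only the similarity function in plain Python, with a hypothetical 0.5 decision threshold for turning it into a paraphrase label.

```python
import math
from typing import Sequence


def manhattan_similarity(h_a: Sequence[float], h_b: Sequence[float]) -> float:
    """MaLSTM-style score exp(-||h_a - h_b||_1); equals 1.0 for identical encodings."""
    if len(h_a) != len(h_b):
        raise ValueError("encodings must have the same dimension")
    l1_distance = sum(abs(a - b) for a, b in zip(h_a, h_b))
    return math.exp(-l1_distance)


def is_paraphrase(h_a: Sequence[float], h_b: Sequence[float],
                  threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: call the pair a paraphrase at or above `threshold`."""
    return manhattan_similarity(h_a, h_b) >= threshold
```

Here `h_a` and `h_b` stand in for the sentence encodings of the two questions. Because the score is bounded in (0, 1], a single threshold is enough to compare it against 0/1 paraphrase labels.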
| Model \ Accuracy | Train (6,136) | Validation (682) | Test (758) | Date |
| --- | --- | --- | --- | --- |
| Siam | 93.00% | 83.13% | 83.64% | 20/05/30 |
| SAN | 89.47% | 82.11% | 81.53% | 20/05/30 |
| Stochastic | 89.26% | 82.69% | 80.07% | 20/05/30 |
| ETRIBERT | 95.07% | 94.42% | 94.06% | 20/05/30 |
| SKTBERT | 95.43% | 92.52% | 93.93% | 20/05/30 |
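Both tables report accuracy, the fraction of examples whose predicted class matches the label. Below is a minimal plain-Python sketch of that metric computed from per-class scores (logits), assuming argmax prediction. The repository's `model/metric.py` presumably computes the equivalent on PyTorch tensors, but its code is not shown here, and the name `accuracy` is hypothetical.

```python
from typing import Sequence


def accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows whose highest-scoring class equals the gold label."""
    if len(logits) != len(labels) or not labels:
        raise ValueError("need the same, non-zero number of predictions and labels")
    correct = 0
    for scores, label in zip(logits, labels):
        predicted = max(range(len(scores)), key=lambda i: scores[i])  # argmax
        correct += int(predicted == label)
    return correct / len(labels)
```

Formatting the result with `f"{acc:.2%}"` yields strings in the same style as the tables, e.g. `66.67%` for two correct predictions out of three.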