
KoBERT-KorQuAD

Dependencies

How to use KoBERT with the Huggingface Transformers library

from transformers import BertModel
from tokenization_kobert import KoBertTokenizer  # tokenization_kobert.py ships with this repo

# KoBERT uses a SentencePiece vocabulary, so the stock BertTokenizer cannot load it;
# the custom KoBertTokenizer wraps the SentencePiece model instead.
model = BertModel.from_pretrained('monologg/kobert')
tokenizer = KoBertTokenizer.from_pretrained('monologg/kobert')
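
As a quick sanity check, the loaded pair can encode a sentence and run a forward pass. A minimal sketch: the sentence is an arbitrary example, and the tuple-style output indexing assumes the transformers 2.x-era API this example code targets.

import torch

inputs = tokenizer.encode_plus("한국어 질의응답 데이터셋입니다.", return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
print(outputs[0].shape)  # token-level hidden states: (1, sequence_length, 768)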

Usage

The training and evaluation code is adapted from the Huggingface Transformers example code.

1. Training

$ python3 run_squad.py --model_type kobert \
                       --model_name_or_path monologg/kobert \
                       --output_dir models \
                       --data_dir data \
                       --train_file KorQuAD_v1.0_train.json \
                       --predict_file KorQuAD_v1.0_dev.json \
                       --evaluate_during_training \
                       --per_gpu_train_batch_size 8 \
                       --per_gpu_eval_batch_size 8 \
                       --max_seq_length 512 \
                       --logging_steps 4000 \
                       --save_steps 4000 \
                       --do_train
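
Once training finishes, the checkpoint written to --output_dir can answer questions directly. A minimal inference sketch, assuming the transformers 2.x tuple outputs that this example code targets; the question and context strings are placeholders:

import torch
from transformers import BertForQuestionAnswering
from tokenization_kobert import KoBertTokenizer

model = BertForQuestionAnswering.from_pretrained('models')  # --output_dir from the command above
tokenizer = KoBertTokenizer.from_pretrained('monologg/kobert')

question = "..."  # placeholder question
context = "..."   # placeholder passage
inputs = tokenizer.encode_plus(question, context, return_tensors='pt')
with torch.no_grad():
    start_logits, end_logits = model(**inputs)[:2]

# Decode the most likely answer span back to text
start, end = torch.argmax(start_logits), torch.argmax(end_logits)
print(tokenizer.decode(inputs['input_ids'][0][start:end + 1]))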

2. Evaluation

$ python3 evaluate_v1_0.py ${data_dir}/KorQuAD_v1.0_dev.json ${output_dir}/predictions_.json
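
run_squad.py writes predictions_.json as a flat JSON object mapping each KorQuAD question id to its predicted answer string (the standard SQuAD prediction format). A small sketch for spot-checking a few predictions, assuming the models output directory from the training command above:

import json

with open('models/predictions_.json', encoding='utf-8') as f:
    predictions = json.load(f)

# Print the first five (question id, predicted answer) pairs
for qid, answer in list(predictions.items())[:5]:
    print(qid, '->', answer)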

Results

Model                      Exact Match (%)    F1 Score (%)
-----------------------    ---------------    ------------
KoBERT                     52.81              80.27
DistilKoBERT               54.12              77.80
Bert-multilingual          70.42              90.25
DistilBert-multilingual    64.32              84.78
