Deep Dynamic Contextualized Word Representation (DDCWR)
TensorFlow code and pre-trained models for DDCWR
Important notes
- The model is simple: it uses only a feed-forward neural network with an attention mechanism.
- Training is fast: only a few epochs are needed. The initial parameter values come from Google's pre-trained BERT model.
- The model performs well: in most cases it matches the current (2018-11-13) state-of-the-art, and sometimes it does better. The state-of-the-art results can be seen on the GLUE benchmark leaderboard.
Idea behind the model
This model, Deep Dynamic Word Representation (DDWR), combines the BERT model with ELMo's deep contextualized word representations.
BERT comes from "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"; ELMo comes from "Deep contextualized word representations".
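The repository code defines the exact architecture; the sketch below only illustrates the general idea described above: an ELMo-style learned weighting (attention over layers) of BERT's transformer hidden states, followed by a small feed-forward classification head. All names in the snippet are illustrative, not taken from this repository.

```python
# Illustrative sketch (not the repository code): ELMo-style learned weighting
# over the per-layer hidden states of BERT, plus a feed-forward head.
# In the BERT codebase the per-layer outputs would come from
# modeling.BertModel(...).get_all_encoder_layers().
import tensorflow as tf  # TensorFlow 1.x, as used by the BERT code


def elmo_style_mix(layer_outputs, num_labels):
    """layer_outputs: list of [batch, seq_len, hidden] tensors, one per layer."""
    num_layers = len(layer_outputs)
    # One learnable scalar per layer, softmax-normalized (attention over layers).
    layer_logits = tf.get_variable(
        "layer_logits", shape=[num_layers], initializer=tf.zeros_initializer())
    layer_weights = tf.nn.softmax(layer_logits)

    stacked = tf.stack(layer_outputs, axis=0)          # [layers, batch, seq, hidden]
    weights = tf.reshape(layer_weights, [num_layers, 1, 1, 1])
    mixed = tf.reduce_sum(weights * stacked, axis=0)   # [batch, seq, hidden]

    # Simple feed-forward head on the first ([CLS]) token.
    cls_vector = mixed[:, 0, :]
    hidden = tf.layers.dense(cls_vector, 256, activation=tf.nn.relu)
    return tf.layers.dense(hidden, num_labels)
```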
Basic usage
Download the pre-trained models.
Download the GLUE data using this script.
Sentence (and sentence-pair) classification tasks
The only difference from the original BERT usage is that run_classifier_elmo.py replaces run_classifier.py:
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue
python run_classifier_elmo.py \
--task_name=MRPC \
--do_train=true \
--do_eval=true \
--data_dir=$GLUE_DIR/MRPC \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=32 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=/tmp/mrpc_output/
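With --do_eval=true, the original BERT run_classifier.py writes the dev-set metrics to eval_results.txt in --output_dir as "name = value" lines; assuming run_classifier_elmo.py behaves the same way, they can be read back with a few lines of Python:

```python
# Minimal sketch, assuming run_classifier_elmo.py writes eval_results.txt in
# the same "name = value" format as the original BERT run_classifier.py.
with open("/tmp/mrpc_output/eval_results.txt") as f:
    for line in f:
        name, value = line.strip().split(" = ")
        print(f"{name}: {value}")
```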
Prediction from classifier
The usage is the same as in https://github.com/google-research/bert:
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue
export TRAINED_CLASSIFIER=/path/to/fine/tuned/classifier
python run_classifier_elmo.py \
--task_name=MRPC \
--do_predict=true \
--data_dir=$GLUE_DIR/MRPC \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$TRAINED_CLASSIFIER \
--max_seq_length=128 \
--output_dir=/tmp/mrpc_output/
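The original BERT run_classifier.py writes tab-separated per-class probabilities for the test set to test_results.tsv in --output_dir; assuming run_classifier_elmo.py does the same, they can be turned into predicted labels like this (label order follows the MRPC processor, i.e. ["0", "1"]):

```python
# Minimal sketch, assuming test_results.tsv holds one row of per-class
# probabilities per test example, as in the original BERT run_classifier.py.
import csv

labels = ["0", "1"]  # MRPC label order, assumed from BERT's MrpcProcessor

with open("/tmp/mrpc_output/test_results.tsv") as f:
    for i, row in enumerate(csv.reader(f, delimiter="\t")):
        probs = [float(p) for p in row]
        print(i, labels[probs.index(max(probs))], probs)
```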
See google-research/bert for more usage options.
SQuAD 1.1
The usage is the same as in https://github.com/google-research/bert. The only difference is that run_squad_elmo.py replaces run_squad.py:
python run_squad_elmo.py \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--do_train=True \
--train_file=$SQUAD_DIR/train-v1.1.json \
--do_predict=True \
--predict_file=$SQUAD_DIR/dev-v1.1.json \
--train_batch_size=12 \
--learning_rate=3e-5 \
--num_train_epochs=2.0 \
--max_seq_length=384 \
--doc_stride=128 \
--output_dir=./tmp/elmo_squad_base/
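As in the original BERT repo, the SQuAD run writes predictions.json (a mapping from question id to predicted answer text) to --output_dir; the exact_match/f1 numbers below are in the format produced by the official SQuAD evaluate-v1.1.py script. A quick way to inspect the raw predictions:

```python
# Minimal sketch, assuming predictions.json has the same format as the output
# of the original BERT run_squad.py (question id -> predicted answer text).
import json

with open("./tmp/elmo_squad_base/predictions.json") as f:
    predictions = json.load(f)

for qid, answer in list(predictions.items())[:5]:
    print(qid, "->", answer)
```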
Experimental results
Running run_squad_elmo.py as above gives:
{"exact_match": 81.20151371807, "f1": 88.56178500169332}