Awesome

Poly-encoders

This repository is an unofficial implementation of Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring.

How to use

Download and unzip the ubuntu data https://www.dropbox.com/s/2fdn26rj6h9bpvl/ubuntudata.zip?dl=0
Prepare a pretrained BERT (https://github.com/huggingface/transformers)
pip3 install -r requirements.txt

Train a Poly-encoder:

python3 train.py -bert_model /your/pretrained/model/dir --output_dir /your/ckpt/dir --train_dir /your/data/dir --use_pretrain --architecture poly --poly_m 16

Train a Bi-encoder:

python3 train.py -bert_model /your/pretrained/model/dir --output_dir /your/ckpt/dir --train_dir /your/data/dir --use_pretrain --architecture bi

Results

The experimental settings and results are shown as follows:

Dataset: Ubuntu
Device: GTX 1060 6G x1
Pretrained model: BERT-small-uncased (https://github.com/sfzhou5678/PretrainedLittleBERTs or https://storage.googleapis.com/bert_models/2020_02_20/all_bert_models.zip)
Batch size: 32
max_contexts_length: 128
max_context_cnt: 4
max_response_length：64
lr: 5e-5
Epochs: 3

Model	R@1/10	Training Speed	GPU Mem Consumption
Bi-encoder	0.6714	3.15it/s	1969 Mb
Poly-encoder 16	0.6938	3.11it/s	1975 Mb
Poly-encoder 64	0.7026	3.08it/s	2005 Mb
Poly-encoder 360	0.7066	3.05it/s	2071 Mb

Different with the original paper, this experiment uses a bert-small-uncased model (from https://github.com/sfzhou5678/PretrainedLittleBERTs or https://storage.googleapis.com/bert_models/2020_02_20/all_bert_models.zip) rather than the bert-base. Besides, this experiment only uses batch_size =32, max_length = 128, and max_history=4 (which means select up to 4 context texts). All these settings lead to lower results but faster training speed. One can modify these settings for a better result.

Some Improvements

Thanks to @chijames, this implementation is closer to the original paper and has achieved better performance.

BTW, If you have any suggestions or questions, please feel free to reach me out!