Poly-encoders

This repository is an unofficial implementation of Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring.
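As background on the architecture, here is a minimal sketch of the Poly-encoder scoring head described in the paper. This is not this repository's code; the class name, tensor shapes, and plain dot-product attention are illustrative assumptions. The idea: poly_m learned codes attend over the context token outputs, each candidate embedding then attends over those m context features, and the score is a final dot product (the 16/64/360 variants in the results below correspond to poly_m).

    # Minimal sketch of a Poly-encoder scoring head (illustrative, not the repo's code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PolyEncoderHead(nn.Module):
        def __init__(self, dim, poly_m=16):
            super().__init__()
            # poly_m learned "codes" that query the context token representations
            self.codes = nn.Embedding(poly_m, dim)
            self.poly_m = poly_m

        @staticmethod
        def dot_attention(q, k, v):
            # q: [batch, q_len, dim], k/v: [batch, kv_len, dim]
            weights = F.softmax(torch.bmm(q, k.transpose(1, 2)), dim=-1)
            return torch.bmm(weights, v)

        def forward(self, ctx_out, cand_emb):
            # ctx_out:  [batch, ctx_len, dim]  token-level context outputs
            # cand_emb: [batch, n_cand, dim]   one embedding per candidate response
            bsz, dim = ctx_out.size(0), ctx_out.size(2)
            # 1) each code attends over the context tokens -> m global context features
            codes = self.codes.weight.unsqueeze(0).expand(bsz, self.poly_m, dim)
            ctx_feats = self.dot_attention(codes, ctx_out, ctx_out)       # [batch, m, dim]
            # 2) each candidate attends over the m context features
            ctx_vec = self.dot_attention(cand_emb, ctx_feats, ctx_feats)  # [batch, n_cand, dim]
            # 3) score = dot product between attended context vector and candidate
            return (ctx_vec * cand_emb).sum(-1)                           # [batch, n_cand]

    # toy usage with random tensors
    head = PolyEncoderHead(dim=256, poly_m=16)
    scores = head(torch.randn(2, 50, 256), torch.randn(2, 10, 256))  # -> [2, 10]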

How to use

  1. Download and unzip the Ubuntu data: https://www.dropbox.com/s/2fdn26rj6h9bpvl/ubuntudata.zip?dl=0

  2. Prepare a pretrained BERT (https://github.com/huggingface/transformers); see the sketch after this list for one way to do this

  3. pip3 install -r requirements.txt

  4. Train a Poly-encoder:

    python3 train.py --bert_model /your/pretrained/model/dir --output_dir /your/ckpt/dir --train_dir /your/data/dir --use_pretrain --architecture poly --poly_m 16
    
  5. Train a Bi-encoder:

    python3 train.py --bert_model /your/pretrained/model/dir --output_dir /your/ckpt/dir --train_dir /your/data/dir --use_pretrain --architecture bi
    
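For step 2, one way to prepare the pretrained BERT directory (an assumption, not a procedure required by this repository) is to download a checkpoint with huggingface/transformers and save it locally; the checkpoint name below is only an example:

    from transformers import BertModel, BertTokenizer

    model_name = "prajjwal1/bert-small"      # example checkpoint; substitute your own
    save_dir = "/your/pretrained/model/dir"  # passed later via --bert_model

    # download the tokenizer and model weights, then write them to save_dir
    BertTokenizer.from_pretrained(model_name).save_pretrained(save_dir)
    BertModel.from_pretrained(model_name).save_pretrained(save_dir)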

Results

The experimental settings and results are shown as follows:

Model | R@1/10 | Training Speed | GPU Memory Consumption
--- | --- | --- | ---
Bi-encoder | 0.6714 | 3.15 it/s | 1969 MB
Poly-encoder 16 | 0.6938 | 3.11 it/s | 1975 MB
Poly-encoder 64 | 0.7026 | 3.08 it/s | 2005 MB
Poly-encoder 360 | 0.7066 | 3.05 it/s | 2071 MB

Unlike the original paper, this experiment uses a bert-small-uncased model (from https://github.com/sfzhou5678/PretrainedLittleBERTs or https://storage.googleapis.com/bert_models/2020_02_20/all_bert_models.zip) rather than bert-base. In addition, it only uses batch_size = 32, max_length = 128, and max_history = 4 (i.e., at most the last 4 context texts are kept). These settings lead to lower scores but faster training; you can adjust them for better results.
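For reference, R@1/10 is recall@1 over 10 candidate responses per context (the correct response plus 9 distractors): the model's top-ranked candidate must be the correct one. A minimal sketch of how such a metric is typically computed (illustrative function, not this repository's evaluation code):

    import torch

    def recall_at_1(scores, labels):
        # scores: [num_contexts, 10] model scores for the 10 candidates per context
        # labels: [num_contexts] index of the true response among the candidates
        return (scores.argmax(dim=-1) == labels).float().mean().item()

    # toy usage: 2 contexts, true response always at index 0
    print(recall_at_1(torch.randn(2, 10), torch.zeros(2, dtype=torch.long)))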

Some Improvements

If you have any suggestions or questions, please feel free to reach out!