<br /> <p align="center"> <h1 align="center">Trans-Encoder</h1> <h3 align="center"> </h3> <p align="center"> <a href="https://arxiv.org/abs/2109.13059">[arxiv]</a> · <a href="https://www.amazon.science/blog/improving-unsupervised-sentence-pair-comparison">[amazon.science blog]</a> · <a href="https://iclr.cc/virtual/2022/poster/6242">[5min-video]</a> · <a href="https://youtu.be/1Zg0rmVNfFI">[talk@RIKEN]</a> · <a href="https://openreview.net/forum?id=AmUhwTOHgm">[openreview]</a> </p> </p> <img align="right" width="500" src="https://production-media.paperswithcode.com/methods/e6c08315-2b70-4125-aeb2-147a6785d9b1.png">Code repository for the ICLR 2022 paper Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations <br> by Fangyu Liu, Yunlong Jiao, Jordan Massiah, Emine Yilmaz, Serhii Havrylov.
Trans-Encoder is a state-of-the-art unsupervised sentence similarity model. It performs self-knowledge distillation on top of pretrained language models by alternating between their bi-encoder and cross-encoder forms.
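The alternation can be pictured as a loop: the current bi-encoder scores unlabelled sentence pairs, a cross-encoder is trained on those scores, the cross-encoder then relabels the same pairs, and a bi-encoder is trained on the new labels. The sketch below is a minimal, framework-free illustration of that loop under stated assumptions: models are represented only as scoring and training callables, and the names `self_distill`, `Scorer` and `Trainer` are hypothetical, not taken from `src/`. The `cycles` argument plays roughly the role that the `--cycle` flag plays in the training scripts.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]
# Hypothetical interface: a scorer maps sentence pairs to similarity scores.
Scorer = Callable[[List[Pair]], List[float]]
# Hypothetical interface: a trainer fits a model on (pairs, pseudo-labels)
# and returns the scorer of the trained model.
Trainer = Callable[[List[Pair], List[float]], Scorer]


def self_distill(
    pairs: List[Pair],
    bi_scorer: Scorer,
    train_cross: Trainer,
    train_bi: Trainer,
    cycles: int = 3,
) -> Tuple[Scorer, List[str]]:
    """Alternate bi -> cross -> bi distillation for a number of cycles.

    Returns the final bi-encoder scorer and a log of the steps taken.
    """
    log: List[str] = []
    for cycle in range(1, cycles + 1):
        # 1) The bi-encoder labels the unlabelled pairs; a cross-encoder learns from them.
        bi_labels = bi_scorer(pairs)
        cross_scorer = train_cross(pairs, bi_labels)
        log.append(f"cycle {cycle}: bi -> cross")
        # 2) The cross-encoder relabels the same pairs; a bi-encoder learns from them.
        cross_labels = cross_scorer(pairs)
        bi_scorer = train_bi(pairs, cross_labels)
        log.append(f"cycle {cycle}: cross -> bi")
    return bi_scorer, log
```

In the actual method the starting `bi_scorer` would be a pretrained model such as unsupervised SimCSE; here any function returning one score per pair will do.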
Hugging Face pretrained models for STS
<table>
<tr><th> base models </th><th> large models </th></tr>
<tr><td>

model | STS avg. |
---|---|
baseline: unsup-simcse-bert-base | 76.21 |
trans-encoder-bi-simcse-bert-base | 80.41 |
trans-encoder-cross-simcse-bert-base | 79.90 |
baseline: unsup-simcse-roberta-base | 76.10 |
trans-encoder-bi-simcse-roberta-base | 80.47 |
trans-encoder-cross-simcse-roberta-base | 81.15 |

</td><td>

model | STS avg. |
---|---|
baseline: unsup-simcse-bert-large | 78.42 |
trans-encoder-bi-simcse-bert-large | 82.65 |
trans-encoder-cross-simcse-bert-large | 82.52 |
baseline: unsup-simcse-roberta-large | 78.92 |
trans-encoder-bi-simcse-roberta-large | 82.93 |
trans-encoder-cross-simcse-roberta-large | 82.93 |

</td></tr>
</table>
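The improvement of each Trans-Encoder variant over its SimCSE baseline can be read off the tables directly. The short script below only recomputes those differences from the reported STS avg. numbers; the dictionary layout and the `gains_over_baseline` name are illustrative and not part of the repository.

```python
# STS avg. scores copied from the tables above:
# backbone -> (SimCSE baseline, {Trans-Encoder form: score}).
RESULTS = {
    "bert-base": (76.21, {"bi": 80.41, "cross": 79.90}),
    "roberta-base": (76.10, {"bi": 80.47, "cross": 81.15}),
    "bert-large": (78.42, {"bi": 82.65, "cross": 82.52}),
    "roberta-large": (78.92, {"bi": 82.93, "cross": 82.93}),
}


def gains_over_baseline(results):
    """Absolute STS avg. improvement of each variant over its baseline, rounded to 2 decimals."""
    return {
        f"{backbone}/{form}": round(score - baseline, 2)
        for backbone, (baseline, variants) in results.items()
        for form, score in variants.items()
    }


for name, gain in gains_over_baseline(RESULTS).items():
    print(f"{name}: +{gain:.2f}")
```

With these numbers every variant gains roughly 3.7 to 5.1 points over its baseline.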
Dependencies
torch==1.8.1
transformers==4.9.0
sentence-transformers==2.0.0
See requirements.txt for the full list of dependencies.
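Results can depend on exact library versions, so it may help to check an existing environment against the pins above before training. A minimal stdlib-only sketch, assuming Python 3.8+ (for `importlib.metadata`); the `check_pins` helper is hypothetical and not shipped with the repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions pinned in this README; requirements.txt remains the full reference.
PINNED = {
    "torch": "1.8.1",
    "transformers": "4.9.0",
    "sentence-transformers": "2.0.0",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (pinned, installed)} for every package that differs from its pin.

    `installed` is None when the package is not installed at all.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, pinned in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in check_pins(PINNED).items():
        print(f"{name}: pinned {pinned}, installed {installed or 'none'}")
```

The comparison is a plain string match, so a CUDA build reported as e.g. `1.8.1+cu111` is flagged as a mismatch even though it is the same release.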
Data
All training and evaluation data are downloaded automatically when the scripts are run. See src/data.py for details.
Train
--task options: sts (STS2012-2016 and STS-b), sickr, sts_sickr (STS2012-2016, STS-b, and SICK-R), qqp, qnli, mrpc, snli, custom. See src/data.py for details of each task's data. The default is sts_sickr, i.e. all STS data.
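For readers wiring these options into their own scripts, the sketch below mirrors the documented --task choices and default with `argparse`. It is a hypothetical parser, not the repository's actual argument handling; in particular, the check that custom requires --custom_corpus_path is an assumption based on the custom-corpus command later in this README.

```python
import argparse
from typing import List, Optional

# Task names accepted by --task, as listed in this README.
TASKS = ("sts", "sickr", "sts_sickr", "qqp", "qnli", "mrpc", "snli", "custom")


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser mirroring the documented --task choices and default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--task", choices=TASKS, default="sts_sickr")
    parser.add_argument("--custom_corpus_path", default=None)
    args = parser.parse_args(argv)
    # Assumed rule: a custom task is meaningless without a corpus file.
    if args.task == "custom" and args.custom_corpus_path is None:
        parser.error("--task custom requires --custom_corpus_path")
    return args
```

Using `choices` makes a typo such as `--task stsb` fail immediately with a usage message instead of surfacing later inside the data loader.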
Self-distillation
>> bash train_self_distill.sh 0
Here 0 denotes the GPU device index.
Mutual-distillation
>> bash train_mutual_distill.sh 0,1
Two GPUs are required. By default, the SimCSE BERT and RoBERTa base models are used for ensembling. Add --use_large to switch to the large models.
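Mutual distillation extends the self-distillation loop to two models (by default SimCSE BERT and RoBERTa): both teachers score the same pairs, and the combined score supervises both students. The sketch below assumes the scores are combined by a simple per-pair average; it shows one direction of a cycle (for example, two bi-encoders teaching two cross-encoders). The names `averaged_pseudo_labels` and `mutual_distill_step` are hypothetical, and models are again reduced to scoring and training callables.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
# Hypothetical interfaces, as in a plain self-distillation loop.
Scorer = Callable[[List[Pair]], List[float]]
Trainer = Callable[[List[Pair], List[float]], Scorer]


def averaged_pseudo_labels(scores_a: Sequence[float], scores_b: Sequence[float]) -> List[float]:
    """Average two teachers' scores pair by pair (assumed combination rule)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both teachers must score the same pairs")
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


def mutual_distill_step(
    pairs: List[Pair],
    teacher_a: Scorer,
    teacher_b: Scorer,
    train_a: Trainer,
    train_b: Trainer,
) -> Tuple[Scorer, Scorer]:
    """One direction of mutual distillation: both students learn the same averaged labels."""
    labels = averaged_pseudo_labels(teacher_a(pairs), teacher_b(pairs))
    return train_a(pairs, labels), train_b(pairs, labels)
```

Sharing one set of labels means each student also learns from the other backbone's view of the data, which is the point of ensembling two different pretrained models.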
Train with your custom corpus
>> CUDA_VISIBLE_DEVICES=0,1 python src/mutual_distill_parallel.py \
--batch_size_bi_encoder 128 \
--batch_size_cross_encoder 64 \
--num_epochs_bi_encoder 10 \
--num_epochs_cross_encoder 1 \
--cycle 3 \
--bi_encoder1_pooling_mode cls \
--bi_encoder2_pooling_mode cls \
--init_with_new_models \
--task custom \
--random_seed 2021 \
--custom_corpus_path CORPUS_PATH
CORPUS_PATH should point to your custom corpus, in which every line is a sentence pair of the form sent1||sent2.
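Before training, it can be useful to check that a corpus file parses as expected. The reader below is a minimal sketch of the sent1||sent2 format; the function name `read_custom_corpus` is hypothetical, and its choices (skipping blank lines, splitting only on the first ||, stripping surrounding whitespace) are assumptions that may differ from the repository's own loader in src/data.py.

```python
from typing import List, Tuple


def read_custom_corpus(path: str) -> List[Tuple[str, str]]:
    """Read a corpus where every non-blank line is 'sent1||sent2'."""
    pairs: List[Tuple[str, str]] = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            if "||" not in line:
                raise ValueError(f"line {lineno}: expected 'sent1||sent2', got {line!r}")
            # Split on the first separator only, so '||' inside sent2 is kept.
            sent1, sent2 = line.split("||", 1)
            pairs.append((sent1.strip(), sent2.strip()))
    return pairs
```

A malformed line raises an error with its line number, which is easier to debug than a silently dropped pair.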
Evaluate
Evaluate a single model
Bi-encoder:
>> python src/eval.py \
--model_name_or_path "cambridgeltl/trans-encoder-bi-simcse-roberta-large" \
--mode bi \
--task sts_sickr
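In bi mode each sentence is encoded on its own and a pair is scored by the cosine similarity of the two pooled sentence vectors; in cross mode the pair is fed to the model jointly and a head outputs the score directly. The sketch below illustrates the bi-encoder side with cls pooling (the first token's hidden state), assuming the released bi-encoders use the same cls pooling as the training command's --bi_encoder1_pooling_mode cls. Plain lists stand in for the model's token hidden states, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cls_pool(token_states: Sequence[Sequence[float]]) -> Vector:
    """'cls' pooling: keep the hidden state of the first ([CLS] or <s>) token."""
    return list(token_states[0])


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bi_encoder_score(states1: Sequence[Sequence[float]], states2: Sequence[Sequence[float]]) -> float:
    """Score a pair from the token states of two independently encoded sentences."""
    return cosine(cls_pool(states1), cls_pool(states2))
```

Because the two sentences never attend to each other, bi-encoder embeddings can be precomputed and cached, which is why this form is cheaper at inference time than the cross-encoder.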
Cross-encoder:
>> python src/eval.py \
--model_name_or_path "cambridgeltl/trans-encoder-cross-simcse-roberta-large" \
--mode cross \
--task sts_sickr
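STS results are conventionally reported as Spearman's rank correlation between predicted scores and gold similarity labels. Assuming src/eval.py follows this convention (scipy.stats.spearmanr is the usual library routine for it), the metric can be sketched without dependencies as the Pearson correlation of tie-averaged ranks. All names below are illustrative.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (vx * vy) ** 0.5


def spearman(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(pred), average_ranks(gold))
```

Only the ordering of predictions matters, so bi-encoder cosine scores and cross-encoder outputs can be compared against the same gold labels without rescaling.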
Evaluate ensemble
Bi-encoder:
>> python src/eval.py \
--model_name_or_path1 "cambridgeltl/trans-encoder-bi-simcse-bert-large" \
--model_name_or_path2 "cambridgeltl/trans-encoder-bi-simcse-roberta-large" \
--mode bi \
--ensemble \
--task sts_sickr
Cross-encoder:
>> python src/eval.py \
--model_name_or_path1 "cambridgeltl/trans-encoder-cross-simcse-bert-large" \
--model_name_or_path2 "cambridgeltl/trans-encoder-cross-simcse-roberta-large" \
--mode cross \
--ensemble \
--task sts_sickr
Authors
- Fangyu Liu: Main contributor
Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.