
Smaller Multilingual Transformers

This repository shares smaller versions of multilingual transformers that keep the same representations as the original models. The idea came from a simple observation: after massively multilingual pretraining, not all embeddings are needed for fine-tuning and inference. In practice, one rarely needs a model that supports more than 100 languages, as the original mBERT does. We therefore extracted several smaller versions that handle fewer languages. Since most of the parameters of multilingual transformers are located in the embedding layer, our models are up to 64% smaller in size.
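
To see where the savings come from, here is a minimal sketch (not part of the repository) that loads the original mBERT and reports how much of its parameter budget sits in the word-embedding matrix:

from transformers import AutoModel

# Load the original multilingual checkpoint.
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

# Compare the size of the word-embedding matrix with the total parameter count.
total = sum(p.numel() for p in model.parameters())
embeddings = model.get_input_embeddings().weight.numel()

print(f"total parameters:     {total / 1e6:.0f}M")
print(f"embedding parameters: {embeddings / 1e6:.0f}M ({100 * embeddings / total:.0f}% of total)")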

The table below compares two of our extracted versions with the original mBERT. It shows the model size, memory footprint, and the accuracy obtained on the XNLI dataset (cross-lingual transfer from English to French). These measurements were taken on a Google Cloud n1-standard-1 machine (1 vCPU, 3.75 GB).

| Model | Num parameters | Size | Memory | Accuracy |
|-------|----------------|------|--------|----------|
| bert-base-multilingual-cased | 178 million | 714 MB | 1400 MB | 73.8 |
| Geotrend/bert-base-15lang-cased | 141 million | 564 MB | 1098 MB | 74.1 |
| Geotrend/bert-base-en-fr-cased | 112 million | 447 MB | 878 MB | 73.8 |

Reducing the size of multilingual transformers facilitates their deployment on public cloud platforms. For instance, Google Cloud Platform requires that the model size on disk be under 500 MB for serverless deployments (Cloud Functions / Cloud ML).

For more information, please refer to our paper: Load What You Need.

*** New August 2021: smaller versions of distil-mBERT are now available! ***

| Model | Num parameters | Size | Memory |
|-------|----------------|------|--------|
| distilbert-base-multilingual-cased | 134 million | 542 MB | 1200 MB |
| Geotrend/distilbert-base-en-fr-cased | 69 million | 277 MB | 740 MB |

🚀 To our knowledge, these distil-mBERT based versions are the smallest and fastest multilingual transformers available.

Available Models

So far, we have generated a total of 138 models (70 extracted from mBERT and 68 extracted from distil-mBERT). These models have been uploaded to the Hugging Face Model Hub to make them easy to use: https://huggingface.co/Geotrend.
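
The full list of available checkpoints can also be queried programmatically; here is a small sketch using the huggingface_hub package (assuming a recent version in which list_models accepts an author filter):

from huggingface_hub import list_models

# Print the identifiers of all models published under the Geotrend organisation.
for model_info in list_models(author="Geotrend"):
    print(model_info.id)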

They can be downloaded easily using the transformers library:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-fr-cased")
model = AutoModel.from_pretrained("Geotrend/bert-base-en-fr-cased")
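
Once loaded, a reduced checkpoint behaves exactly like the original one. A short usage sketch that builds on the snippet above (the sentences are only illustrative):

import torch

# Encode an English and a French sentence with the reduced model.
inputs = tokenizer(["Hello world!", "Bonjour le monde !"], padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token representations: (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)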

More models will be released soon.

Generating new Models

We also share a Python script that allows users to generate smaller transformers on their own, based on a subset of the original vocabulary (the method is not limited to multilingual transformers):


pip install -r requirements.txt

python3 reduce_model.py \
	--source_model bert-base-multilingual-cased \
	--vocab_file vocab_5langs.txt \
	--output_model bert-base-5lang-cased \
	--convert_to_tf False

Where:

- --source_model is the original model to reduce (here, mBERT),
- --vocab_file is the file containing the subset of the vocabulary to keep,
- --output_model is the name under which the reduced model is saved,
- --convert_to_tf controls whether a TensorFlow version is also generated.
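
Conceptually, the reduction keeps only the embedding rows that correspond to the retained tokens. A minimal, hypothetical sketch of that idea (not the actual reduce_model.py):

import torch
from transformers import AutoModel, AutoTokenizer

source = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(source)
model = AutoModel.from_pretrained(source)

# Hypothetical subset of the original vocabulary (in practice, read from --vocab_file).
kept_tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "bonjour"]
kept_ids = tokenizer.convert_tokens_to_ids(kept_tokens)

# Keep only the selected rows of the word-embedding matrix.
old_embeddings = model.get_input_embeddings().weight.data
new_embeddings = torch.nn.Embedding(len(kept_ids), old_embeddings.size(1))
new_embeddings.weight.data = old_embeddings[kept_ids].clone()
model.set_input_embeddings(new_embeddings)
model.config.vocab_size = len(kept_ids)

model.save_pretrained("bert-base-reduced-sketch")

A real reduction also has to rebuild the tokenizer vocabulary and keep any tied output embeddings consistent; refer to the shared script for the complete procedure.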

How to Cite

@inproceedings{smallermbert,
  title={Load What You Need: Smaller Versions of Multilingual BERT},
  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
  booktitle={SustaiNLP / EMNLP},
  year={2020}
}

Contact

Please contact amin.geotrend@gmail.com for any questions, feedback, or requests.