Home

Awesome

Multi-EURLEX

MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer

This is the code used for the experiments described in the following paper:

I. Chalkidis, M. Fergadiotis, and I. Androutsopoulos, "MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer". Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 2021 (xxx)

Requirements:

Quick start:

Install python requirements:

pip install -r requirements.txt

Download dataset (MultiEURLEX):

The dataset is hosted and been described in detail in the Hugging Face Datasets (https://huggingface.co/datasets/multi_eurlex). It is automatically downloaded and used by the Trainer. If you want to review and familiarize your self with the dataset, you can download it usingthe following Python code:

from datasets import load_dataset
dataset = load_dataset('multi_eurlex', languages='all_languages')

Train a model:

The following configuration (command-line) arguments can be used:

You can run experiments by simply calling:

python trainer.py --bert_path 'xlm-roberta-base' --use_adapters True --train_lang 'en' --label_level 'level_1'

Credits

Thanks to @Essex97 for pointing out minor bugs in the codebase.