Legal-HeBERT

Legal-HeBERT is a BERT model for the Hebrew legal and legislative domains. It is intended to advance legal NLP research and tool development in Hebrew. We release two versions of Legal-HeBERT. The first is HeBERT fine-tuned on legal and legislative documents. The second follows HeBERT's architecture guidelines to train a BERT model from scratch. <br> We continue to collect legal data, examine different architectural designs, and build tagged datasets and legal tasks for evaluating and developing Hebrew legal tools.

Training Data

Our training datasets are:

| Name | Hebrew description | Size (GB) | Documents | Sentences | Words | Notes |
|---|---|---|---|---|---|---|
| The Israeli Law Book | ספר החוקים הישראלי | 0.05 | 2,338 | 293,352 | 4,851,063 | |
| Judgments of the Supreme Court | מאגר פסקי הדין של בית המשפט העליון | 0.7 | 212,348 | 5,790,138 | 79,672,415 | |
| Custody courts | החלטות בתי הדין למשמורת | 2.46 | 169,708 | 8,555,893 | 213,050,492 | |
| Law memoranda, drafts of secondary legislation and drafts of support tests that have been distributed to the public for comment | תזכירי חוק, טיוטות חקיקת משנה וטיוטות מבחני תמיכה שהופצו להערות הציבור | 0.4 | 3,291 | 294,752 | 7,218,960 | |
| Supervisors of Land Registration judgments | מאגר פסקי דין של המפקחים על רישום המקרקעין | 0.02 | 559 | 67,639 | 1,785,446 | |
| Decisions of the Labor Court - Corona | מאגר החלטות בית הדין לעניין שירות התעסוקה – קורונה | 0.001 | 146 | 3,505 | 60,195 | |
| Decisions of the Israel Lands Council | החלטות מועצת מקרקעי ישראל | | 118 | 11,283 | 162,692 | Aggregate file |
| Judgments of the Disciplinary Tribunal and the Israel Police Appeals Tribunal | פסקי דין של בית הדין למשמעת ובית הדין לערעורים של משטרת ישראל | 0.02 | 54 | 83,724 | 1,743,419 | Aggregate files |
| Disciplinary Appeals Committee in the Ministry of Health | ועדת ערר לדין משמעתי במשרד הבריאות | 0.004 | 252 | 21,010 | 429,807 | 465 files are scanned and could not be parsed |
| Attorney General's Positions | מאגר התייצבויות היועץ המשפטי לממשלה | 0.008 | 281 | 32,724 | 813,877 | |
| Legal opinions of the Attorney General | מאגר חוות דעת היועץ המשפטי לממשלה | 0.002 | 44 | 7,132 | 188,053 | |
| Total | | 3.665 | 389,139 | 15,161,152 | 309,976,419 | |
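As a quick consistency check, the totals row of the table above can be reproduced by summing the per-source counts. The numbers below are copied from the table; the shortened source names and the `column_totals` helper are hypothetical, chosen only for this sketch. The size column is left out because the Israel Lands Council entry has no size.

```python
# Per-source counts copied from the training-data table:
# (documents, sentences, words). Keys are shortened, hypothetical labels.
CORPORA = {
    "Israeli Law Book": (2_338, 293_352, 4_851_063),
    "Supreme Court judgments": (212_348, 5_790_138, 79_672_415),
    "Custody courts": (169_708, 8_555_893, 213_050_492),
    "Law memoranda and drafts": (3_291, 294_752, 7_218_960),
    "Land Registration supervisors": (559, 67_639, 1_785_446),
    "Labor Court - Corona": (146, 3_505, 60_195),
    "Israel Lands Council": (118, 11_283, 162_692),
    "Police disciplinary tribunals": (54, 83_724, 1_743_419),
    "Ministry of Health appeals committee": (252, 21_010, 429_807),
    "Attorney General's positions": (281, 32_724, 813_877),
    "Attorney General's opinions": (44, 7_132, 188_053),
}


def column_totals(corpora):
    """Sum the documents, sentences and words columns over all sources."""
    docs = sum(counts[0] for counts in corpora.values())
    sents = sum(counts[1] for counts in corpora.values())
    words = sum(counts[2] for counts in corpora.values())
    return docs, sents, words


docs, sents, words = column_totals(CORPORA)
print(f"{docs:,} documents, {sents:,} sentences, {words:,} words")
# → 389,139 documents, 15,161,152 sentences, 309,976,419 words
```

The same dictionary also shows how skewed the corpus is: the custody-court decisions alone contribute about 69% of all words.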

We thank <b>Yair Gardin</b> for referring us to the governance data, <b>Elhanan Schwarts</b> for collecting and parsing the Israeli Law Book, and <b>Jonathan Schler</b> for collecting the judgments of the Supreme Court.

Training process

Additional training settings:

<b>Fine-tuned HeBERT model:</b> The first eight layers were frozen, as suggested by Lee et al. (2019).<br> <b>Legal-HeBERT trained from scratch:</b> The training process is similar to HeBERT's and is inspired by Chalkidis et al. (2020).<br>
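A minimal sketch of freezing the first eight layers, written with Hugging Face `transformers` and PyTorch. It illustrates the idea and is not the training code used for Legal-HeBERT. Assumptions: a tiny randomly initialised 12-layer `BertConfig` stands in for HeBERT so the sketch runs offline; the hub id `avichr/heBERT` in the comment is assumed; the helper `freeze_first_layers` is hypothetical; and because the text does not say whether the embedding layer was frozen too, the embeddings stay trainable here.

```python
from transformers import BertConfig, BertModel


def freeze_first_layers(model: BertModel, n_frozen: int = 8) -> None:
    """Disable gradients for encoder blocks 0 .. n_frozen - 1.

    Assumption: only encoder blocks are frozen; the embeddings,
    the remaining blocks and the pooler stay trainable.
    """
    for layer in model.encoder.layer[:n_frozen]:
        for param in layer.parameters():
            param.requires_grad = False


# Tiny stand-in for HeBERT so the sketch runs without network access.
# With network access one would load the real checkpoint instead, e.g.
# BertModel.from_pretrained("avichr/heBERT")  (assumed hub id).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=12,  # 12 blocks, as in BERT-base
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config)
freeze_first_layers(model, n_frozen=8)

# Indices of encoder blocks whose parameters will still be updated
trainable = [
    i for i, layer in enumerate(model.encoder.layer)
    if all(p.requires_grad for p in layer.parameters())
]
print("trainable encoder layers:", trainable)
# → trainable encoder layers: [8, 9, 10, 11]
```

An optimizer built from `[p for p in model.parameters() if p.requires_grad]` then updates only the top four encoder blocks, the embeddings and the pooler.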

How to use

The models are available on the Hugging Face Hub and can be fine-tuned for any downstream task:

```python
# !pip install transformers==4.14.1
from transformers import AutoTokenizer, AutoModel, pipeline

# Choose ONE of the two checkpoints:
model_name = 'avichr/Legal-heBERT_ft'  # HeBERT fine-tuned on legal texts
# model_name = 'avichr/Legal-heBERT'   # Legal-HeBERT trained from scratch

# Base encoder (no task head), ready for fine-tuning on a downstream task
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Masked-token prediction; the pipeline loads the masked-LM head itself
fill_mask = pipeline(
    "fill-mask",
    model=model_name,
)
# "Corona took [MASK] and we were left with nothing."
print(fill_mask("הקורונה לקחה את [MASK] ולנו לא נשאר דבר."))
```

Stay tuned!

We are still working on our models and datasets. We will update this page as we progress. We are open to collaborations.

If you use this model, please cite us as:

Chriqui, Avihay, Yahav, Inbal and Bar-Siman-Tov, Ittai, Legal HeBERT: A BERT-based NLP Model for Hebrew Legal, Judicial and Legislative Texts (June 27, 2022). Available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4147127

```bibtex
@article{chriqui2021hebert,
  title={Legal HeBERT: A BERT-based NLP Model for Hebrew Legal, Judicial and Legislative Texts},
  author={Chriqui, Avihay and Yahav, Inbal and Bar-Siman-Tov, Ittai},
  journal={SSRN preprint:4147127},
  year={2022}
}
```

Contact us

Avichay Chriqui, The Coller AI Lab <br> Inbal Yahav, The Coller AI Lab <br> Ittai Bar-Siman-Tov, The BIU Innovation Lab for Law, Data-Science and Digital Ethics <br>

Thank you, תודה, شكرا <br>