Legal Natural Language Processing

🗂 Datasets

<ins>Legal Judgement Prediction</ins> (LJP)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| FSCS (Niklaus et al., 2021) | 📄 🤗 💻 | Swiss court judgments | 🇩🇪 🇫🇷 🇮🇹 | 85K cases w/ 2 outcomes |
| ECtHR (Chalkidis et al., 2021) | 📄 🤗 | EU court judgments | 🇬🇧 | 11K cases w/ 11 outcomes |
| ECHR (Aletras et al., 2019) | 📄 💾 | EU court judgments | 🇬🇧 | 11.5K cases w/ 11 outcomes |
| CAIL (Xiao et al., 2018) | 📄 💻 | Chinese court judgements | 🇨🇳 | 2.6M cases w/ 6 outcomes |

<ins>Legal Text Classification</ins> (LTC)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| GLC (Papaloukas et al., 2021) | 📄 🤗 💻 | Greek legislation | 🇬🇷 | 47.5K laws w/ 2.7K labels |
| CUAD (Hendrycks et al., 2021) | 📄 🤗 💻 | Contracts | 🇬🇧 | 510 contracts w/ 41 classes |
| MultiEURLEX (Chalkidis et al., 2021) | 📄 🤗 💻 | EU legislation | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇪🇸 (18+) | 65K laws w/ 4.5K labels |
| LEDGAR (Tuggener et al., 2020) | 📄 💾 | Contracts | 🇬🇧 | 60.5K contracts w/ 12.6K labels |
| Contract Discovery (Borchmann et al., 2020) | 📄 💻 | Contracts | 🇬🇧 | 2.6K clauses w/ 21 classes |
| EURLEX-57K (Chalkidis et al., 2019) | 📄 💾 | EU legislation | 🇬🇧 | 57K laws w/ 4.3K labels |
| Unfair-ToS (Lippi et al., 2018) | 📄 💾 | Contracts | 🇬🇧 | 9.4K sentences w/ 9 classes |
| Contract Elements (Chalkidis et al., 2017) | 📄 💾 | Contracts | 🇬🇧 | 2.4K contracts w/ 10 classes |
| OPP-115 (Wilson et al., 2016) | 📄 💾 | Privacy laws | 🇬🇧 | 115 policies w/ 23K labels |

<ins>Legal Information Retrieval</ins> (LIR)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| BSARD (Louis et al., 2022) | 📄 🤗 💻 | Belgian legislation | 🇫🇷 | 1.1K questions w/ 22.6K candidate statutory articles |
| EU2UK (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2K query documents w/ 52.5K candidate documents |
| UK2EU (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2.1K query documents w/ 3.9K candidate documents |
| COLIEE-Case-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 650 query cases w/ 128K candidate cases |
| COLIEE-Statute-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ 768 candidate statutory articles |
| CAIL2019-SCM (Xiao et al., 2019) | 📄 💻 | Chinese court judgements | 🇨🇳 | 8.9K triplets of cases |

<ins>Legal Question Answering</ins> (LQA)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| CaseHOLD (Zheng et al., 2021) | 📄 💻 | US case holdings | 🇬🇧 | 53.1K multiple-choice questions |
| JEC-QA (Zhong et al., 2019) | 📄 💾 | Chinese law | 🇨🇳 | 26.3K multiple-choice questions |
| CJRC (Duan et al., 2019) | 📄 💻 | Chinese court judgements | 🇨🇳 | 50K question-answers from 10K documents |
| PrivacyQA (Ravichander et al., 2019) | 📄 💻 | Privacy policies | 🇬🇧 | 1.7K question-answers from 35 documents |

<ins>Legal Textual Entailment</ins> (LTE)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| COLIEE-Case-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 425 cases w/ related case |
| COLIEE-Statute-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ related statutory article |

<ins>Legal Text Summarization</ins> (LTS)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| UK-Abs (Shukla et al., 2022) | 📄 💻 💾 | UK court cases | 🇬🇧 | 793 pairs of (case, abstractive summary) from the UK Supreme Court |
| IN-Abs (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 7.1K pairs of (case, abstractive summary) from the Indian Supreme Court |
| IN-Ext (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 50 pairs of (case, extractive summary) from the Indian Supreme Court |
| TOS;DR (Keymanesh et al., 2020) | 📄 💻 | Terms of service | 🇬🇧 | 1.6K pairs of (agreement text, summary) from data privacy policies |
| BillSum (Kornilova et al., 2019) | 📄 💻 💾 | US Congressional bills | 🇬🇧 | 22.2K pairs of (bill, summary) |
| TL;DRLegal (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 84 pairs of (agreement text, summary) from software licenses |
| TOS;DR (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 421 pairs of (agreement text, summary) from data privacy policies |
| BVA Cases (Zhong et al., 2019) | 📄 💻 | US court cases | 🇬🇧 | 92 pairs of (case, summary) from the US Board of Veterans' Appeals |
| LCR (Galgani et al., 2012) | 📄 💾 | Australian court cases | 🇬🇧 | 3.9K pairs of (case, catchphrases) |

<ins>Legal Language Modeling</ins> (LLM)

| Dataset | Links | Language | Size |
| --- | --- | --- | --- |
| Pile of Law (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | ~256GB of legal and administrative text |

<ins>Benchmarks</ins>

| Dataset | Links | Language | Tasks |
| --- | --- | --- | --- |
| FairLex (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇨🇳 | Classification (x1), legal judgement prediction (x3) |
| LexGLUE (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 | Classification (x6), multiple-choice QA (x1) |

🔥 Models

| Model | Links | Language | Size |
| --- | --- | --- | --- |
| Legal-HeBERT (Chriqui et al., 2022) | 📄 🤗 💻 | 🇮🇱 | 110M |
| PoL-BERT-Large (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | 336M |
| Italian-LEGAL-BERT (Licari and Comande, 2022) | 📄 🤗 | 🇮🇹 | 110M |
| JuriBERT (Douka et al., 2021) | 📄 💾 | 🇫🇷 | {6M, 15M, 42M, 110M} |
| Custom-LEGAL-BERT (Zheng et al., 2021) | 📄 🤗 💻 | 🇬🇧 | 110M |
| LEGAL-BERT (Chalkidis et al., 2020) | 📄 🤗 | 🇬🇧 | {35M, 110M} |
| LEGAL-GPT-{1,2} (Borchmann et al., 2020) | 📄 💻 | 🇬🇧 | {117M, 1.5B} |

📚 Books

📄 Surveys

🎙 Talks

🗓 Conferences & Workshops

<!---
Datasets to add:
- "FALQU: Finding Answers to Legal Questions"
- "MultiLegalSBD: A Multilingual Legal Sentence Boundary Detection Dataset"
- "ClassActionPrediction: A Challenging Benchmark for Legal Judgment Prediction of Class Action Cases in the US"
- "MultiLegalPile: A 689GB Multilingual Legal Corpus"
- "LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development"
- "LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain"
- "A Dataset for Evaluating Legal Question Answering on Private International Law"
- "EQUALS: A Real-world Dataset for Legal Questions Answering via Reading Chinese Laws"

Other cool resources to check:
- https://github.com/neelguha/legal-ml-datasets
- https://nllpw.org/resources/
- https://github.com/Liquid-Legal-Institute/Legal-Text-Analytics#datasets-and-data
-->