Awesome Legal Natural Language Processing
Datasets
<ins>Legal Judgement Prediction</ins> (LJP)
Dataset | Links | Domain | Language | Size |
---|---|---|---|---|
FSCS (Niklaus et al., 2021) | 📄 🤗 💻 | Swiss court judgments | 🇩🇪 🇫🇷 🇮🇹 | 85K cases w/ 2 outcomes |
ECtHR (Chalkidis et al., 2021) | 📄 🤗 | European court judgments | 🇬🇧 | 11K cases w/ 11 outcomes |
ECHR (Aletras et al., 2019) | 📄 💾 | European court judgments | 🇬🇧 | 11.5K cases w/ 11 outcomes |
CAIL (Xiao et al., 2018) | 📄 💻 | Chinese court judgements | 🇨🇳 | 2.6M cases w/ 6 outcomes |
<ins>Legal Text Classification</ins> (LTC)
Dataset | Links | Domain | Language | Size |
---|---|---|---|---|
GLC (Papaloukas et al., 2021) | 📄 🤗 💻 | Greek legislation | 🇬🇷 | 47.5K laws w/ 2.7K labels |
CUAD (Hendrycks et al., 2021) | 📄 🤗 💻 | Contracts | 🇬🇧 | 510 contracts w/ 41 classes |
MultiEURLEX (Chalkidis et al., 2021) | 📄 🤗 💻 | EU legislation | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇪🇸 (+18 more) | 65K laws w/ 4.5K labels |
LEDGAR (Tuggener et al., 2020) | 📄 💾 | Contracts | 🇬🇧 | 60.5K contracts w/ 12.6K labels |
Contract Discovery (Borchmann et al., 2020) | 📄 💻 | Contracts | 🇬🇧 | 2.6K clauses w/ 21 classes |
EURLEX-57K (Chalkidis et al., 2019) | 📄 💾 | EU legislation | 🇬🇧 | 57K laws w/ 4.3K labels |
Unfair-ToS (Lippi et al., 2018) | 📄 💾 | Contracts | 🇬🇧 | 9.4K sentences w/ 9 classes |
Contract Elements (Chalkidis et al., 2017) | 📄 💾 | Contracts | 🇬🇧 | 2.4K contracts w/ 10 classes |
OPP-115 (Wilson et al., 2016) | 📄 💾 | Privacy policies | 🇬🇧 | 115 policies w/ 23K labels |
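
Several of the classification datasets above (e.g., EURLEX-57K and MultiEURLEX) attach a set of labels to each document rather than a single class. A minimal sketch of micro-averaged F1 over such label sets, assuming gold and predicted labels are given as Python sets; the function name and the label IDs in the example are hypothetical, and each dataset's official evaluation may differ:

```python
from typing import Hashable, Sequence, Set


def micro_f1(gold: Sequence[Set[Hashable]], pred: Sequence[Set[Hashable]]) -> float:
    """Micro-averaged F1 for multi-label classification.

    True positives, false positives and false negatives are pooled over
    all documents before precision and recall are computed.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but not in the gold set
        fn += len(g - p)  # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical label sets for three documents.
gold = [{"agriculture", "trade"}, {"environment"}, {"trade", "customs"}]
pred = [{"agriculture"}, {"environment", "energy"}, {"trade", "customs"}]
print(round(micro_f1(gold, pred), 3))  # 0.8
```

Pooling counts before averaging means frequent labels dominate the score, which is why papers on these datasets often report a macro-averaged variant alongside it.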
<ins>Legal Information Retrieval</ins> (LIR)
Dataset | Links | Domain | Language | Size |
---|---|---|---|---|
BSARD (Louis et al., 2022) | 📄 🤗 💻 | Belgian legislation | 🇫🇷 | 1.1K questions w/ 22.6K candidate statutory articles |
EU2UK (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2K query documents w/ 52.5K candidate documents |
UK2EU (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2.1K query documents w/ 3.9K candidate documents |
COLIEE-Case-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 650 query cases w/ 128K candidate cases |
COLIEE-Statute-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ 768 candidate statutory articles |
CAIL2019-SCM (Xiao et al., 2019) | 📄 💻 | Chinese court judgements | 🇨🇳 | 8.9K triplets of cases |
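
The retrieval datasets above pair each query (a question or a query document) with a large pool of candidates, and a system is judged by how highly it ranks the relevant ones. A minimal sketch of two common ranking metrics, recall@k and reciprocal rank, assuming the system returns an ordered list of unique candidate IDs and the relevant IDs are known; the article IDs are hypothetical, and each dataset's official evaluation may use other metrics or cut-offs:

```python
from typing import Hashable, Sequence, Set


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of relevant candidates found in the top-k results.

    Assumes `ranked` contains no duplicate IDs.
    """
    if not relevant:
        raise ValueError("a query needs at least one relevant candidate")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant candidate, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking of statutory articles for one question.
ranked = ["art_12", "art_7", "art_301", "art_5"]
relevant = {"art_7", "art_5"}
print(recall_at_k(ranked, relevant, k=2))  # 0.5
print(reciprocal_rank(ranked, relevant))   # 0.5
```

Averaging `reciprocal_rank` over all queries gives the mean reciprocal rank (MRR).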
<ins>Legal Question Answering</ins> (LQA)
Dataset | Links | Domain | Language | Size |
---|---|---|---|---|
CaseHOLD (Zheng et al., 2021) | 📄 💻 | US case holdings | 🇬🇧 | 53.1K multiple-choice questions |
JEC-QA (Zhong et al., 2019) | 📄 💾 | Chinese law | 🇨🇳 | 26.3K multiple-choice questions |
CJRC (Duan et al., 2019) | 📄 💻 | Chinese court judgements | 🇨🇳 | 50K question-answers from 10K documents |
PrivacyQA (Ravichander et al., 2019) | 📄 💻 | Privacy policies | 🇬🇧 | 1.7K question-answers from 35 documents |
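
For span-extraction question answering, as in CJRC, where answers are spans of the judgment text, predictions are often scored by token overlap with the gold answer. A minimal sketch in the style of SQuAD's F1, assuming whitespace tokens for English and single characters for Chinese; the function name and the example strings are hypothetical, and the official scorers may normalize text differently:

```python
from collections import Counter
from typing import Callable, List


def overlap_f1(prediction: str, gold: str,
               tokenize: Callable[[str], List[str]] = str.split) -> float:
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = tokenize(prediction)
    gold_tokens = tokenize(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty (e.g. an unanswerable question) counts as a match.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# English: whitespace tokens.
print(overlap_f1("the lease was void", "the lease is void"))  # 0.75
# Chinese: character tokens.
print(round(overlap_f1("判处有期徒刑三年", "有期徒刑三年", tokenize=list), 3))  # 0.857
```

Passing `tokenize=list` switches to character-level matching, a common choice for Chinese text, which has no whitespace between words.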
<ins>Legal Textual Entailment</ins> (LTE)
Dataset | Links | Domain | Language | Size |
---|---|---|---|---|
COLIEE-Case-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 425 cases w/ related case |
COLIEE-Statute-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ related statutory article |
<ins>Legal Text Summarization</ins> (LTS)
Dataset | Links | Domain | Language | Size |
---|---|---|---|---|
UK-Abs (Shukla et al., 2022) | 📄 💻 💾 | UK court cases | 🇬🇧 | 793 pairs of (case, abstractive summary) from the UK Supreme Court |
IN-Abs (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 7.1K pairs of (case, abstractive summary) from the Indian Supreme Court |
IN-Ext (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 50 pairs of (case, extractive summary) from the Indian Supreme Court |
TOS;DR (Keymanesh et al., 2020) | 📄 💻 | Terms of service | 🇬🇧 | 1.6K pairs of (agreement text, summary) from data privacy policies |
BillSum (Kornilova et al., 2019) | 📄 💻 💾 | US Congressional bills | 🇬🇧 | 22.2K pairs of (bill, summary) |
TL;DRLegal (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 84 pairs of (agreement text, summary) from software licenses |
TOS;DR (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 421 pairs of (agreement text, summary) from data privacy policies |
BVA Cases (Zhong et al., 2019) | 📄 💻 | US court cases | 🇬🇧 | 92 pairs of (case, summary) from the US Board of Veterans' Appeals |
LCR (Galgani et al., 2012) | 📄 💾 | Australian court cases | 🇬🇧 | 3.9K pairs of (case, catchphrases) |
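
Generated summaries are commonly compared against the reference summary with ROUGE. A simplified sketch of ROUGE-L F1, which scores the longest common subsequence (LCS) of tokens, assuming lowercased whitespace tokens and no stemming; the function names and example sentences are hypothetical, and reference ROUGE implementations apply their own tokenization and optional stemming, so scores from this sketch are not directly comparable to published numbers:

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip one token
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 on lowercased whitespace tokens (no stemming)."""
    cand: List[str] = candidate.lower().split()
    ref: List[str] = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "The court dismissed the appeal and upheld the conviction"
candidate = "The appeal was dismissed and the conviction upheld"
print(round(rouge_l_f1(candidate, reference), 3))  # 0.588
```

Here the LCS has 5 tokens, so precision is 5/8, recall is 5/9, and their harmonic mean is 10/17.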
<ins>Legal Language Modeling</ins> (LLM)
Dataset | Links | Language | Size |
---|---|---|---|
Pile of Law (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | ~256GB of legal and administrative text |
<ins>Benchmarks</ins>
Benchmark | Links | Language | Tasks |
---|---|---|---|
FairLex (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇨🇳 | Classification (x1), legal judgement prediction (x3) |
LexGLUE (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 | Classification (x6), multiple-choice QA (x1) |
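
LexGLUE reports both micro-F1 and macro-F1, and the two diverge when classes are imbalanced, as they often are in legal corpora. A minimal sketch for single-label multi-class predictions with hypothetical labels: micro-F1 pools all decisions (and for single-label data equals accuracy), while macro-F1 averages per-class F1 so that rare classes weigh as much as frequent ones. The function name is hypothetical and only classes seen in `gold` or `pred` are averaged:

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def micro_macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) for single-label multi-class predictions."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # class p was predicted wrongly
            fn[g] += 1  # gold class g was missed
    labels = set(gold) | set(pred)

    def f1(t: int, f_pos: int, f_neg: int) -> float:
        # Equivalent to 2PR / (P + R), written with raw counts.
        return 2 * t / (2 * t + f_pos + f_neg) if t else 0.0

    # Micro: pool counts over all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean of per-class F1.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


gold = ["dismissed"] * 8 + ["granted"] * 2
pred = ["dismissed"] * 10  # always predicts the majority class
micro, macro = micro_macro_f1(gold, pred)
print(round(micro, 3), round(macro, 3))  # 0.8 0.444
```

The majority-class predictor looks reasonable under micro-F1 but is exposed by macro-F1, because it scores zero on the rare "granted" class.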
Models
Model | Links | Language | Parameters |
---|---|---|---|
Legal-HeBERT (Chriqui et al., 2022) | 📄 🤗 💻 | 🇮🇱 | 110M |
PoL-BERT-Large (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | 336M |
Italian-LEGAL-BERT (Licari and Comande, 2022) | 📄 🤗 | 🇮🇹 | 110M |
JuriBERT (Douka et al., 2021) | 📄 💾 | 🇫🇷 | {6M, 15M, 42M, 110M} |
Custom-LEGAL-BERT (Zheng et al., 2021) | 📄 🤗 💻 | 🇬🇧 | 110M |
LEGAL-BERT (Chalkidis et al., 2020) | 📄 🤗 | 🇬🇧 | {35M, 110M} |
LEGAL-GPT-{1,2} (Borchmann et al., 2020) | 📄 💻 | 🇬🇧 | {117M, 1.5B} |
Books
- [2017] Artificial Intelligence and Legal Analytics: New Tools for Law Practice in the Digital Age, K. Ashley. [link]
Surveys
- [2020-05] How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence, H. Zhong et al. [pdf]
- [2019-09] A Brief History of the Changing Roles of Case Prediction in AI and Law, K. Ashley. [pdf]
- [2018-12] Deep learning in law: early adaptation and legal word embeddings trained on large corpora, I. Chalkidis et al. [pdf]
Talks
- [2019-06] Law as Data: The Promise and Challenges of Natural Language Processing for Legal Research, A. Dyevre. [slides]
- [2019-04] Artificial Intelligence and Law – An Overview and History, H. Surden. [video]
Conferences & Workshops
- The Natural Legal Language Processing (NLLP) Workshop [website]
- The International Conference on Artificial Intelligence and Law (ICAIL) [website]
- The International Conference on Legal Knowledge and Information Systems (JURIX) [website]
- The EXplainable AI in Law (XAILA) Workshop [website]
- The International Workshop on Juris-informatics (JURISIN) [website]
- The Competition on Legal Information Extraction/Entailment (COLIEE) [website]
- The International Workshop on Legal Information Retrieval [website]