Legal Crawler :octopus:

A collection of scripts to crawl English legal corpora :closed_book: from open public domains.

| Corpus | Domain | Corpus alias |
| --- | --- | --- |
| :eu: EU legislation | https://eur-lex.europa.eu/ | eu |
| :uk: UK legislation | https://legislation.gov.uk/ | uk |
| :canada: Canadian legislation | http://laws.justice.gc.ca/eng/ | ca |
| :jp: Japanese legislation | http://www.japaneselawtranslation.go.jp/law/ | jp |
| :finland: Finnish legislation | https://www.finlex.fi/en | fi |
| :us: US case law* | https://case.law/bulk/download/ | us |

* To use the script for US case law, you first need to apply for a researcher account at https://case.law.

:bangbang: Disclaimer :bangbang:

Project Requirements:

Python packages

Linux packages (command line tools)

The following Linux packages are used to process PDF documents:

Quick start:

Install Python requirements:

pip install -r requirements.txt

Install Linux packages (command line tools):

sudo apt-get install libcairo2-dev
sudo apt-get install libpango1.0-dev
sudo apt-get install -y xpdf
sudo apt-get install mupdf mupdf-tools
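
After installing the packages above, it can help to confirm that the PDF command line tools are actually on your PATH before starting a long crawl. The following is a minimal sketch, not part of this repo. The binary names `pdftotext` and `mutool` are assumptions about what the xpdf and mupdf-tools packages provide and what the scripts call, and `missing_tools` is a hypothetical helper name.

```python
import shutil

# Assumed binary names (not confirmed by this README): `pdftotext` for the
# xpdf tooling and `mutool` for mupdf-tools. Adjust to what the scripts call.
ASSUMED_PDF_TOOLS = ["pdftotext", "mutool"]


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(ASSUMED_PDF_TOOLS)
    if missing:
        print("Missing command line tools:", ", ".join(missing))
    else:
        print("All assumed PDF tools were found on PATH.")
```

If a tool is reported as missing, re-run the matching `apt-get install` command or correct the assumed binary name.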

Download Canadian legislation

python download_legal_corpora.py --corpus ca

Download EU legislation

python download_legal_corpora.py --corpus eu

Download all (EU, UK, CA, FI, JP, US)

python download_legal_corpora.py --corpus all
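
To download a chosen subset of corpora rather than one corpus or all of them, the command can be wrapped in a small driver. The following is a rough sketch under stated assumptions: it relies only on the `--corpus` flag and the aliases shown in this README, it assumes it is run from the repository root, and the names `build_command` and `download_corpora` are hypothetical helpers that do not exist in the repo.

```python
import subprocess
import sys

# Corpus aliases as listed in this README, plus "all".
KNOWN_ALIASES = {"eu", "uk", "ca", "jp", "fi", "us", "all"}


def build_command(alias):
    """Build the argument list for one download run; reject unknown aliases."""
    if alias not in KNOWN_ALIASES:
        raise ValueError(f"Unknown corpus alias: {alias!r}")
    # sys.executable reuses the interpreter that runs this sketch.
    return [sys.executable, "download_legal_corpora.py", "--corpus", alias]


def download_corpora(aliases):
    """Run the downloads one after another; stop at the first failure."""
    for alias in aliases:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(build_command(alias), check=True)


# Example (hypothetical usage): download_corpora(["ca", "eu"])
```

Validating the alias before launching the subprocess catches typos early. Note that the `us` corpus still requires the researcher account mentioned above.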

Citation

In case you use this repo or any derivative in your work, please cite using the following:

@Misc{chalkidis-legalcrawler,
author =   {Ilias Chalkidis},
title =    {{Legal Crawler}: A collection of scripts to crawl English legal corpora from open public domains.},
howpublished = {\url{https://github.com/iliaschalkidis/LegalCrawler/}},
year = {2020--2022}
}