
eu-legislation-strictness-analysis

Scripts and files for analysing the strictness of EU legislative documents. Before the scripts in this repository are run, the documents are downloaded and processed by a pipeline of software components, each maintained in its own GitHub repository (the diagram in the next section of this README illustrates the workflow). The earlier steps of the workflow produce two CSV files, which serve as the input for the analysis scripts in this repository. In the diagram, the node labelled Quantitative Analysis of Regulatory Statements represents the files of this repository.

Pipeline diagram

This repository contains scripts for analysing EU legislative documents that have been processed in a specific manner. The documents are downloaded and processed by a pipeline of components. Each component has its own repository because it is potentially useful as an independent, reusable tool for other projects or purposes. To conduct the analysis described at the start of this README, we extracted and processed the data as depicted in the diagram below (each node in the diagram is clickable and links to the corresponding GitHub repository, which has more information about that component):

flowchart TD
    A(<a href='https://github.com/nature-of-eu-rules/data-extraction'>1. Download Documents and Metadata</a>) -->| Directory of EU legislative documents in PDF or HTML format | B(<a href='https://github.com/nature-of-eu-rules/data-preprocessing'>2. Extract All Sentences</a>)
    B -->| Extracted Sentences in CSV file | C(<a href='https://github.com/nature-of-eu-rules/regulatory-statement-classification'>3. Identify Regulatory Sentences</a>)
    C -->| Classified Sentences in CSV file | D(<a href='https://github.com/nature-of-eu-rules/eu-legislation-strictness-analysis'>4. Quantitative Analysis of Regulatory Statements</a>)
    A -->| Document metadata in CSV file | D

Usage: without Docker

Requirements

Python 3.9.17+
A tool for checking out a Git repository

Steps (setup environment)

Get a copy of the code:

 git clone git@github.com:nature-of-eu-rules/eu-legislation-strictness-analysis.git

Change into the eu-legislation-strictness-analysis/ directory:
```
 cd eu-legislation-strictness-analysis/
```

Create a new virtual environment, e.g.:

 python -m venv path/to/virtual/environment/folder/

Activate the new virtual environment, e.g. on macOS:

 source path/to/virtual/environment/folder/bin/activate

Install required libraries for the script in this virtual environment:
```
 pip install -r requirements.txt
```

Steps (running scripts)

Usage help for prepare-data-for-analysis.py

 python prepare-data-for-analysis.py -h

Example usage for prepare-data-for-analysis.py

 python prepare-data-for-analysis.py -m metadata.csv -c classified_sentences.csv -o metadata_with_classification_results.csv

Usage help for analysis.py
```
 python analysis.py -h
```

Example usage for analysis.py

  python analysis.py --input metadata_with_classification_results.csv --strictm count --output results/ -t date

Input

prepare-data-for-analysis.py

A CSV file generated by the data-extraction component (step 1 in the pipeline diagram), containing metadata about EU legislative documents downloaded from EUR-Lex, such as adoption dates and the legal policy areas addressed, and
A CSV file generated by one of the scripts of the regulatory-statement-classification component (step 3 in the pipeline diagram; the scripts take different approaches to classifying regulatory sentences). It contains a list of sentences, the CELEX numbers of the documents in which each sentence originates, and a column that shows whether the sentence was classified as regulatory (indicated by 1) or constitutive (indicated by 0).

See the data-extraction and regulatory-statement-classification repositories for more detailed information about the input data.
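As a rough illustration of how the two input files relate, the sketch below counts regulatory sentences per document and attaches the counts to the metadata rows. The column names (`celex`, `date_adoption`, `celex_numbers`, `regulatory`, `reg_count`), the `;` separator inside the CELEX list, and the sample rows are all assumptions made for illustration. The actual files and the logic in prepare-data-for-analysis.py may differ.

```python
import csv
import io
from collections import Counter

# Made-up miniature inputs; real column names and contents may differ.
METADATA_CSV = """celex,date_adoption
32013R0575,2013-06-26
32014L0065,2014-05-15
"""

SENTENCES_CSV = """sentence,celex_numbers,regulatory
Institutions shall report quarterly.,32013R0575;32014L0065,1
This Regulation lays down uniform rules.,32013R0575,0
Member States shall ensure compliance.,32014L0065,1
"""


def count_regulatory(sentence_rows):
    """Count sentences labelled 1 (regulatory) for each CELEX number."""
    counts = Counter()
    for row in sentence_rows:
        if row["regulatory"].strip() == "1":
            # Assumption: several CELEX numbers are separated by ';'.
            for celex in row["celex_numbers"].split(";"):
                counts[celex.strip()] += 1
    return counts


def enrich(metadata_rows, counts):
    """Return copies of the metadata rows with an added 'reg_count' column."""
    return [dict(row, reg_count=counts.get(row["celex"], 0)) for row in metadata_rows]


metadata = list(csv.DictReader(io.StringIO(METADATA_CSV)))
sentences = list(csv.DictReader(io.StringIO(SENTENCES_CSV)))
enriched = enrich(metadata, count_regulatory(sentences))
for row in enriched:
    print(row["celex"], row["reg_count"])
```

A sentence that occurs in several documents is counted once for each of them here, which is one possible reading of the "list of CELEX numbers" column.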

analysis.py

The output CSV file generated by prepare-data-for-analysis.py is the main input file for analysis.py. The script takes both required and optional switches (arguments), including:

--strictm : the strictness metric to use for the analysis. Currently one of two values: count (the count of regulatory statements) or mean (the average number of regulatory statements per document). These metrics are only a starting point for exploring strictness; more may be defined in future.
--nozeros : a boolean flag. When set, only documents that contain at least one regulatory statement are analysed; by default, all documents are analysed.
--time : the temporal unit used when analysing how strictness changes over time. Currently one of two values: year (analyses overall strictness in a particular year) or date (analyses the strictness of legislation on a particular date, e.g. 2022-01-01).
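The switches described above could be declared with `argparse` roughly as follows. This is a sketch rather than the parser used in analysis.py: which switches are required, the default for `--time`, and the help texts are assumptions. The flag names themselves are taken from this README.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented switches (illustrative only)."""
    parser = argparse.ArgumentParser(description="Strictness analysis (sketch)")
    parser.add_argument("--input", required=True,
                        help="CSV produced by prepare-data-for-analysis.py")
    parser.add_argument("--output", required=True,
                        help="directory for the generated plots")
    parser.add_argument("--strictm", choices=["count", "mean"], required=True,
                        help="strictness metric")
    # store_true makes the flag False unless it is given on the command line.
    parser.add_argument("--nozeros", action="store_true",
                        help="ignore documents without regulatory statements")
    # The default of 'year' is an assumption, not taken from the script.
    parser.add_argument("-t", "--time", choices=["year", "date"], default="year",
                        help="temporal unit for the analysis")
    return parser


# Parse the example command line from the usage section.
args = build_parser().parse_args([
    "--input", "metadata_with_classification_results.csv",
    "--strictm", "count",
    "--output", "results/",
    "-t", "date",
])
print(args.strictm, args.time, args.nozeros)
```

Restricting `--strictm` and `--time` with `choices` means an unsupported value is rejected before any analysis starts.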

Output

prepare-data-for-analysis.py

A CSV file that closely resembles the input metadata file described in the previous section, with additional columns such as the number of regulatory sentences identified in each document and the number of sentences and words in that document.

analysis.py

A directory of generated interactive plots (.html) and plot image files (.svg) that give an indication of the strictness, or density, of regulatory statements in the input documents, over time and by legal policy area.
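To make the plotted quantities more concrete, the sketch below shows one way the count and mean metrics could be aggregated per year or per date, with and without documents that have zero regulatory statements. The sample data, the function name `strictness`, and the ISO date format are assumptions for illustration. The aggregation in analysis.py may be implemented differently.

```python
from collections import defaultdict

# Made-up rows standing in for the enriched metadata:
# (adoption date, number of regulatory statements in the document).
DOCS = [
    ("2021-03-01", 4),
    ("2021-03-01", 0),
    ("2021-11-15", 2),
    ("2022-01-01", 6),
]


def strictness(docs, metric="count", time_unit="year", nozeros=False):
    """Aggregate regulatory-statement counts per year or per date."""
    groups = defaultdict(list)
    for date, reg_count in docs:
        if nozeros and reg_count == 0:
            continue  # skip documents without any regulatory statement
        # Assumption: dates are ISO formatted, so the year is the first 4 characters.
        key = date[:4] if time_unit == "year" else date
        groups[key].append(reg_count)
    if metric == "count":
        return {key: sum(values) for key, values in sorted(groups.items())}
    if metric == "mean":
        return {key: sum(values) / len(values) for key, values in sorted(groups.items())}
    raise ValueError(f"unknown metric: {metric}")


print(strictness(DOCS, metric="count", time_unit="year"))
print(strictness(DOCS, metric="mean", time_unit="year"))
print(strictness(DOCS, metric="mean", time_unit="year", nozeros=True))
```

In this sample, excluding zero-count documents raises the 2021 mean from 2.0 to 3.0, which shows why the choice of `--nozeros` matters when comparing periods.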

Usage: with Docker

Requirements

Docker v4.21.1+
A tool for checking out a Git repository
A set of EU legislation documents in PDF or HTML format. These documents must be placed in a directory called 'data/' located in the same directory as the Dockerfile in this repository. The filename of each document should be its CELEX number, e.g. 32013R0575.pdf. You can download such documents yourself or with the data-extraction component (step 1 in the pipeline diagram).
A CSV file called 'metadata.csv', generated by the data-extraction component. This file must also be placed in the same directory as the Dockerfile in this repository. See the documentation of the data-extraction repository for the required format and contents of this file.
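Because the analysis runs during the image build, it can help to check the input layout beforehand. The following sketch is not part of this repository. It verifies that metadata.csv and data/ exist and that each file in data/ is a PDF or HTML file named after a CELEX number. The helper `check_inputs` is hypothetical, and the regular expression is a deliberately simplified assumption that matches identifiers such as 32013R0575 but not every CELEX variant.

```python
import re
import tempfile
from pathlib import Path

# Simplified CELEX pattern (assumption): covers IDs such as 32013R0575,
# not every variant that exists on EUR-Lex.
CELEX_NAME = re.compile(r"^\d{5}[A-Z]{1,2}\d{4}$")
ALLOWED_SUFFIXES = {".pdf", ".html"}


def check_inputs(repo_dir):
    """Return a list of problems found in the expected Docker build inputs."""
    repo = Path(repo_dir)
    problems = []
    if not (repo / "metadata.csv").is_file():
        problems.append("metadata.csv is missing")
    data_dir = repo / "data"
    if not data_dir.is_dir():
        problems.append("data/ directory is missing")
        return problems
    for path in sorted(data_dir.iterdir()):
        if path.suffix.lower() not in ALLOWED_SUFFIXES:
            problems.append(f"{path.name}: not a PDF or HTML file")
        elif not CELEX_NAME.match(path.stem):
            problems.append(f"{path.name}: file name is not a CELEX number")
    return problems


# Demonstration on a throwaway directory instead of a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / "32013R0575.pdf").touch()
    (Path(tmp) / "data" / "notes.txt").touch()
    (Path(tmp) / "metadata.csv").touch()
    report = check_inputs(tmp)
    print(report)
```

To check a real checkout, call `check_inputs` with the path of the directory that contains the Dockerfile.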

Steps

Get a copy of the code:

 git clone git@github.com:nature-of-eu-rules/eu-legislation-strictness-analysis.git

Change into the eu-legislation-strictness-analysis/ directory:
```
 cd eu-legislation-strictness-analysis/
```
Run this command to build the Docker image (the analysis is performed during the build so this step may take some time if you have a large number of input files):
```
 docker build -t eurules-analysis .
```
After the image is built, run the container using this command:
```
 docker run -d --name eurules-run eurules-analysis
```
Copy the analysis results to your local machine using this command:
```
 docker cp eurules-run:app/eurules-analysis.zip .
```

Output

The eurules-analysis.zip archive contains:

A directory eu-legislation-strictness-analysis/generated-files/ containing automatically generated analysis plots in .svg image format and .html interactive format.
A CSV file regulatory-statement-classification/classified-sentences.csv which stores the results of classifying each individual sentence as either regulatory or not.
A CSV file named eu-legislation-strictness-analysis/enriched-metadata.csv which adds a column to the input metadata.csv file storing the counts of regulatory statements in each document. This is the main file used as the basis for analysis.
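After copying the archive, its contents can be inspected without unpacking it. The sketch below groups the archive members by file extension using Python's standard `zipfile` module. It builds a small in-memory stand-in for eurules-analysis.zip so that it runs on its own. The plot file names (`plot.svg`, `plot.html`) and the file contents are placeholders, not the real output names.

```python
import io
import zipfile
from pathlib import PurePosixPath


def summarise_archive(zip_file):
    """Group archive member names by lower-cased file extension."""
    summary = {}
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            ext = PurePosixPath(name).suffix.lstrip(".").lower()
            summary.setdefault(ext, []).append(name)
    return summary


# Build a tiny in-memory stand-in for eurules-analysis.zip (placeholder contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("eu-legislation-strictness-analysis/generated-files/plot.svg", "<svg/>")
    archive.writestr("eu-legislation-strictness-analysis/generated-files/plot.html", "<html/>")
    archive.writestr("eu-legislation-strictness-analysis/enriched-metadata.csv", "celex\n")
    archive.writestr("regulatory-statement-classification/classified-sentences.csv", "sentence\n")
buffer.seek(0)

summary = summarise_archive(buffer)
for ext, names in sorted(summary.items()):
    print(ext, len(names))
```

To inspect the real archive, pass its path instead, for example `summarise_archive("eurules-analysis.zip")`.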

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.