eu-legislation-strictness-analysis
Scripts and files required for analysing the strictness of EU legislation. The files in this repo perform strictness analysis of EU legislative documents. Before the scripts in this repository are run, the documents are downloaded and processed by a pipeline of software components, each maintained in its own GitHub repository (see the diagram in the next section of this README for an illustration of the workflow). The results of those earlier steps are stored in two CSV files, which serve as the input for the analysis scripts in this repository (the node labelled Quantitative Analysis of Regulatory Statements in the diagram in the next section represents the files of this repository).
Pipeline diagram
This repository has scripts for analysing EU legislative documents that have been processed in a specific manner. The documents are downloaded and processed by a pipeline of components. Each component has its own repository because it is potentially useful as an independent, reusable tool for other projects or purposes. To conduct the analysis described at the start of this README, we extracted and processed the data as depicted in the diagram below (each node in the diagram is clickable and linked to the corresponding GitHub repository, which has more information about that component):
flowchart TD
A(<a href='https://github.com/nature-of-eu-rules/data-extraction'>1. Download Documents and Metadata</a>) -->| Directory of EU legislative documents in PDF or HTML format | B(<a href='https://github.com/nature-of-eu-rules/data-preprocessing'>2. Extract All Sentences</a>)
B -->| Extracted Sentences in CSV file | C(<a href='https://github.com/nature-of-eu-rules/regulatory-statement-classification'>3. Identify Regulatory Sentences</a>)
C -->| Classified Sentences in CSV file | D(<a href='https://github.com/nature-of-eu-rules/eu-legislation-strictness-analysis'>4. Quantitative Analysis of Regulatory Statements</a>)
A -->| Document metadata in CSV file | D
Usage: without Docker
Requirements
Steps (setup environment)
-
Get a copy of the code:
git clone git@github.com:nature-of-eu-rules/eu-legislation-strictness-analysis.git
-
Change into the eu-legislation-strictness-analysis/ directory:
cd eu-legislation-strictness-analysis/
-
Create a new virtual environment, e.g.:
python -m venv path/to/virtual/environment/folder/
-
Activate the new virtual environment, e.g. on macOS:
source path/to/virtual/environment/folder/bin/activate
-
Install the libraries required by the scripts into this virtual environment:
pip install -r requirements.txt
Steps (running scripts)
-
Usage help for prepare-data-for-analysis.py:
python prepare-data-for-analysis.py -h
-
Example usage for prepare-data-for-analysis.py:
python prepare-data-for-analysis.py -m metadata.csv -c classified_sentences.csv -o metadata_with_classification_results.csv
-
Usage help for analysis.py:
python analysis.py -h
-
Example usage for analysis.py:
python analysis.py --input metadata_with_classification_results.csv --strictm count --output results/ -t date
Input
prepare-data-for-analysis.py
- A CSV file containing metadata (such as adoption dates and the legal policy areas addressed) about EU legislative documents downloaded from EUR-Lex, generated by the script in the data-extraction repository, and
- A CSV file generated by one of the two scripts in the regulatory-statement-classification repository (they take different approaches to classifying regulatory sentences). It contains a list of sentences, the CELEX numbers of the documents in which each sentence occurs, and a column that shows whether the sentence was classified as regulatory (indicated by a 1) or constitutive (indicated by a 0).
See the data-extraction and regulatory-statement-classification repositories for more detailed information about the input data.
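As a rough illustration of how the two inputs relate, the sketch below counts regulatory sentences per document from rows shaped like the classified-sentences file. It is not the code in prepare-data-for-analysis.py. The function name, the in-memory row layout (a list of CELEX numbers plus a 0/1 label), and the placeholder identifier EXAMPLE-DOC are assumptions made for this example.

```python
from collections import Counter


def count_regulatory_per_document(rows):
    """Count sentences labelled regulatory (1) for each CELEX number.

    `rows` is an iterable of (celex_numbers, label) pairs, where
    `celex_numbers` is a list of document identifiers and `label` is
    1 (regulatory) or 0 (constitutive). This layout is hypothetical.
    """
    counts = Counter()
    for celex_numbers, label in rows:
        for celex in celex_numbers:
            # Adding 0 still registers the document, so documents with no
            # regulatory sentences appear in the result with a count of 0.
            counts[celex] += int(label)
    return dict(counts)


# Hypothetical rows: one sentence occurs in two documents.
rows = [
    (["32013R0575"], 1),
    (["32013R0575", "EXAMPLE-DOC"], 1),
    (["EXAMPLE-DOC"], 0),
]
print(count_regulatory_per_document(rows))
```

Per-document counts of this kind are what the prepared file adds to the metadata, so that each document's adoption date and policy area can be analysed alongside its number of regulatory sentences.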
analysis.py
The output CSV file generated by prepare-data-for-analysis.py is the main input file for analysis.py. There are some required and optional switches (arguments) for this script:
- --strictm: the strictness metric to use for the analysis. Currently this can be one of two values: count (the count of regulatory statements) and mean (the average number of regulatory statements per document). These metrics are only a starting point for exploring strictness, and more metrics may be defined in future.
- --nozeros: a boolean flag that indicates whether to analyse only those documents that contain at least one regulatory statement, or all documents (by default, all documents are analysed).
- --time: the temporal unit to use when analysing the change of strictness over time. Currently this can be one of two values: year (analyses overall strictness in a particular year) or date (analyses the strictness of legislation on a particular date, e.g. 2022-01-01).
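To make the interaction of these switches concrete, here is a minimal sketch of one way to read them. It is an interpretation of the descriptions above, not the implementation in analysis.py. The function name and the (adoption date, regulatory count) input layout are assumptions for this example.

```python
from collections import defaultdict


def strictness_over_time(docs, metric="count", time_unit="year", nozeros=False):
    """Aggregate regulatory-statement counts per year or per date.

    `docs` is an iterable of (adoption_date, n_regulatory) pairs, with
    adoption_date as a 'YYYY-MM-DD' string (hypothetical layout).
    """
    if metric not in ("count", "mean"):
        raise ValueError("metric must be 'count' or 'mean'")
    if time_unit not in ("year", "date"):
        raise ValueError("time_unit must be 'year' or 'date'")

    groups = defaultdict(list)
    for adoption_date, n_regulatory in docs:
        if nozeros and n_regulatory == 0:
            continue  # skip documents without any regulatory statement
        key = adoption_date[:4] if time_unit == "year" else adoption_date
        groups[key].append(n_regulatory)

    result = {}
    for key in sorted(groups):
        values = groups[key]
        total = sum(values)
        # 'count' is the total per period, 'mean' the average per document.
        result[key] = total if metric == "count" else total / len(values)
    return result


# Made-up example data.
docs = [("2022-01-01", 4), ("2022-06-15", 0), ("2023-03-10", 6)]
print(strictness_over_time(docs, metric="mean", time_unit="year"))
print(strictness_over_time(docs, metric="mean", time_unit="year", nozeros=True))
```

In this reading, excluding zero-count documents changes the mean metric (the denominator shrinks) but leaves the count metric for a period unchanged.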
Output
prepare-data-for-analysis.py
A CSV file which looks very similar to the input metadata file mentioned in the previous section but with new columns for the number of regulatory sentences identified in each document, number of sentences and words in that document etc.
analysis.py
A directory with generated interactive .html plots and plot image files (.svg) describing to some extent the strictness or density of regulatory statements in the input documents (over time and by legal policy area).
Usage: with Docker
Requirements
- Docker v4.21.1+
- A tool for checking out a Git repository
- A set of EU legislation documents in PDF or HTML format. These documents must be placed in a directory called 'data/' located in the same directory as the Dockerfile in this repository. The filename of each legislation document should be its CELEX number, e.g. 32013R0575.pdf. You can download such documents yourself or with the script in the data-extraction repository
- A CSV file called 'metadata.csv', which is generated by the script in the data-extraction repository. This file must also be placed in the same directory as the Dockerfile in this repo. See the documentation of the data-extraction repository for more information about the required internal format and contents of this file
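Because the analysis runs during the image build, it can help to check the inputs before building. The sketch below is an optional helper and not part of this repository. The function name is hypothetical, and the regular expression is only a loose heuristic modelled on the example 32013R0575 (five digits, one or two capital letters, four digits), not a full CELEX validator.

```python
import re
from pathlib import Path

# Loose heuristic based on the example 32013R0575; an assumption, not the
# official CELEX grammar.
CELEX_LIKE = re.compile(r"^\d{5}[A-Z]{1,2}\d{4}")


def check_inputs(repo_dir):
    """Return a list of human-readable problems found in the Docker inputs."""
    repo = Path(repo_dir)
    problems = []
    if not (repo / "metadata.csv").is_file():
        problems.append("metadata.csv not found next to the Dockerfile")
    data_dir = repo / "data"
    if not data_dir.is_dir():
        problems.append("data/ directory not found next to the Dockerfile")
        return problems
    for path in sorted(data_dir.iterdir()):
        if path.suffix.lower() not in {".pdf", ".html"}:
            problems.append(f"{path.name}: expected a .pdf or .html file")
        elif not CELEX_LIKE.match(path.stem):
            problems.append(f"{path.name}: name does not look like a CELEX number")
    return problems


if __name__ == "__main__":
    # Run from the repository root (the directory containing the Dockerfile).
    for problem in check_inputs("."):
        print(problem)
```

An empty result means the basic layout matches the requirements above. It says nothing about the contents of metadata.csv.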
Steps
-
Get a copy of the code:
git clone git@github.com:nature-of-eu-rules/eu-legislation-strictness-analysis.git
-
Change into the eu-legislation-strictness-analysis/ directory:
cd eu-legislation-strictness-analysis/
-
Run this command to build the Docker image (the analysis is performed during the build so this step may take some time if you have a large number of input files):
docker build -t eurules-analysis .
-
After the image is built, run the container using this command:
docker run -d --name eurules-run eurules-analysis
-
Copy the analysis results to your local machine using this command:
docker cp eurules-run:app/eurules-analysis.zip .
Output
The eurules-analysis.zip archive contains:
- A directory eu-legislation-strictness-analysis/generated-files/ containing automatically generated analysis plots in .svg image format and .html interactive format.
- A CSV file regulatory-statement-classification/classified-sentences.csv which stores the results of classifying each individual sentence as either regulatory or not.
- A CSV file named eu-legislation-strictness-analysis/enriched-metadata.csv which adds a column to the input metadata.csv file storing the counts of regulatory statements in each document. This is the main file used as the basis for analysis.
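After copying the archive, a quick check that the listed items are present can be sketched as follows. The helper name is hypothetical. The sketch assumes the three paths listed above are relative to the root of the archive, which this README does not state explicitly.

```python
import zipfile

# Paths as listed above; assumed to be relative to the archive root.
EXPECTED_ENTRIES = (
    "eu-legislation-strictness-analysis/generated-files/",
    "regulatory-statement-classification/classified-sentences.csv",
    "eu-legislation-strictness-analysis/enriched-metadata.csv",
)


def missing_entries(zip_path, expected=EXPECTED_ENTRIES):
    """Return the expected paths that no entry in the archive starts with."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
    return [
        entry for entry in expected
        if not any(name.startswith(entry) for name in names)
    ]
```

For example, `missing_entries("eurules-analysis.zip")` returns an empty list when all three items are found under those paths.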
License
Copyright (2023) Kody Moodley, The Netherlands eScience Center
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.