eu-legislation-strictness-analysis

Scripts and files for analysing the strictness of EU legislative documents. Before the scripts in this repository are run, the documents are downloaded and processed by a pipeline of software components, each maintained in its own GitHub repository (see the diagram in the next section of this README for an illustration of the workflow). The results of those earlier steps are stored in two CSV files, which serve as the input for the analysis scripts in this repository. In the diagram, the node labelled Quantitative Analysis of Regulatory Statements represents this repository.

Pipeline diagram

This repository contains scripts for analysing EU legislative documents that have been processed in a specific manner. The documents are downloaded and processed by a pipeline of components. Each component has its own repository because it is potentially useful as an independent, reusable tool for other projects or purposes. For the analysis described at the start of this README, we extracted and processed the data as depicted in the diagram below (each node in the diagram is clickable and links to the corresponding GitHub repository, which has more information about that component):

flowchart TD
    A(<a href='https://github.com/nature-of-eu-rules/data-extraction'>1. Download Documents and Metadata</a>) -->| Directory of EU legislative documents in PDF or HTML format | B(<a href='https://github.com/nature-of-eu-rules/data-preprocessing'>2. Extract All Sentences</a>)
    B -->| Extracted Sentences in CSV file | C(<a href='https://github.com/nature-of-eu-rules/regulatory-statement-classification'>3. Identify Regulatory Sentences</a>)
    C -->| Classified Sentences in CSV file | D(<a href='https://github.com/nature-of-eu-rules/eu-legislation-strictness-analysis'>4. Quantitative Analysis of Regulatory Statements</a>)
    A -->| Document metadata in CSV file | D
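
The data flow in the diagram can also be read as a dependency graph. The following is a minimal sketch, not part of any of the repositories: the step names are labels chosen here for illustration, and Python's standard `graphlib` module is used to derive an order in which the four steps can run.

```python
from graphlib import TopologicalSorter

# Each key is a pipeline step; its value is the set of steps whose output it needs.
# The step names are illustrative labels for the four nodes in the diagram.
PIPELINE = {
    "download_documents_and_metadata": set(),
    "extract_all_sentences": {"download_documents_and_metadata"},
    "identify_regulatory_sentences": {"extract_all_sentences"},
    "quantitative_analysis": {
        "identify_regulatory_sentences",    # classified sentences CSV
        "download_documents_and_metadata",  # document metadata CSV
    },
}


def run_order(pipeline):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    for position, step in enumerate(run_order(PIPELINE), start=1):
        print(position, step)
```

The last step depends on two earlier steps, which mirrors the two CSV files that this repository takes as input.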

Usage: without Docker

Requirements
Steps (setup environment)
  1. Get a copy of the code:

     git clone git@github.com:nature-of-eu-rules/eu-legislation-strictness-analysis.git
    
  2. Change into the eu-legislation-strictness-analysis/ directory:

     cd eu-legislation-strictness-analysis/
    
  3. Create a new virtual environment, e.g.:

     python -m venv path/to/virtual/environment/folder/
    
  4. Activate the new virtual environment, e.g. on macOS:

     source path/to/virtual/environment/folder/bin/activate
     
    
  5. Install the libraries required by the scripts in this virtual environment:

     pip install -r requirements.txt
    
Steps (running scripts)
  1. Usage help for prepare-data-for-analysis.py

     python prepare-data-for-analysis.py -h
    
  2. Example usage for prepare-data-for-analysis.py

     python prepare-data-for-analysis.py -m metadata.csv -c classified_sentences.csv -o metadata_with_classification_results.csv
    
  3. Usage help for analysis.py

     python analysis.py -h
    
  4. Example usage for analysis.py

     python analysis.py --input metadata_with_classification_results.csv --strictm count --output results/ -t date
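
The two example commands above can be chained from a small Python wrapper. This is only a sketch: it uses the flags and values shown in the examples and nothing else, and the function and parameter names (`prepare_command`, `analysis_command`, `t_option`, and so on) are hypothetical. Consult the `-h` output of each script for the meaning and permitted values of each argument.

```python
import shlex
import subprocess
import sys


def prepare_command(metadata_csv, classified_csv, merged_csv):
    """Build the argument list for prepare-data-for-analysis.py."""
    return [sys.executable, "prepare-data-for-analysis.py",
            "-m", metadata_csv, "-c", classified_csv, "-o", merged_csv]


def analysis_command(merged_csv, output_dir, strictm="count", t_option="date"):
    """Build the argument list for analysis.py.

    The defaults are the values from the README example, not script defaults.
    """
    return [sys.executable, "analysis.py",
            "--input", merged_csv, "--strictm", strictm,
            "--output", output_dir, "-t", t_option]


def run_pipeline(metadata_csv, classified_csv, merged_csv, output_dir):
    """Run both scripts in order; stop if the first one fails."""
    for command in (prepare_command(metadata_csv, classified_csv, merged_csv),
                    analysis_command(merged_csv, output_dir)):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this file is safe to
    # execute outside the repository directory.
    merged = "metadata_with_classification_results.csv"
    print(shlex.join(prepare_command("metadata.csv", "classified_sentences.csv", merged)))
    print(shlex.join(analysis_command(merged, "results/")))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when file paths contain spaces.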
    
Input

prepare-data-for-analysis.py

See the data-extraction and regulatory-statement-classification repositories for more detailed information about the input data.

analysis.py

The output CSV file generated by prepare-data-for-analysis.py is the main input file for analysis.py. The script takes some required and some optional switches (arguments); run python analysis.py -h to list them.

Output

prepare-data-for-analysis.py

A CSV file that closely resembles the input metadata file mentioned in the previous section, with additional columns such as the number of regulatory sentences identified in each document and the number of sentences and words in that document.
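
To illustrate the kind of enrichment described above, here is a minimal sketch that counts sentences, words and regulatory sentences per document and attaches the counts to the metadata rows. It is not the implementation in prepare-data-for-analysis.py: the column names (`doc_id`, `sentence`, `is_regulatory`) and the output column names are assumptions made for this example, and the rows are made-up sample data.

```python
from collections import defaultdict

EMPTY_STATS = {"sentences": 0, "words": 0, "regulatory_sentences": 0}


def add_classification_counts(metadata_rows, sentence_rows, id_column="doc_id"):
    """Return metadata rows extended with per-document sentence statistics.

    Column names are hypothetical; adapt them to the real CSV headers.
    """
    totals = defaultdict(lambda: dict(EMPTY_STATS))
    for row in sentence_rows:
        stats = totals[row[id_column]]
        stats["sentences"] += 1
        stats["words"] += len(row["sentence"].split())
        stats["regulatory_sentences"] += int(row["is_regulatory"])

    # Documents without any classified sentences keep zero counts.
    return [{**row, **totals.get(row[id_column], EMPTY_STATS)}
            for row in metadata_rows]


if __name__ == "__main__":
    metadata = [{"doc_id": "A", "year": "2001"}, {"doc_id": "B", "year": "2002"}]
    sentences = [
        {"doc_id": "A", "sentence": "Member States shall notify the Commission.",
         "is_regulatory": "1"},
        {"doc_id": "A", "sentence": "This Regulation concerns fisheries.",
         "is_regulatory": "0"},
    ]
    for enriched_row in add_classification_counts(metadata, sentences):
        print(enriched_row)
```

With real files, the two row lists could be read with `csv.DictReader` and the result written with `csv.DictWriter`.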

analysis.py

A directory of generated interactive plots (.html) and plot image files (.svg) that describe, to some extent, the strictness or density of regulatory statements in the input documents, both over time and by legal policy area.
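
The aggregation behind such plots can be sketched as a group-by over the enriched rows. The example below is one possible reading of a count-based and a density-based measure, not the computation performed by analysis.py. The column names (`year`, `policy_area`, `sentences`, `regulatory_sentences`) and the sample rows are assumptions for illustration only.

```python
from collections import defaultdict


def regulatory_totals(rows, group_by):
    """Sum the regulatory sentence counts per group (e.g. per year)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += row["regulatory_sentences"]
    return dict(sorted(totals.items()))


def regulatory_density(rows, group_by):
    """Share of sentences per group that are regulatory (assumed definition)."""
    regulatory = defaultdict(int)
    overall = defaultdict(int)
    for row in rows:
        regulatory[row[group_by]] += row["regulatory_sentences"]
        overall[row[group_by]] += row["sentences"]
    # Skip groups with no sentences to avoid dividing by zero.
    return {key: regulatory[key] / overall[key]
            for key in sorted(overall) if overall[key]}


if __name__ == "__main__":
    # Made-up rows, one per document.
    rows = [
        {"year": 2001, "policy_area": "Fisheries", "regulatory_sentences": 4, "sentences": 10},
        {"year": 2001, "policy_area": "Transport", "regulatory_sentences": 6, "sentences": 30},
        {"year": 2002, "policy_area": "Fisheries", "regulatory_sentences": 5, "sentences": 10},
    ]
    print(regulatory_totals(rows, "year"))
    print(regulatory_density(rows, "year"))
    print(regulatory_totals(rows, "policy_area"))
```

Grouping by a column name passed as an argument lets the same two functions serve both views mentioned above: over time and by policy area.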

Usage: with Docker

Requirements
Steps
  1. Get a copy of the code:

     git clone git@github.com:nature-of-eu-rules/eu-legislation-strictness-analysis.git
    
  2. Change into the eu-legislation-strictness-analysis/ directory:

     cd eu-legislation-strictness-analysis/
    
  3. Run this command to build the Docker image (the analysis is performed during the build so this step may take some time if you have a large number of input files):

     docker build -t eurules-analysis .
    
  4. After the image is built, run the container using this command:

     docker run -d --name eurules-run eurules-analysis
    
  5. Copy the analysis results to your local machine using this command:

     docker cp eurules-run:app/eurules-analysis.zip .
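
Once the archive has been copied, its contents can be inspected or unpacked with Python's standard `zipfile` module. This is a minimal sketch: the helper names `list_archive` and `extract_archive` and the target directory `eurules-analysis/` are hypothetical, and only the archive file name comes from the step above.

```python
import zipfile
from pathlib import Path


def list_archive(archive_path):
    """Return the sorted names of all entries in the zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(archive.namelist())


def extract_archive(archive_path, target_dir):
    """Unpack the archive into target_dir and return that directory."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    return Path(target_dir)


if __name__ == "__main__":
    archive_path = Path("eurules-analysis.zip")
    if archive_path.exists():
        for name in list_archive(archive_path):
            print(name)
        extract_archive(archive_path, "eurules-analysis")
    else:
        print(f"{archive_path} not found; run the docker cp step first.")
```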
    
Output

The eurules-analysis.zip archive contains:

License

Copyright (2023) Kody Moodley, The Netherlands eScience Center

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.