
regulations-core

An API library that provides an interface for storing and retrieving regulations, layers, etc.

This repository is part of a larger project. To read about it, please see http://eregs.github.io/.

Features

Search integration with Elastic Search, Django Haystack, or Postgres
Support for storage via Elastic Search or Django Models
Separation of API into a read and a write portion
Decomposition of regulations and layers into their components, allowing paragraph-level access
Schema checking for regulations
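Paragraph-level access can be pictured as flattening a regulation tree into one entry per node. The sketch below assumes each node is a dict with `label` (a list such as `['1005', '2', 'a']`), `text`, and `children` keys, and keys each node by its hyphen-joined label; the helper names are hypothetical and this is not regcore's storage code.

```python
from typing import Dict, Iterator

Node = dict  # assumed shape: {"label": [...], "text": str, "children": [...]}


def iter_nodes(node: Node) -> Iterator[Node]:
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def index_by_label(tree: Node) -> Dict[str, Node]:
    """Map a hyphen-joined label (e.g. '1005-2-a') to its node."""
    return {"-".join(node["label"]): node for node in iter_nodes(tree)}


tree = {
    "label": ["1005"],
    "text": "",
    "children": [
        {"label": ["1005", "2"], "text": "Definitions.", "children": [
            {"label": ["1005", "2", "a"], "text": "(a) Act means ...", "children": []},
        ]},
    ],
}

paragraphs = index_by_label(tree)
print(paragraphs["1005-2-a"]["text"])  # (a) Act means ...
```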

Requirements

This library requires

Python 2.7, 3.4, 3.5, or 3.6
Django 1.10 or 1.11

API Docs

regulations-core on Read the Docs

Local development

Tox

We use tox to test across multiple versions of Python and Django. To run our tests, linters, and build our docs, you'll need to install tox globally (Tox handles virtualenvs for us).

pip install tox
# If using pyenv, consider also installing tox-pyenv

Then, run tests and linting across available Python versions:

tox

To build docs, run:

tox -e docs

The output will be in docs/_build/dirhtml.

Running as an application

While this library is generally intended to be used within a larger project, it can also be run as its own application via Docker or a local Python install. In both cases, it runs in DEBUG mode using SQLite for data storage. We don't have a turnkey solution for integrating this with search (though it can be accomplished via a custom settings file).

To run via Docker:

docker build . -t eregs/core  # only needed after code changes
docker run -p 8080:8080 eregs/core

To run via local Python, run the following inside a virtualenv:

pip install .
python manage.py migrate
python manage.py runserver 0.0.0.0:8080

In both cases, you can find the site locally at http://0.0.0.0:8080/.

Apps included

This repository contains four Django apps: regcore, regcore_read, regcore_write, and regcore_pgsql. The first contains shared models and libraries. The "read" app provides read-only endpoints, while the "write" app provides write-only endpoints (see the next section for security implications). We recommend using regcore.urls as your URL router, in which case turning read/write capabilities on or off is as simple as including the appropriate apps in your Django settings file. The final app, regcore_pgsql, contains all of the modules related to running with a Postgres-based search index. Note that regcore must always be installed.
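To make the "include only the apps you need" idea concrete, here is a minimal sketch of choosing regcore apps for two kinds of deployment. The `build_regcore_apps` helper is hypothetical; only the app names and `regcore.urls` come from this README. The resulting list would be added to your Django `INSTALLED_APPS`.

```python
from typing import List

# Recommended URL router (per this README); set in your Django settings.
ROOT_URLCONF = "regcore.urls"


def build_regcore_apps(read: bool = True, write: bool = False,
                       pgsql: bool = False) -> List[str]:
    """Hypothetical helper: list the regcore apps to install.

    regcore itself is always required; the other apps are opt-in.
    """
    apps = ["regcore"]
    if read:
        apps.append("regcore_read")
    if write:
        apps.append("regcore_write")
    if pgsql:
        apps.append("regcore_pgsql")
    return apps


# A public-facing, read-only deployment:
public_apps = build_regcore_apps(read=True, write=False)
# An internal deployment that accepts data from the parser:
internal_apps = build_regcore_apps(read=True, write=True)
```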

Security

Note that regcore_write is designed to be active only inside an organization; the assumption is that data will be pushed separately to public-facing, read-only sites (i.e. sites without regcore_write).

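When using the Elastic Search backend, data is passed as JSON, preventing SQL-like injections. When using Django models (the default configuration, with Haystack for search), data is stored via Django's model framework, which passes values to the database as query parameters rather than splicing them into SQL.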

All data types require JSON input (which is checked). The regulation type has an additional schema check, which other data types currently lack. Again, this risk is limited by the separation of read and write endpoints.
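The two layers of checking described above (valid JSON for everything, plus a structural check for regulations) can be sketched as follows. The required keys and types in `REQUIRED_NODE_KEYS` are an assumption for illustration; regcore's real regulation schema is more detailed, and these helper names are hypothetical.

```python
import json
from typing import Any, List

# Hypothetical minimal schema: required keys on every node and their types.
REQUIRED_NODE_KEYS = {"label": list, "text": str, "children": list}


def parse_json(body: str) -> Any:
    """First layer: reject anything that is not valid JSON (raises ValueError)."""
    return json.loads(body)


def schema_errors(node: Any, path: str = "root") -> List[str]:
    """Second layer: recursively check every node in a regulation tree."""
    if not isinstance(node, dict):
        return ["%s: node should be an object" % path]
    errors = []
    for key, expected in REQUIRED_NODE_KEYS.items():
        if not isinstance(node.get(key), expected):
            errors.append("%s: '%s' should be %s" % (path, key, expected.__name__))
    children = node.get("children")
    if isinstance(children, list):
        for i, child in enumerate(children):
            errors.extend(schema_errors(child, "%s.children[%d]" % (path, i)))
    return errors


good = parse_json('{"label": ["1005"], "text": "", "children": []}')
bad = parse_json('{"label": "1005", "text": "", "children": []}')
print(schema_errors(good))  # []
print(schema_errors(bad))   # ["root: 'label' should be list"]
```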

As all data is assumed to be publicly visible, data is not encrypted before it is sent to the storage engine. Data may be compressed, however.

Be sure to override the default SECRET_KEY and to turn DEBUG off in your local_settings.py.
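As a pre-deployment sanity check, the rules in this section could be encoded in a small function. This is a hypothetical sketch, not part of regcore: you pass it a mapping of your final settings values plus whatever default SECRET_KEY your base settings ship with, and it lists anything that still looks unsafe.

```python
from typing import Any, List, Mapping


def deployment_problems(settings: Mapping[str, Any], default_secret_key: str,
                        public: bool) -> List[str]:
    """Hypothetical helper: report settings that break the advice above."""
    problems = []
    if settings.get("DEBUG", False):
        problems.append("DEBUG is on")
    if settings.get("SECRET_KEY") in (None, "", default_secret_key):
        problems.append("SECRET_KEY has not been overridden")
    # Public, read-only sites should not expose the write endpoints.
    if public and "regcore_write" in settings.get("INSTALLED_APPS", ()):
        problems.append("regcore_write is installed on a public, read-only site")
    return problems


local = {"DEBUG": True, "SECRET_KEY": "base-default",
         "INSTALLED_APPS": ["regcore", "regcore_read", "regcore_write"]}
print(deployment_problems(local, default_secret_key="base-default", public=True))
```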

Storage Backends

This project allows multiple backends for storing, retrieving, and searching data. The default settings file uses Django models for data storage and Haystack for search, but Elastic Search (1.7) or Postgres can be used instead.

Django Models For Data, Haystack For Search

This is the default configuration. You will need Haystack installed, along with one of its backends. In your settings file, use:

BACKENDS = {
    'regulations': 'regcore.db.django_models.DMRegulations',
    'layers': 'regcore.db.django_models.DMLayers',
    'notices': 'regcore.db.django_models.DMNotices',
    'diffs': 'regcore.db.django_models.DMDiffs'
}
SEARCH_HANDLER = 'regcore_read.views.haystack_search.search'

You will need to migrate the database (manage.py migrate) to get started and rebuild the search index (manage.py rebuild_index) after adding documents.

Django Models For Data, Postgres For Search

If running Django 1.10 or greater, you may skip Haystack and rely exclusively on Postgres for search. The current search index only indexes at the CFR section level. Install psycopg2 (e.g. through pip install regcore[backend-pgsql]) and use the following settings:

BACKENDS = {
    'regulations': 'regcore.db.django_models.DMRegulations',
    'layers': 'regcore.db.django_models.DMLayers',
    'notices': 'regcore.db.django_models.DMNotices',
    'diffs': 'regcore.db.django_models.DMDiffs'
}
SEARCH_HANDLER = 'regcore_pgsql.views.search'
APPS.append('regcore_pgsql')

You may wish to extend the regcore.settings.pgsql module for simplicity.

You will need to migrate the database (manage.py migrate) to get started and rebuild the search index (manage.py rebuild_pgsql_index) after adding documents.
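Indexing "at the CFR section level" means a search hit identifies a section rather than an individual paragraph. The sketch below illustrates that idea by building one text document per section from the section's own text and all of its paragraphs. It is not regcore_pgsql's code; it assumes the same node shape as regulation JSON (`label`, `text`, `children`) and, as a simplification, treats every two-part label (e.g. `['1005', '2']`) as a section.

```python
from typing import Dict, Iterator


def iter_nodes(node: dict) -> Iterator[dict]:
    """Yield a node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def section_documents(tree: dict) -> Dict[str, str]:
    """Build one search document per section (assumed: two-part labels)."""
    docs = {}
    for node in iter_nodes(tree):
        if len(node["label"]) == 2:
            texts = [n["text"] for n in iter_nodes(node) if n.get("text")]
            docs["-".join(node["label"])] = " ".join(texts)
    return docs


tree = {"label": ["1005"], "text": "", "children": [
    {"label": ["1005", "1"], "text": "Authority.", "children": []},
    {"label": ["1005", "2"], "text": "Definitions.", "children": [
        {"label": ["1005", "2", "a"], "text": "(a) Act means ...", "children": []},
    ]},
]}
print(section_documents(tree)["1005-2"])  # Definitions. (a) Act means ...
```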

Elastic Search For Data and Search

If pyelasticsearch is installed (e.g. through pip install regcore[backend-elastic]), you can use Elastic Search (1.7) for both data storage and search. Add the following to your settings file:

BACKENDS = {
    'regulations': 'regcore.db.es.ESRegulations',
    'layers': 'regcore.db.es.ESLayers',
    'notices': 'regcore.db.es.ESNotices',
    'diffs': 'regcore.db.es.ESDiffs'
}
SEARCH_HANDLER = 'regcore_read.views.es_search.search'

You may wish to extend the regcore.settings.elastic module for simplicity.

Settings

While we provide sane default settings in regcore/settings/base.py, we recommend these defaults be overridden as needed in a local_settings.py file.

If using Elastic Search, you will need to let the application know how to connect to the search servers.

ELASTIC_SEARCH_URLS - a list of strings which define how to connect to your search server(s). This is passed along to pyelasticsearch.
ELASTIC_SEARCH_INDEX - the index to be used by Elastic Search. This defaults to 'eregs'.
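For a single Elastic Search node on the same machine, the corresponding lines in local_settings.py might look like the fragment below. The URL assumes Elastic Search's default HTTP port (9200) on localhost; adjust it for your servers.

```python
# local_settings.py (sketch): assumes one local Elastic Search node on port 9200.
ELASTIC_SEARCH_URLS = ["http://localhost:9200/"]
# Same as the default; change it if several deployments share one cluster.
ELASTIC_SEARCH_INDEX = "eregs"
```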

The BACKENDS setting (as described above) must be a dictionary mapping the appropriate model names ('regulations', 'layers', etc.) to the associated backend classes. Backends can be mixed and matched, though we can't think of a good use case for doing so.
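Each BACKENDS value (and SEARCH_HANDLER) is a dotted path string. The sketch below shows one common way such strings are turned into objects using `importlib`; regcore's actual loader may differ, and Django offers a similar utility, `django.utils.module_loading.import_string`. A standard-library class stands in for a real backend so the example runs without regcore installed.

```python
import importlib
from typing import Any, Dict


def load_dotted(path: str) -> Any:
    """Import 'package.module.Attr' and return Attr."""
    module_path, _, attr = path.rpartition(".")
    if not module_path:
        raise ImportError("%r is not a dotted path" % path)
    module = importlib.import_module(module_path)
    try:
        return getattr(module, attr)
    except AttributeError:
        raise ImportError("%s has no attribute %r" % (module_path, attr))


def load_backends(backends: Dict[str, str]) -> Dict[str, Any]:
    """Resolve every model name's dotted path to its class."""
    return {name: load_dotted(path) for name, path in backends.items()}


# Stand-in: in a real settings file this would be e.g.
# 'regcore.db.django_models.DMRegulations'.
from collections import OrderedDict
loaded = load_backends({"regulations": "collections.OrderedDict"})
```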

All standard Django and Haystack settings are also available; you will likely want to override DATABASES, HAYSTACK_CONNECTIONS, and DEBUG, and you should certainly override SECRET_KEY.

Importing Data

Via the `eregs` parser

The eregs script (see regulations-parser) includes subcommands that write processed data to a running API. Notably, if write_to (the last step of the pipeline) is directed at a target beginning with http:// or https://, it will write the relevant data to that host. HTTP authentication can be encoded within these URLs. For example, if the API is running on localhost, port 8000, you could run:

$ eregs write_to http://localhost:8000/

See the command line docs for more detail.
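If your credentials contain characters that are reserved in URLs (such as `@` or `:`), the standard practice is to percent-encode them when embedding them as `user:password@host`. The hypothetical helper below does that with the standard library. Whether the parser's HTTP client decodes percent-encoded credentials is an assumption this README does not confirm, so test against your own API. IPv6 literal hosts are not handled.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(api_url: str, username: str, password: str) -> str:
    """Embed percent-encoded basic-auth credentials in an http(s) URL."""
    parts = urlsplit(api_url)
    host = parts.hostname or ""
    if parts.port:
        host = "%s:%d" % (host, parts.port)
    netloc = "%s:%s@%s" % (quote(username, safe=""), quote(password, safe=""), host)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(with_basic_auth("http://localhost:8000/", "editor", "p@ss:word"))
# http://editor:p%40ss%3Aword@localhost:8000/
```

The result could then be passed as the write_to target, e.g. `eregs write_to http://editor:p%40ss%3Aword@localhost:8000/`.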

Via the `import_docs` Django command

If you've already exported data from the parser, you may import it from the command line with the import_docs Django management command. It should be given the root directory of the data as its only parameter. Note that this does not require a running API.

$ ls /path/to/data-root
diff  layer  notice  regulation
$ python manage.py import_docs /path/to/data-root

Via curl

You may also simulate sending data to a running API via curl, if you've exported data from the parser. For example, if the API is running on the localhost, port 8000, you could run:

$ cd /path/to/data-root
$ ls
diff  layer  notice  regulation
$ for TAIL in $(find */* -type f | sort -r); do \
    curl -X PUT "http://localhost:8000/$TAIL" -d "@$TAIL"; \
  done
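If curl isn't available, the same upload can be approximated in Python. This is a sketch with hypothetical helper names: `iter_uploads` mimics `find */* -type f | sort -r` (files at least one directory deep, in reverse order, though Python's sort uses code-point order rather than your shell locale), and `put_file` sends each file's bytes with an HTTP PUT.

```python
import os
import urllib.request
from typing import Iterator, Tuple


def iter_uploads(data_root: str, api_root: str) -> Iterator[Tuple[str, str]]:
    """Yield (url, file_path) pairs, approximating `find */* -type f | sort -r`."""
    relpaths = []
    for dirpath, _dirnames, filenames in os.walk(data_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), data_root)
            if os.sep in rel:  # like */*, skip files directly in data_root
                relpaths.append(rel)
    for rel in sorted(relpaths, reverse=True):
        url = api_root.rstrip("/") + "/" + rel.replace(os.sep, "/")
        yield url, os.path.join(data_root, rel)


def put_file(url: str, path: str) -> int:
    """PUT a file's contents to url and return the HTTP status code."""
    with open(path, "rb") as f:
        request = urllib.request.Request(url, data=f.read(), method="PUT")
    with urllib.request.urlopen(request) as response:
        return response.status
```

Usage would be a loop such as `for url, path in iter_uploads("/path/to/data-root", "http://localhost:8000/"): put_file(url, path)`; note that `urlopen` raises `urllib.error.HTTPError` on 4xx/5xx responses.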