Awesome
Synthetic Patient Statistics Using Postgres
This repository contains all documentation, data, and setup scripts for synthetic patient statistics stored in Postgres. Current the scripts in this repository only work on Unix-like platforms (e.g. Ubuntu, Mac OS).
Database Schema
The current schema for the synthetic statistics is document as a UML diagram. See synth_ma_schema.pdf. This diagram was generated using Star UML. The .mdj
file created by Star UML is also in this repository and can be edited for future schema revisions.
Most data that are used by the Synthetic Mass UI are exposed as Postgres views. These views are typically less granular but provide more useful statistics than the underlying data.
Views
synth_county_pop_stats
- total population stats at the county levelsynth_cousub_pop_stats
- total population stats at the subdivision levelsynth_county_condition_stats
- condition-specific stats at the county levelsynth_cousub_condition_stats
- condition-specific stats at the subdivision levelsynth_county_disease_stats
- disease-specific stats at the county levelsynth_cousub_disease_stats
- disease-specific stats at the subdivision level
The tables underlying these views have higher granularity and are generally updated with the most recent stats. They are rarely queried directly and are exposed through the views mentioned above.
We interpret "conditions" to mean a unique condition with a unique SNOMED code, for example "myocardial_infarction".
We interpret "diseases" to be high-level statistics that may include one or more conditions. For example, the disease "food_allergy" may include the conditions "food_allergy_peanuts", "food_allergy_tree_nuts", and "food_allergy_shellfish".
For all county-level statistics we simply aggregate the subdivision statistics.
Tables
synth_county_dim
- immutable county datasynth_cousub_dim
- immutable subdivision datasynth_condition_dim
- conditions we tracksynth_disease_dim
- diseases we tracksynth_age_range_dim
- age ranges we tracksynth_pop_facts
- total population stats at the subdivision levelsynth_condition_facts
- condition-specific population stats at the subdivision levelsynth_disease_facts
- disease-specific population stats at the subdivision level
Note: the "_dim" suffix means "dimension". These tables contain data that rarely changes.
Deploying
The synthetic statistics schema can be deployed to the Postgres database by running deploy_stats_schema.sh
. This does the following:
- Drops all existing dimension tables
- Creates all dimension tables
- Populates all dimension tables with current data (from CSV files)
- Creates all fact tables (these will be empty)
- Creates all views
- Runs
VACUUM
on the new tables
Note: This script assumes that the fhir
database and role synth_ma
already exist.
Resetting Statistics
Run the script resetfacts.sh
as user postgres
. This does the following:
- Drops all rows in the
synth_*_facts
tables in thesynth_ma
schema.
This will cause all views to automatically update with zero-valued stats. We interpret null rows in the synth_*_facts
tables as "zero".
Updating Views
If the view logic changes (for example, to add additional computed statistics like rates) all views can be updated by running resetviews.sh
as user postgres
. This does the following:
- Drops (cascading) all
synth_*_*_stats
views in thesynth_ma
schema. - Rebuilds the views using the updated SQL files.
Adding or Updating Statistics
Facts should be added to the facts tables using "upserts". In each of the fact tables each fact is uniquely identified by a composite key. If an attempted insert into any of the synth_*_facts
table violates these keys an update of the existing row should be performed instead.
License
Copyright 2016 The MITRE Corporation
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.