Home

Awesome

Back2Future: Leveraging Backfill Dynamics forImproving Real-time Predictions in Future

Link to paper: https://arxiv.org/abs/2106.04420

Setup

First install Anaconda. The dependencies are listed in environment.yml file. Make sure you make changes to version of cudatoolkit if applicable.

Then run the following commands:

conda env create --prefix ./envs/backfill --file environment.yml
source activate ./envs/backfill

Directory structure

-data_extract
	- create_table.py -> preprocess raw covid dataset as pkl files
- gnnrnn/gnn_model.py -> implrmrntation of B2F
- model_preds -> folder containing predictions of hub models
- results -> stores predictions
- covid_utils.py -> utility functions to extract bseqs
- train_bseqenc.py -> Pre-train BeseqEnc
- train_b2f.py -> Train B2F for given week and model prediction history and infer predictions

Dataset

The dataset is at covid_data folder. It contains csv file for each week that contains revised values for all signals from all previous weeks and current week. For example covid_data/covid-hospitalization-all-state-merged_vEW202030.csv contains revised dataset observed on week 30.

Training and Predictions

Extract data

Run extract.sh to extract backfill values with missing values filled (as described in supplementary) at saves.

Pretrain BseqEnc

Run:

python train_bseqenc.py -l <current_week> -p <epochs> -c <cuda? yes/no> -m <minimum lenght of bseq> -n <experiment name>

This will store a trained bseqenc at saves\<experiment name>_rev_model.pth

Fine-tune B2F

Run:

python train_b2f.py -l <current_week> -e <epochs> -a <weeks ahead> -c <cuda? yes/no> -n <experiment name>

This will provide predictions in a dictionary at saves\<experiment name>_pred.pkl in form of a dictionary like:

{'Expt name': 'week_40_1',
 'Weeks ahead': 2,
 'regions': ['CA', 'DC', 'FL', 'GA', 'IL', 'NY', 'TX', 'WA'],
 'current week': 40,
 'forecast': array([410.17530001,   4.82940001, 610.0748    , 209.7412    ,
        226.7672    ,  58.44170001, 658.7947    ,  47.6249    ]),
 'refined': array([485.72659425,   5.42677917, 614.71402962, 216.20428756,
        213.2565347 ,  68.36743161, 664.00933847,  47.86988136])}

The model is stored in saves folder with files:

Note: For example of a single run of full pipeline see example.sh