Traffic4Cast 2022 - TSE

Solution of team TSE to the NeurIPS 2022 Traffic4cast Challenge.

Installation

The packages required to run the scripts are listed in requirements.txt. In addition, the official t4c package has to be installed beforehand.

pip install -r requirements.txt

Usage

The scripts for data imputation, data preparation, feature extraction, and model training and prediction are all invoked from run.sh. Before running it, configure the paths in config.json.

sh run.sh

Checkpoints

The model checkpoints are included in the folder processed/checkpoints.

Checkpoint | Description
--- | ---
lgb_1+nr2_model_london.pkl | London model with Manhattan and normed Euclidean distance
lgb_1+nr2_model_madrid.pkl | Madrid model with Manhattan and normed Euclidean distance
lgb_1+nr2_model_melbourne.pkl | Melbourne model with Manhattan and normed Euclidean distance
lgb_1+p2_model_london.pkl | London model with Manhattan and Euclidean distance
lgb_1+p2_model_madrid.pkl | Madrid model with Manhattan and Euclidean distance
lgb_1+p2_model_melbourne.pkl | Melbourne model with Manhattan and Euclidean distance
lgb_full_missing_model_london.pkl | London model for samples with a high missing rate
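The checkpoint names encode the city and the distance variant, and London additionally has a dedicated model for samples with a high missing rate. The following is a minimal sketch of how one might pick and load a checkpoint, assuming the .pkl files are standard Python pickles. The helper names and `MISSING_RATE_THRESHOLD` are hypothetical: the README does not define what counts as a "high" missing rate.

```python
import math
import os
import pickle

# Hypothetical cut-off: the README does not say what counts as a "high" missing rate.
MISSING_RATE_THRESHOLD = 0.9


def missing_rate(loop_counts):
    """Fraction of loop counter readings that are missing (None or NaN)."""
    if not loop_counts:
        return 1.0
    missing = sum(
        1 for v in loop_counts if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(loop_counts)


def checkpoint_name(city, loop_counts, variant="1+nr2"):
    """Pick a checkpoint file name; only London has a full-missing model."""
    if city == "london" and missing_rate(loop_counts) >= MISSING_RATE_THRESHOLD:
        return "lgb_full_missing_model_london.pkl"
    return f"lgb_{variant}_model_{city}.pkl"


def load_checkpoint(folder, name):
    """Unpickle a checkpoint; only do this for files you trust."""
    with open(os.path.join(folder, name), "rb") as f:
        return pickle.load(f)
```

For example, `load_checkpoint("processed/checkpoints", checkpoint_name("madrid", counts))` would load the Madrid "1+nr2" model. Unpickling executes arbitrary code, so only load checkpoints from a trusted source.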

Feature Engineering

Prerequisites

The feature engineering code is in the folder src/feature_extraction. Before running it to extract features, first run the scripts in the src/preparation folder to prepare all required inputs. Those scripts should be run as follows.

Static network features

See static_features.py.

Loop counts features

See loop_features_fully_missing.py.

Speed features

See speed_features_fully.py. The free-flow speed and median speed of a supersegment (SG) are defined as the mean free-flow speed and the mean median speed, respectively, of the edges it contains. Below, $k \in \{1, 2, 5, 10, 50\}$.
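The SG-level speed definition above is a plain (unweighted) mean over the SG's edges. A minimal sketch, assuming per-edge speeds are already available as dictionaries keyed by edge id and that every edge has a value; the function and argument names are hypothetical and not taken from speed_features_fully.py.

```python
from statistics import mean


def supersegment_speeds(sg_edges, edge_free_flow, edge_median):
    """Free-flow and median speed of each supersegment as plain means over its edges.

    sg_edges maps a supersegment id to its list of edge ids; edge_free_flow and
    edge_median map each edge id to a speed. Every edge is assumed to have a value.
    """
    speeds = {}
    for sg, edges in sg_edges.items():
        free_flow = mean([edge_free_flow[e] for e in edges])
        median = mean([edge_median[e] for e in edges])
        speeds[sg] = (free_flow, median)
    return speeds
```

Because the definition uses an unweighted mean, a short edge counts as much as a long one; a length-weighted mean would be a different design choice.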

KNN label features

See knn_features_eng.py and knn_features_manipulate.py. Below, $k \in \{2, 5, 10, 30, 50, 100\}$.
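One plausible reading of a KNN label feature is the mean label of the $k$ training samples whose inputs are most similar to the query. The checkpoint names mention Manhattan, Euclidean, and normed Euclidean distances. A minimal sketch under those assumptions follows. The interpretation of "normed" (scaling both vectors to unit L2 norm) is an assumption, and all names are hypothetical rather than taken from knn_features_eng.py.

```python
import math


def manhattan(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def normed_euclidean(a, b):
    """Euclidean distance after scaling both vectors to unit L2 norm.

    Assumption: this is one possible reading of "normed Euclidean distance".
    """
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return math.dist([x / na for x in a], [y / nb for y in b])


def knn_label_feature(query, train_inputs, train_labels, k, dist=manhattan):
    """Mean label of the k training samples whose inputs are closest to query."""
    order = sorted(range(len(train_inputs)), key=lambda i: dist(query, train_inputs[i]))
    nearest = order[:k]
    return sum(train_labels[i] for i in nearest) / len(nearest)
```

Computing `knn_label_feature` once per $k$ in the set above, and once per distance function, would yield one feature column per (distance, $k$) pair. For large datasets, a vectorized or index-based neighbor search would replace the full sort.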

Feature combination

We also combine several of the aforementioned features, via differences, sums, and quotients, to construct more powerful features. This step is carried out in the model training script.
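A minimal sketch of such combinations, assuming each sample's features are scalars held in a dict and that every pair of features is combined. The training script may combine only selected pairs. The column naming and the NaN-on-zero-division rule are illustrative choices, not taken from the repository.

```python
from itertools import combinations


def combine_features(features):
    """Extend a {name: value} dict with pairwise differences, sums, and quotients.

    Column names and the NaN-on-zero-division rule are illustrative choices.
    """
    combined = dict(features)
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        combined[f"{name_a}-{name_b}"] = a - b
        combined[f"{name_a}+{name_b}"] = a + b
        combined[f"{name_a}/{name_b}"] = a / b if b != 0 else float("nan")
    return combined
```

Returning NaN for a zero denominator leaves the decision to the model; LightGBM, for instance, handles missing values natively.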

Report

The accompanying technical report can be found in Traffic4cast_2022_TSE.pdf.

Citation

@misc{tse-t4c22,
  title     = {Similarity-based Feature Extraction for Large-scale Sparse Traffic Forecasting},
  author    = {Wu, Xinhua and Lyu, Cheng and Lu, Qing-Long and Mahajan, Vishal},
  year      = 2022,
  month     = {Oct},
  url       = {https://github.com/c-lyu/Traffic4Cast2022-TSE},
  language  = {en}
}

Acknowledgements

This repository is based on the official repository of the NeurIPS 2022 Traffic4cast competition (NeurIPS2022-traffic4cast).