NTDS-Team24 : Analysis of delays on the New Jersey railway network
This is the GitHub repository for the final project for the course EE558 - A Network Tour of Data Science (EPFL).
The project focuses on the analysis of train delays on the New Jersey railway network. The dataset is available on Kaggle (https://www.kaggle.com/pranavbadami/nj-transit-amtrak-nec-performance). Due to the large amount of data available, we chose to focus mainly on data from March 2018.
Repository structure
The results (figures and GIFs) are stored in the Images and Gifs folders.
There are 4 notebooks in total:
- Data exploration and processing for the network analysis: Preprocessing_def
- Clustering: Clusters_original
- ML Model 1 (RNN): Final_LSTM_Model
- ML Model 2 (ANN) and 3 (Ridge): Classification_Model
The data for March and June 2018 are available on the main page of the repository, along with the March data split into inward and outward trips (going to or coming from New York). The coordinates of the network's stations are also available.
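As an illustration of the inward/outward split mentioned above, here is a minimal sketch in pandas. It is not the code used in the notebooks. The column names (`date`, `train_id`, `stop_sequence`, `to`), the station label `New York Penn Station`, and the helper `split_by_direction` are assumptions made for this example, and the demo rows are invented.

```python
import pandas as pd

# Assumed label of the New York terminus in the data (hypothetical).
NY_STATION = "New York Penn Station"


def split_by_direction(trips: pd.DataFrame, hub: str = NY_STATION):
    """Split stop-level rows into inward trips (ending at the hub)
    and outward trips (starting at the hub).

    Assumes one row per stop, with columns date, train_id,
    stop_sequence and to (the station reached at that stop).
    """
    ordered = trips.sort_values(["date", "train_id", "stop_sequence"])
    grouped = ordered.groupby(["date", "train_id"])
    # Broadcast each trip's first and last station to all of its rows.
    first_stop = grouped["to"].transform("first")
    last_stop = grouped["to"].transform("last")
    inward = ordered[last_stop == hub]
    outward = ordered[(first_stop == hub) & (last_stop != hub)]
    return inward, outward


# Invented demo rows: train A ends in New York, train B starts there.
demo = pd.DataFrame({
    "date": ["2018-03-01"] * 4,
    "train_id": ["A", "A", "B", "B"],
    "stop_sequence": [1, 2, 1, 2],
    "to": ["Trenton", NY_STATION, NY_STATION, "Newark Penn Station"],
})
inward, outward = split_by_direction(demo)
```

Trips that neither start nor end at the hub fall into neither group in this sketch; how such trips are handled in the actual preprocessing is not specified here.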