Route Trends: Upload, Analyze, Forecast Ridership by Route
Authors
Creator: Joel Huting
Authors: Joey Reid, Kim Eng Ky, Eric Lind
Sponsoring Institution: Metro Transit, Minneapolis-St. Paul, MN USA
Purpose
R Shiny app to ingest ridership time series, and return:
- seasonal, trend, and residual components according to STL methodology
- forecasts including uncertainty based on those components
The main purpose of this app is to facilitate use of the timeseries methodology by transit agencies interested in better understanding their ridership data. The descriptions and examples are thus focused on counts of riders, by route, on a monthly basis, matching required submissions to the National Transit Database where many such timeseries can be found.
Output
This code is provided to allow for collaborative improvement and modification of the app, and to enable local implementation. To simply use the route trends app, you can go to the Metro Transit shinyapps.io page.
Methods
R, RStudio, and Shiny
This repository contains code to be run in RStudio, an IDE built for the statistical language R. The code relies on R packages developed by others (see the list of packages at the end of this README), including Shiny, a package for building interactive web apps in HTML and JavaScript from within the R environment.
Data format
The app requires a datafile in CSV with the following columns, in this order:
- Date in '%m/%d/%y' format (e.g. 5/1/17)
- Ridership in numeric or integer format
- Identifier (route, mode, route type) in string format
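As a rough illustration of the expected layout, the sketch below reads a small inline CSV with base R and parses the date column using the format string given above. The column names (`date`, `ridership`, `route`), the presence of a header row, and the sample values are assumptions made for this example only; they are not taken from the app's code.

```r
# Hypothetical three-row sample in the documented column order:
# date, ridership, identifier. Names and values are illustrative only.
csv_text <- "date,ridership,route
5/1/17,10250,Route 5
6/1/17,9870,Route 5
7/1/17,9412,Route 5"

rides <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse dates with the documented '%m/%d/%y' format.
rides$date <- as.Date(rides$date, format = "%m/%d/%y")

# Basic checks that mirror the stated column requirements.
stopifnot(
  !anyNA(rides$date),          # every date parsed successfully
  is.numeric(rides$ridership), # numeric or integer ridership
  is.character(rides$route)    # string identifier
)

rides
```

A date that does not match the format parses to `NA`, so the `anyNA` check is a cheap way to catch a file exported with a different date style before uploading it.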
Analysis methods
Trends are calculated using the "Seasonal-Trend Decomposition Procedure Based on Loess" (STL). STL decomposition is a filtering procedure for decomposing a seasonal time series into three components:
- trend: this is typically of most interest to transit agencies. What is the long-term trend of a given route?
- seasonal: this is of interest but is typically already known by transit agencies. Ridership may be typically higher in fall than in mid-summer; routes serving university campuses may rise and fall with the academic calendar. By incorporating these regularities into the time series analysis, the trend can be better understood independent of the seasonality. This represents one key advance over year-over-year same-month comparisons, which are industry standard practice for dealing with the seasonality of ridership.
- remainder: small remainders indicate random variation around the trend given the season (month). Large remainders, and remainders that run strongly in one direction for several consecutive months, may indicate an inflection in ridership or other non-stationarity.
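The three components can be seen with the `stl()` function from base R's `stats` package. The sketch below is a minimal example on synthetic data, not the app's own code: the simulated series, the start date, and the choice of `s.window = "periodic"` are all assumptions made for illustration.

```r
# Synthetic monthly ridership: linear trend + annual cycle + noise.
set.seed(1)
n_months <- 48
month_index <- seq_len(n_months)
ridership <- 10000 + 20 * month_index +
  800 * sin(2 * pi * month_index / 12) +
  rnorm(n_months, sd = 100)

# Monthly time series starting January 2014 (arbitrary start date).
rides_ts <- ts(ridership, start = c(2014, 1), frequency = 12)

# s.window = "periodic" assumes the seasonal pattern repeats identically
# each year; a numeric window would let it evolve over time.
fit <- stl(rides_ts, s.window = "periodic")

# Matrix with one column per component: seasonal, trend, remainder.
components <- fit$time.series
head(components)
```

For each month the three columns add back up to the observed value, so the remainder is whatever the seasonal and trend components leave unexplained.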
Approaches for extracting STL components:
In this app, we include six individual forecasting methods plus a hybrid that averages several of them. Each method has its strengths and may fit a given timeseries better than the others. Note that the timeseries length requirements differ among approaches: the individual methods require at least two years (24 or 25 months) of data, and the hybrid requires 49 months.
- Autoregressive Integrated Moving Average (ARIMA): needs at least 24 monthly observations
- STL using ARIMA: needs at least 25 monthly observations
- Exponential Smoothing State Space (ETS): needs at least 24 monthly observations
- STL using ETS: needs at least 25 monthly observations
- Exponential Smoothing State Space model with Box-Cox Transformation, ARMA errors, Trend and Seasonal Components (TBATS): needs at least 24 monthly observations
- Neural Network Time Series (NNETAR): needs at least 25 monthly observations
- Hybrid forecasts: model average of ETS, NNETAR, STL using ARIMA, and TBATS: needs at least 49 monthly observations, as model weights are determined by cross-validated root mean square error (RMSE)
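The length requirements above can be turned into a quick pre-check before uploading a series. The sketch below is a hypothetical helper written for this README, not part of the app: the thresholds are copied from the list, while the short method labels and the function name `available_methods` are made up for illustration.

```r
# Minimum number of monthly observations per method, copied from the list
# above. The short labels are illustrative, not identifiers from the app.
min_obs <- c(
  ARIMA = 24, STL_ARIMA = 25, ETS = 24, STL_ETS = 25,
  TBATS = 24, NNETAR = 25, Hybrid = 49
)

# Hypothetical helper: which methods can a series of n_months support?
available_methods <- function(n_months) {
  names(min_obs)[min_obs <= n_months]
}

available_methods(24)  # exactly two years of data
available_methods(36)  # three years: everything except the hybrid
```

With exactly 24 months only ARIMA, ETS, and TBATS qualify, which is why a 25th month is worth waiting for if the STL-based or neural network forecasts are of interest.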
Deciding on best-fit models for your timeseries
The simplest measure of a forecasting model's accuracy is in-sample mean absolute percentage error (MAPE). MAPE is the mean absolute percentage difference between actual and predicted values. A smaller MAPE is preferable, and MAPE can be compared across different models forecasting the same time series.
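To make the definition concrete, here is a minimal sketch of the MAPE calculation in base R. The function name `mape` and the example values are assumptions for illustration; the app itself may compute this differently.

```r
# Mean absolute percentage error, in percent.
# Undefined when any actual value is zero, so that case is rejected.
mape <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted), all(actual != 0))
  mean(abs((actual - predicted) / actual)) * 100
}

# Made-up actual ridership and fitted values from two hypothetical models.
actual  <- c(100, 200, 400)
model_a <- c(110, 180, 400)
model_b <- c(150, 100, 300)

mape(actual, model_a)
mape(actual, model_b)
```

Here `model_a` has the smaller MAPE, so by this criterion it fits the series better than `model_b`. Because each error is scaled by the actual value, months with very low ridership can dominate the average.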
R Packages Used
- data.table: read, write, join and aggregate data
- DT: interactive data table displays
- dygraphs: flexible interactive timeseries plotting
- forecast: timeseries modeling and forecasting
- forecastHybrid: ensemble and model averaging for timeseries models
- ggplot2: static graphics
- lubridate: parsing of date strings
- scales: effective data breaks and labels
- shiny: interactive website
- shinydashboard: app theme