transit-feed-quality-calculator
A project that uses the gtfs-realtime-validator to assess the quality of a large number of transit feeds.
This tool:
- Fetches the URLs for GTFS-realtime feeds and corresponding GTFS data from either the TransitFeeds.com GetFeeds API or a specified .csv file, and downloads them from each agency's server into a subdirectory
- Runs the gtfs-realtime-validator Batch Processor on each of the subdirectories
- Produces summary statistics and graphs of the validation results
Read more in this Medium article.
Running the application
You'll need JDK 7 or higher.
This project was created in IntelliJ. You can also compile it from the command line using Maven.
If you're downloading GTFS or GTFS-rt data from secure HTTPS URLs, you may need to install the Java Cryptography Extension (JCE). You will need to replace the US_export_policy.jar and local_policy.jar files in your JVM /security directory, such as C:\Program Files\Java\jdk1.8.0_73\jre\lib\security, with the JAR files in the JCE Extension download. Alternatively, you can add -Djsse.enableSNIExtension=false to the command line when running the application.
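If passing the flag on every run is inconvenient, the same JSSE system property can also be set from code. The sketch below assumes it runs at the very start of `main()`, before any HTTPS connection is opened, because the JDK's TLS implementation typically reads this property once, when its SSL classes are first loaded. `SniConfig` is a hypothetical class, not part of this project.

```java
public class SniConfig {

    /** The JSSE system property that -Djsse.enableSNIExtension=false sets. */
    static final String SNI_PROPERTY = "jsse.enableSNIExtension";

    /**
     * Disables the TLS Server Name Indication extension for this JVM.
     * Call this before the first HTTPS connection; setting it later may have no effect.
     */
    static void disableSni() {
        System.setProperty(SNI_PROPERTY, "false");
    }

    public static void main(String[] args) {
        disableSni();
        System.out.println(SNI_PROPERTY + "=" + System.getProperty(SNI_PROPERTY));
        // ... normal application startup (downloads, validation) would follow here
    }
}
```

Setting the property on the command line remains the simpler option, since it needs no code change.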
To download feeds, you'll also need a TransitFeeds.com API key or a .csv file that includes feed information (see below).
Command line
mvn package
java -Djsse.enableSNIExtension=false -jar target/transit-feed-quality-calculator-1.0.0-SNAPSHOT.jar -directory output -transitFeedsApiKey 1234567689 -csv feeds.csv
Note that to download feeds, you'll need to provide an API key for TransitFeeds.com or a .csv file that includes feed information.
See the command-line options section below for a description of each option.
IntelliJ
Run the Main.main() method, and provide the command-line options via the "Run configurations->Program arguments" feature.
Command line options
- -directory "output" - (Required) - The directory to which feeds will be downloaded (in this case, output), and to which validation and analysis files will be written
- -transitFeedsApiKey YOUR_API_KEY - (Optional) - Your TransitFeeds.com API key (in this case, YOUR_API_KEY)
- -csv "feeds.csv" - (Optional) - A CSV file holding feed information (in this case, feeds.csv; you can name it whatever you want)
- -forceGtfsDownload false - (Optional) - If false, and a GTFS file for a feed already exists on disk, a new GTFS file will not be downloaded. If true, or if the option is omitted, a new GTFS file will always be downloaded and will overwrite any existing GTFS file for each feed.
- -errorsToIgnore "E017,E018" - (Optional) - A comma-delimited list of errors to ignore when calculating summary error results and generating the Excel file. By default, errors that examine sequential feed iterations (E017, E018) are ignored, as archived files may not have been collected iteratively (see TransitFeedQualityCalculator.java). Setting a value via this command-line parameter overwrites the default value.
- -warningsToIgnore "W007,W008" - (Optional) - A comma-delimited list of warnings to ignore when calculating summary warning results and generating the Excel file. By default, warnings that examine sequential feed iterations (W007, W008) are ignored, as archived files may not have been collected iteratively (see TransitFeedQualityCalculator.java). Setting a value via this command-line parameter overwrites the default value.
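How an ignore list such as -errorsToIgnore "E017,E018" could be applied is sketched below. This is an illustration only, assuming the summary works from per-rule counts keyed by rule ID; `IgnoreListSketch`, `parseIdList()`, `withoutIgnored()`, and the counts in `main()` are hypothetical. The project's real defaults are in TransitFeedQualityCalculator.java.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class IgnoreListSketch {

    /** Default list; a command-line value replaces it rather than extending it. */
    static final String DEFAULT_ERRORS_TO_IGNORE = "E017,E018";

    /** Parses a comma-delimited list such as "E017,E018" into a set of rule IDs. */
    static Set<String> parseIdList(String commaDelimited) {
        Set<String> ids = new HashSet<>();
        for (String id : commaDelimited.split(",")) {
            String trimmed = id.trim();
            if (!trimmed.isEmpty()) {
                ids.add(trimmed);
            }
        }
        return ids;
    }

    /** Returns a copy of the per-rule counts with the ignored rule IDs removed. */
    static Map<String, Integer> withoutIgnored(Map<String, Integer> counts, Set<String> ignored) {
        Map<String, Integer> filtered = new LinkedHashMap<>(counts);
        filtered.keySet().removeAll(ignored);
        return filtered;
    }

    public static void main(String[] args) {
        // Hypothetical error counts for one feed, keyed by validator rule ID
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("E011", 3);
        counts.put("E017", 5);
        counts.put("E018", 2);

        // No -errorsToIgnore value was given, so the default list applies
        Set<String> ignored = parseIdList(DEFAULT_ERRORS_TO_IGNORE);
        System.out.println(withoutIgnored(counts, ignored)); // prints {E011=3}
    }
}
```

The same approach would apply unchanged to -warningsToIgnore.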
If you want to download feeds, either the -transitFeedsApiKey or the -csv parameter must be provided. If both are missing, this tool will validate and analyze the feeds currently in -directory without downloading any new files.
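The rule above boils down to a single check. A minimal sketch, using a hypothetical `DownloadModeSketch` class and treating an omitted option as `null`:

```java
public class DownloadModeSketch {

    /** What the tool does for a given combination of options. */
    enum Mode { DOWNLOAD_AND_VALIDATE, VALIDATE_EXISTING_ONLY }

    /**
     * Feeds are downloaded only if at least one source (API key or CSV file) is given;
     * otherwise only the files already in -directory are validated and analyzed.
     */
    static Mode modeFor(String transitFeedsApiKey, String csvFile) {
        boolean hasSource = transitFeedsApiKey != null || csvFile != null;
        return hasSource ? Mode.DOWNLOAD_AND_VALIDATE : Mode.VALIDATE_EXISTING_ONLY;
    }

    public static void main(String[] args) {
        System.out.println(modeFor(null, "feeds.csv")); // DOWNLOAD_AND_VALIDATE
        System.out.println(modeFor(null, null));        // VALIDATE_EXISTING_ONLY
    }
}
```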
The feeds.csv file should be formatted as follows:
region_id,title,gtfs_url,gtfs_rt_url
"10000-Portland, OR, USA","TriMet Trip Update",https://developer.trimet.org/schedule/gtfs.zip,http://developer.trimet.org/ws/V1/TripUpdate&appID=225D5601E7729B9ED863DCA39
"10000-Portland, OR, USA","TriMet Alerts",https://developer.trimet.org/schedule/gtfs.zip,http://developer.trimet.org/ws/V1/FeedSpecAlerts&appID=225D5601E7729B9ED863DCA39
"20000-Oakland, CA, USA","AC Transit Trip Update",http://www.actransit.org/wp-content/uploads/GTFSWinter17B.zip,http://api.actransit.org/transit/gtfsrt/tripupdates?token=9A6257A021F944E7BE0AD32702DF23CE
Tips:
- region_id should follow the format of 10000-Portland, OR, USA: a - should separate the ID from the region name. The region_id field will be the name of the subdirectory under -directory in which feed files will be saved. We recommend prefixing it with a large integer value, following the region pattern of TransitFeeds.com, to avoid collisions with downloads from TransitFeeds.com.
- If you have more than one GTFS-rt feed for a region (e.g., VehiclePositions and TripUpdates), use the same region_id for each. This way the GTFS data will only be downloaded once for that region, and all of its GTFS-rt feeds will be downloaded to the same directory.
- The title field will be the file name of the downloaded protocol buffer file.
- gtfs_url and gtfs_rt_url can contain API keys if needed (e.g., http://developer.trimet.org/ws/V1/TripUpdate&appID=1234567890).
- Be sure to surround any fields that contain spaces with double quotes (").
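A rough sketch of how rows in this format could be read and grouped, assuming the header line has already been skipped and using a hand-rolled, quote-aware splitter (the project may well use a CSV library instead). It shows why sharing a region_id matters: rows with the same region_id collapse into one group, i.e. one subdirectory under -directory and one GTFS download. `FeedCsvSketch` and its methods are hypothetical, and the example.com URLs are placeholders.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeedCsvSketch {

    /** Splits one CSV line on commas that are outside double quotes ("" is an escaped quote). */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Returns the integer prefix of a region_id such as "10000-Portland, OR, USA". */
    static int regionNumber(String regionId) {
        int dash = regionId.indexOf('-');
        if (dash <= 0) {
            throw new IllegalArgumentException("Expected <number>-<region name>: " + regionId);
        }
        return Integer.parseInt(regionId.substring(0, dash));
    }

    /** Groups data lines by region_id, collecting each row's title in file order. */
    static Map<String, List<String>> titlesByRegion(List<String> dataLines) {
        Map<String, List<String>> byRegion = new LinkedHashMap<>();
        for (String line : dataLines) {
            List<String> f = splitCsvLine(line); // region_id, title, gtfs_url, gtfs_rt_url
            if (f.size() != 4) {
                throw new IllegalArgumentException("Expected 4 fields, found " + f.size() + ": " + line);
            }
            List<String> titles = byRegion.get(f.get(0));
            if (titles == null) {
                titles = new ArrayList<>();
                byRegion.put(f.get(0), titles);
            }
            titles.add(f.get(1));
        }
        return byRegion;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("\"10000-Portland, OR, USA\",\"TriMet Trip Update\",https://example.com/gtfs.zip,https://example.com/tu");
        lines.add("\"10000-Portland, OR, USA\",\"TriMet Alerts\",https://example.com/gtfs.zip,https://example.com/alerts");

        Path root = Paths.get("output"); // the -directory value
        for (Map.Entry<String, List<String>> e : titlesByRegion(lines).entrySet()) {
            // One subdirectory (and one GTFS download) per region, however many GTFS-rt feeds it has
            System.out.println(regionNumber(e.getKey()) + ": " + root.resolve(e.getKey()) + " " + e.getValue());
        }
    }
}
```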
Sample output
You'll see a lot of folders within the output directory, one for each transit agency.
Each folder contains the GTFS and GTFS-realtime source files downloaded from the agency:
- gtfs-zip - The GTFS data that was downloaded from the agency URL (HART, in this case) provided by the TransitFeeds.com API
- HART Trip Updates-xxxx.pb - The TripUpdates binary Protocol Buffer file that was downloaded from the agency URL (HART, in this case) provided by the TransitFeeds.com API, with the UTC time in milliseconds appended
- HART Vehicle Positions-xxxx.pb - The VehiclePositions binary Protocol Buffer file that was downloaded from the agency URL (HART, in this case) provided by the TransitFeeds.com API, with the UTC time in milliseconds appended
...as well as plain text versions of the GTFS-realtime files generated by the gtfs-realtime-validator:
- HART Trip Updates-xxxx.pb.txt - The plain text version of the above TripUpdates binary
- HART Vehicle Positions-xxxx.pb.txt - The plain text version of the above VehiclePositions binary
...and the validation results for each GTFS-realtime file (see gtfs-realtime-validator Batch Processor output examples for details):
- HART Trip Updates-xxxx.results.json - The validation results for the above TripUpdates binary
- HART Vehicle Positions-xxxx.results.json - The validation results for the above VehiclePositions binary
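The naming pattern above can be expressed as a few string helpers. This sketch assumes xxxx is exactly the epoch time in milliseconds (as returned by `System.currentTimeMillis()`), joined to the feed title with a -; `OutputFileNamesSketch` is hypothetical, and the project's own naming code may differ in details.

```java
public class OutputFileNamesSketch {

    /** Binary GTFS-rt snapshot: "<title>-<UTC epoch millis>.pb". */
    static String pbFileName(String title, long utcMillis) {
        return title + "-" + utcMillis + ".pb";
    }

    /** Plain-text rendering of the binary file: ".txt" is appended. */
    static String textFileName(String pbFileName) {
        return pbFileName + ".txt";
    }

    /** Validation results: the ".pb" suffix is replaced by ".results.json". */
    static String resultsFileName(String pbFileName) {
        if (!pbFileName.endsWith(".pb")) {
            throw new IllegalArgumentException("Not a .pb file: " + pbFileName);
        }
        return pbFileName.substring(0, pbFileName.length() - ".pb".length()) + ".results.json";
    }

    public static void main(String[] args) {
        // currentTimeMillis() counts milliseconds since the Unix epoch, which is defined in UTC
        String pb = pbFileName("HART Trip Updates", System.currentTimeMillis());
        System.out.println(pb);
        System.out.println(textFileName(pb));
        System.out.println(resultsFileName(pb));
    }
}
```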
An Excel spreadsheet file, analysis-graphs.xlsx, will be generated in the root folder of the project. It contains graphs that summarize all of the analyzed GTFS-realtime feeds.
The analysis results are also output to a JSON file, analysis-summary.json.
Implementation details
Take a look at the Main.main() method.
Here's a simplified version of what it looks like:
import java.nio.file.Paths;

String directoryName = "your-directory";
// Set to null to skip downloading from TransitFeeds.com
String transitFeedsApiKey = "YOUR_TRANSIT_FEEDS.COM_API_KEY_HERE";
// Set to null to skip downloading from a CSV file
String csvFile = "feed-file.csv";
TransitFeedQualityCalculator calculator = new TransitFeedQualityCalculator(Paths.get(directoryName));
if (transitFeedsApiKey != null) {
    calculator.setTransitFeedsApiKey(transitFeedsApiKey);
}
if (csvFile != null) {
    calculator.setCsvDownloaderFile(csvFile);
}
// Download, validate, analyze, and export
calculator.calculate();
This demonstrates the usage of the TransitFeedQualityCalculator, which performs the following steps:
- Download - Via TransitFeedsDownloader and CsvDownloader
- Validate - Via BulkFeedValidator
- Analyze - Via ResultsAnalyzer
- Export - To an Excel file via ExcelExporter, and to a JSON file via Jackson
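The order of these steps matters, because each one consumes what the previous one produced. The sketch below models that data flow with in-memory stubs; every method is a hypothetical stand-in for the class named in its comment, and the file names and rule IDs returned by the stubs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PipelineSketch {

    // Download: stand-in for TransitFeedsDownloader/CsvDownloader; returns the .pb files "saved"
    static List<String> download() {
        return Arrays.asList("A-1.pb", "B-1.pb");
    }

    // Validate: stand-in for BulkFeedValidator; maps each file to the rule IDs it triggered
    static Map<String, List<String>> validate(List<String> files) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (String f : files) {
            List<String> rules = new ArrayList<>();
            if (f.startsWith("A")) {
                rules.add("E011");
                rules.add("W001");
            }
            results.put(f, rules);
        }
        return results;
    }

    // Analyze: stand-in for ResultsAnalyzer; counts how many files triggered each rule
    static Map<String, Integer> analyze(Map<String, List<String>> results) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> rules : results.values()) {
            for (String rule : rules) {
                Integer c = counts.get(rule);
                counts.put(rule, c == null ? 1 : c + 1);
            }
        }
        return counts;
    }

    // Export: stand-in for ExcelExporter/Jackson; here just a text rendering
    static String export(Map<String, Integer> counts) {
        return counts.toString();
    }

    public static void main(String[] args) {
        System.out.println(export(analyze(validate(download())))); // prints {E011=1, W001=1}
    }
}
```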
Dependencies
Managed via Maven:
- TransitFeeds.com Client Library - For calling the TransitFeeds.com GetFeeds API
- GTFS-realtime Validator - For identifying warnings and errors in GTFS-realtime feeds