GTFS
Public transportation schedules and associated geographic information for South-East Queensland.
The data is a snapshot and is not planned to be kept up to date. The main purpose of this repository is to develop a data package and schemas for this dataset.
Data
This data is the "General transit feed specification (GTFS) - South East Queensland" dataset published by Transport and Main Roads, Queensland Government. It is licensed under Creative Commons Attribution and was sourced on 07 September 2016.
The data follows the GTFS specification, plus some of its extensions, which define a common format for public transportation schedules and associated geographic information. The specification makes some files optional, and also makes some columns within files optional. As a result, the datapackage.json file and schemas in this repository may not work for other GTFS feeds.
The data is made up of a number of files.
Each data file is described by a schema. The schemas follow the JSON Table Schema specification.
These schemas will be combined into a datapackage.json file that fully describes the data collection. The datapackage.json file will follow the Data Package specification.
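The sketch below shows how per-file schemas could be combined into a single data package descriptor. The schemas are hypothetical and heavily trimmed (the real ones describe every column), and the package name "gtfs-seq" and helper `build_datapackage` are illustrative assumptions, not the repository's actual tooling. Each resource carries a `path` to its data file and an embedded `schema`, which is how the Data Package specification links data to its schema.

```python
import json

# Hypothetical, trimmed schemas: the real schemas describe every column.
stops_schema = {
    "fields": [
        {"name": "stop_id", "type": "string",
         "constraints": {"required": True, "unique": True}},
        {"name": "stop_name", "type": "string"},
        {"name": "stop_lat", "type": "number",
         "constraints": {"minimum": -90, "maximum": 90}},
        {"name": "stop_lon", "type": "number",
         "constraints": {"minimum": -180, "maximum": 180}},
    ]
}
routes_schema = {
    "fields": [
        {"name": "route_id", "type": "string",
         "constraints": {"required": True, "unique": True}},
        {"name": "route_short_name", "type": "string"},
        {"name": "route_type", "type": "integer"},
    ]
}


def build_datapackage(name, schemas):
    """Combine per-file schemas into one data package descriptor.

    `schemas` maps a data file path (e.g. "stops.txt") to its schema.
    """
    resources = []
    for path, schema in schemas.items():
        resources.append({
            "name": path.rsplit(".", 1)[0],  # "stops.txt" -> "stops"
            "path": path,
            "format": "csv",  # GTFS .txt files are comma-separated
            "schema": schema,
        })
    return {"name": name, "resources": resources}


descriptor = build_datapackage(
    "gtfs-seq",  # hypothetical package name
    {"stops.txt": stops_schema, "routes.txt": routes_schema},
)
print(json.dumps(descriptor, indent=2))
```

Writing `json.dumps(descriptor, indent=2)` to a file named datapackage.json would give a descriptor in the shape the Data Package specification expects.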
Preparation
The data was downloaded, unzipped, and then uploaded to GitHub.
Two data files (shapes.txt and trips.txt) were too large to upload to GitHub, so they were truncated before uploading. The truncated files are still adequate for testing valid data.
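The README does not say how the files were truncated. One simple approach, sketched here as an assumption, is to keep the header row plus the first N data rows so the truncated file still matches its schema. The function name `truncate_csv` and any row limit you pass are hypothetical.

```python
import csv


def truncate_csv(src, dst, max_rows):
    """Copy the header and at most `max_rows` data rows from src to dst.

    `src` and `dst` are open text files (open them with newline="").
    Output rows are written with "\n" line endings.
    Returns the number of data rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return 0  # empty input: nothing to copy
    writer.writerow(header)  # always keep the header row
    written = 0
    for row in reader:
        if written >= max_rows:
            break
        writer.writerow(row)
        written += 1
    return written
```

For example, open shapes.txt for reading and a new shapes_truncated.txt for writing (both with `newline=""`) and call `truncate_csv(src, dst, 100000)`; the row limit is an arbitrary choice that only needs to keep the file under GitHub's size limit.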
Tests
The tests focus on making sure the schemas are correct. Existing GTFS validation tools already check the data in more powerful ways than JSON Table Schemas allow.
The tests consist of invalid data files used to check that each schema detects every error (e.g. incorrect types and violated constraints).
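To illustrate what "the schema detects all errors" means, here is a small standard-library stand-in for a validator. It is not Good Tables and only understands the `string`, `integer` and `number` types and the `required`, `unique`, `minimum` and `maximum` constraints; the `validate` and `cast` helpers are hypothetical. Feeding it a deliberately invalid file should produce one error per planted problem.

```python
import csv
import io


def cast(value, field_type):
    """Convert a CSV cell to the field's type; raise ValueError if it does not fit."""
    if field_type == "integer":
        return int(value)
    if field_type == "number":
        return float(value)
    return value  # treat any other type as a plain string


def validate(csv_text, schema):
    """Return a list of (row_number, field_name, message) errors."""
    errors = []
    seen = {f["name"]: set() for f in schema["fields"]}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_num, row in enumerate(reader, start=2):  # row 1 is the header
        for field in schema["fields"]:
            name = field["name"]
            c = field.get("constraints", {})
            raw = row.get(name) or ""  # a missing column counts as empty
            if raw == "":
                if c.get("required"):
                    errors.append((row_num, name, "required value missing"))
                continue
            try:
                value = cast(raw, field.get("type", "string"))
            except ValueError:
                errors.append((row_num, name, f"not a valid {field['type']}: {raw!r}"))
                continue
            if "minimum" in c and value < c["minimum"]:
                errors.append((row_num, name, f"below minimum {c['minimum']}"))
            if "maximum" in c and value > c["maximum"]:
                errors.append((row_num, name, f"above maximum {c['maximum']}"))
            if c.get("unique"):
                if value in seen[name]:
                    errors.append((row_num, name, f"duplicate value {raw!r}"))
                seen[name].add(value)
    return errors
```

With a schema where `stop_id` is required and unique and `stop_lat` is a number between -90 and 90, the invalid input `1,abc` / `,95` (after a valid first row `1,-27.47`) yields a duplicate ID, a type error, a missing required value and a range violation, while valid data yields an empty list.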
Results
The results can be verified using links to Good Tables. Tests include:
- testing the valid data without a schema
- testing the valid data with a schema
- testing the invalid data with a schema
Good Tables doesn't check all types of errors (yet). Some things not checked include:
Automatic Testing
The scripts and the .travis.yml file automatically test the data defined in datapackage.json. Every change to this repository triggers a Travis CI build that validates the data.
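The repository's actual scripts are not shown here, so the following is only a hypothetical example of the kind of check a CI job could run before (or alongside) full validation: it reads datapackage.json and reports any resource whose file is missing or whose CSV header differs from the field names declared in its schema. The function name `check_headers` is an assumption.

```python
import csv
import json
import os


def check_headers(datapackage_path):
    """Return a list of problems where a resource's file is missing or its
    CSV header differs from the field names declared in its schema."""
    base = os.path.dirname(os.path.abspath(datapackage_path))
    with open(datapackage_path, encoding="utf-8") as f:
        package = json.load(f)
    problems = []
    for resource in package.get("resources", []):
        path = os.path.join(base, resource["path"])  # paths are relative to the descriptor
        expected = [fld["name"] for fld in resource.get("schema", {}).get("fields", [])]
        if not os.path.exists(path):
            problems.append(f"{resource['path']}: file not found")
            continue
        # utf-8-sig strips a byte-order mark if the file starts with one
        with open(path, newline="", encoding="utf-8-sig") as f:
            header = next(csv.reader(f), [])
        if header != expected:
            problems.append(f"{resource['path']}: header {header} != schema {expected}")
    return problems
```

A CI script could print the returned problems and exit with `sys.exit(1 if check_headers("datapackage.json") else 0)`, since a non-zero exit code is what marks a Travis build as failed.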
Schemas
The schemas were created using Data Packagist. To create a schema with Data Packagist:
- add some basic information about the data file (name, description, license, etc.)
- upload the data file
Data Packagist will create a datapackage.json file for you. Download this file.
Good Tables can only use a JSON Table Schema for validation (see goodtables-web #65), so extract the JSON Table Schema from the datapackage.json file. It's this bit: {fields: [...]}. Save it as a separate file.
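The extraction step can also be scripted. This sketch assumes the schema sits at `resources[i]["schema"]` in the downloaded datapackage.json, as the Data Package specification describes; the helper names and output file name are hypothetical.

```python
import json


def extract_schema(datapackage_path, resource_index=0):
    """Return the {"fields": [...]} schema of one resource in datapackage.json."""
    with open(datapackage_path, encoding="utf-8") as f:
        package = json.load(f)
    return package["resources"][resource_index]["schema"]


def save_schema(schema, out_path):
    """Write the schema to its own file, pretty-printed for hand editing."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(schema, f, indent=2)
        f.write("\n")
```

For example, `save_schema(extract_schema("datapackage.json"), "stops.json")` writes the first resource's schema to stops.json, ready for editing.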
Edit the schema file with a text editor (e.g. Atom, jsoneditoronline.org) to add constraints, refine types and formats, etc. You may like to use the JSON Table Schema schema to improve your editing experience.
Some constraints use regular expressions to define a pattern. Use an online tool to help create and test a regular expression, e.g. regexr.com or regex101.
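If you would rather script these edits than make them by hand, a helper like the hypothetical `refine_field` below changes one field's type or format and merges constraints into it. It is a sketch of the same manual edit, for example turning an inferred `string` column such as `stop_lat` into a `number` bounded to valid latitudes.

```python
def refine_field(schema, name, type_=None, fmt=None, **constraints):
    """Update one field of a JSON Table Schema in place and return it.

    Sets the field's "type" and/or "format" when given, and merges any
    keyword arguments into the field's "constraints" object.
    """
    for field in schema["fields"]:
        if field["name"] == name:
            if type_ is not None:
                field["type"] = type_
            if fmt is not None:
                field["format"] = fmt
            if constraints:
                field.setdefault("constraints", {}).update(constraints)
            return field
    raise KeyError(f"no field named {name!r}")
```

For example, `refine_field(schema, "stop_lat", type_="number", minimum=-90, maximum=90)` leaves the field as `{"name": "stop_lat", "type": "number", "constraints": {"minimum": -90, "maximum": 90}}`.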
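A pattern can also be checked locally against sample values before it goes into a schema. The example uses GTFS `route_color` (six hexadecimal digits); the `matches` helper is hypothetical, and it uses `re.fullmatch` on the assumption that a schema pattern must match the whole cell value. Python's regex flavour may differ slightly from the one your validator uses, so keep patterns simple.

```python
import re

# Pattern for GTFS route_color: exactly six hexadecimal digits, e.g. "FF0000".
ROUTE_COLOR = r"[0-9A-Fa-f]{6}"


def matches(pattern, value):
    """True if the whole value matches the pattern (not just a substring)."""
    return re.fullmatch(pattern, value) is not None


samples = ["FF0000", "ff00aa", "#FF0000", "FF000", "FF00001"]
results = {s: matches(ROUTE_COLOR, s) for s in samples}
```

Only the first two samples match: the others have a leading "#", too few digits, or too many digits.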
View the Data Package
Data packages are about providing machine-readable metadata for your data. You can view a human-readable version of the data package, including its data and readme files, using the Data Package Viewer. The viewer has a couple of issues, including an incorrect link to the metadata (see data.okfn.org-new #9).
License
All items in this repository, apart from the data, are licensed under Creative Commons Attribution 4.0.