hafas-gtfs-rt-feed
Generate a GTFS Realtime (GTFS-RT) feed by polling a HAFAS endpoint and matching the data against a GTFS Static/Schedule dataset.
Architecture
hafas-gtfs-rt-feed consists of 3 components, connected to each other via NATS Streaming channels:
- monitor-hafas: Given a hafas-client instance, it uses hafas-monitor-trips to poll live data about all vehicles in the configured geographic area.
- match-with-gtfs: Uses match-gtfs-rt-to-gtfs to match this data against static GTFS data imported into a database.
- serve-as-gtfs-rt: Uses gtfs-rt-differential-to-full-dataset to aggregate the matched data into a single GTFS-RT feed, and serves the feed via HTTP.
monitor-hafas sends data to match-with-gtfs via two NATS Streaming channels, trips & movements; match-with-gtfs sends data to serve-as-gtfs-rt via two channels, matched-trips & matched-movements.
flowchart TB
subgraph external[ ]
hafas(HAFAS API):::external
db(GTFS Static/Schedule in PostgreSQL):::external
consumers(consumers):::external
classDef external fill:#ffd9c2,stroke:#ff8e62
end
style external fill:none,stroke:none
subgraph hafas-gtfs-rt-feed
monitor-hafas(monitor-hafas)
match-with-gtfs(match-with-gtfs)
serve-as-gtfs-rt(serve-as-gtfs-rt)
end
style hafas-gtfs-rt-feed fill:none,stroke:#9370db
subgraph nats[NATS Streaming]
trips[trips channel]:::channel
movements[movements channel]:::channel
matched-trips[matched-trips channel]:::channel
matched-movements[matched-movements channel]:::channel
classDef channel fill:#ffffde,stroke:#aaaa33
end
style nats fill:none
hafas-- realtime data -->monitor-hafas
db-- static data -->match-with-gtfs
serve-as-gtfs-rt-- GTFS-RT -->consumers
monitor-hafas .-> trips .-> match-with-gtfs
monitor-hafas .-> movements .-> match-with-gtfs
match-with-gtfs .-> matched-trips .-> serve-as-gtfs-rt
match-with-gtfs .-> matched-movements .-> serve-as-gtfs-rt
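To illustrate the data flow described above, here is a minimal, single-process sketch. It assumes a plain Node.js EventEmitter as a stand-in for NATS Streaming (the real components are separate processes connected through a NATS Streaming server), and lookupGtfsTripId is a hypothetical placeholder for the matching that match-gtfs-rt-to-gtfs actually does against the database.

```javascript
// Single-process sketch of the three-stage pipeline.
// ASSUMPTION: an in-memory EventEmitter stands in for NATS Streaming; the real
// components are separate processes publishing to a NATS Streaming server.
const {EventEmitter} = require('events')
const channels = new EventEmitter()

// hypothetical stand-in for match-gtfs-rt-to-gtfs: HAFAS trip ID -> GTFS trip_id
const lookupGtfsTripId = (hafasTripId) => ({'hafas-1': 'gtfs-A'})[hafasTripId] ?? null

// stage 1, like monitor-hafas: publish to the `movements` & `trips` channels.
// (The real component fetches each trip via hafas-client's trip(); here it is stubbed.)
const monitorHafas = (hafasMovements) => {
  for (const movement of hafasMovements) {
    channels.emit('movements', movement)
    channels.emit('trips', {tripId: movement.tripId})
  }
}

// stage 2, like match-with-gtfs: match & publish to the `matched-*` channels
channels.on('movements', (m) => {
  channels.emit('matched-movements', {...m, gtfsTripId: lookupGtfsTripId(m.tripId)})
})
channels.on('trips', (t) => {
  channels.emit('matched-trips', {...t, gtfsTripId: lookupGtfsTripId(t.tripId)})
})

// stage 3, like serve-as-gtfs-rt: aggregate the latest state per trip
const feed = {vehiclePositions: new Map(), tripUpdates: new Map()}
channels.on('matched-movements', (m) => feed.vehiclePositions.set(m.tripId, m))
channels.on('matched-trips', (t) => feed.tripUpdates.set(t.tripId, t))
```

Calling monitorHafas([{tripId: 'hafas-1', latitude: 52.5, longitude: 13.4}]) synchronously fills feed.vehiclePositions & feed.tripUpdates; trips that cannot be matched keep gtfsTripId: null.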
Getting Started
Some preparations are necessary for hafas-gtfs-rt-feed to work. Let's get started!
Run npm init inside a new directory to initialize an empty npm-based project.
mkdir my-hafas-based-gtfs-rt-feed
cd my-hafas-based-gtfs-rt-feed
npm init
set up NATS Streaming
Install and run the NATS Streaming Server as documented.
Note: If you run NATS Streaming on a different host or port (e.g. via Docker Compose), pass a custom NATS_STREAMING_URL environment variable into all hafas-gtfs-rt-feed components.
set up PostgreSQL
Make sure you have PostgreSQL >=14 installed and running (match-gtfs-rt-to-gtfs, a dependency of this project, needs it). There are guides for many operating systems and environments available on the internet.
Note: If you run PostgreSQL on a different host or port, export the appropriate PG* environment variables. The commands mentioned below will use them.
install hafas-gtfs-rt-feed
Use the npm CLI:
npm install hafas-gtfs-rt-feed
# added 153 packages in 12s
configure a hafas-client instance
hafas-gtfs-rt-feed is agnostic to the HAFAS API it pulls data from: to fetch data, monitor-hafas uses the hafas-client instance you create in a file, which queries one of the many available HAFAS API endpoints.
Set up hafas-client as documented. A very basic example using the Deutsche Bahn (DB) endpoint looks as follows:
// db-hafas-client.js
const createHafasClient = require('hafas-client')
const dbProfile = require('hafas-client/p/db')
// please pick something meaningful, e.g. the URL of your GitHub repo
const userAgent = 'my-awesome-program'
// create hafas-client configured to use Deutsche Bahn's HAFAS API
const hafasClient = createHafasClient(dbProfile, userAgent)
module.exports = hafasClient
build the GTFS matching database
match-with-gtfs (hafas-gtfs-rt-feed's 2nd processing step) needs a pre-populated matching database in order to match data fetched from HAFAS against the GTFS Static/Schedule data. It uses gtfs-via-postgres and match-gtfs-rt-to-gtfs underneath to do this matching.
First, we're going to use gtfs-via-postgres's gtfs-to-sql command-line tool to import our GTFS data into PostgreSQL.
Note: Make sure you have an up-to-date static GTFS dataset, unzipped into individual .txt files.
Tip: The sponge command is from the moreutils package.
# create a PostgreSQL database `gtfs`
psql -c 'create database gtfs'
# configure all subsequent commands to use it
export PGDATABASE=gtfs
# import all .txt files
node_modules/.bin/gtfs-to-sql -d -u path/to/gtfs/files/*.txt \
  | sponge | psql -b -v 'ON_ERROR_STOP=1'
Your database gtfs should now contain the static GTFS data in a basic form.
match-gtfs-rt-to-gtfs works by matching HAFAS stops & lines against GTFS stops & lines, using their IDs and their names. Usually, HAFAS & GTFS stop/line names don't have the same format (e.g. Berlin Hbf & S+U Berlin Hauptbahnhof), so they need to be normalized.
You'll have to implement this normalization logic. A simplified (but very naive) normalization logic would look like this:
// hafas-config.js
module.exports = {
endpointName: 'some-hafas-api',
normalizeStopName: name => name.toLowerCase().replace(/\s+/g, ' ').trim(),
normalizeLineName: name => name.toLowerCase().replace(/\s+/g, ' ').trim(),
}
// gtfs-config.js
module.exports = {
endpointName: 'some-gtfs-feed',
normalizeStopName: name => name.toLowerCase().replace(/\s+st\.$/, ''), // lower-cased first, so match "st."
normalizeLineName: name => name.toLowerCase(),
}
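The configs above are deliberately naive. A slightly more thorough normalizer might look like the following sketch. The prefix & abbreviation rules (dropping S+U, expanding Hbf to Hauptbahnhof) are assumptions derived from the Berlin example, not rules shipped with match-gtfs-rt-to-gtfs; derive real ones from the names in your HAFAS endpoint and GTFS feed.

```javascript
// Hypothetical stop name normalizer, usable on both the HAFAS & the GTFS side.
// ASSUMPTION: the prefix & abbreviation rules below are illustrative only.
const PRODUCT_PREFIX = /^(?:s\+u|s|u)\s+/ // e.g. "S+U Berlin Hauptbahnhof"
const ABBREVIATIONS = [
  [/\bhbf\b\.?/g, 'hauptbahnhof'], // e.g. "Berlin Hbf"
  [/\bstr\b\.?/g, 'strasse'],      // e.g. "Berliner Str."
]

const normalizeStopName = (name) => {
  let n = name
    .normalize('NFKD')               // split accented letters into letter + diacritic
    .replace(/[\u0300-\u036f]/g, '') // drop the combining diacritics
    .toLowerCase()
    .replace(/ß/g, 'ss')
    .replace(/\s+/g, ' ')            // collapse runs of whitespace
    .trim()
  n = n.replace(PRODUCT_PREFIX, '')
  for (const [pattern, replacement] of ABBREVIATIONS) {
    n = n.replace(pattern, replacement)
  }
  return n
}
```

Using the same function as normalizeStopName in both hafas-config.js and gtfs-config.js maps Berlin Hbf and S+U Berlin Hauptbahnhof to the same string, berlin hauptbahnhof.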
match-gtfs-rt-to-gtfs needs some special matching indices in the database to work. Now that we have implemented the name normalization logic, we're going to pass it to match-gtfs-rt-to-gtfs's build-gtfs-match-index command-line tool:
# add matching indices to the `gtfs` database
node_modules/.bin/build-gtfs-match-index path/to/hafas-config.js path/to/gtfs-config.js \
  | sponge | psql -b -v 'ON_ERROR_STOP=1'
Note: hafas-gtfs-rt-feed is data- & region-agnostic, so it depends on your HAFAS-endpoint-specific name normalization logic to match as many HAFAS trips/vehicles as possible against the GTFS data. Ideally, the stop/line names are normalized so well that HAFAS data can always be matched to the (static) GTFS data. This is how GTFS-RT feeds are intended to be consumed: alongside a (static) GTFS dataset with 100% matching IDs. If the name normalization logic doesn't handle all cases, the GTFS-RT feed will contain TripUpdates & VehiclePositions whose route_id or trip_id doesn't occur in the GTFS dataset.
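To find out how well your name normalization works in practice, you could compare the trip_ids in the served feed against those in the static trips.txt. The helper below is a hypothetical sketch; collecting the two lists of IDs (e.g. by decoding the feed and reading trips.txt) is assumed to happen elsewhere.

```javascript
// Hypothetical helper: which trip_ids in the GTFS-RT feed are missing from the
// static GTFS dataset? ASSUMPTION: `feedTripIds` (from the decoded feed's
// TripUpdates/VehiclePositions) and `staticTripIds` (from trips.txt) are given.
const matchReport = (feedTripIds, staticTripIds) => {
  const known = new Set(staticTripIds)
  const ids = [...new Set(feedTripIds)] // count each trip only once
  const unmatched = ids.filter(id => !known.has(id))
  return {
    total: ids.length,
    unmatched,
    // share of trip_ids that do occur in the static GTFS; null if the feed is empty
    rate: ids.length === 0 ? null : (ids.length - unmatched.length) / ids.length,
  }
}
```

The unmatched list is a starting point for finding trips whose stop/line names your normalization doesn't handle yet.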
run it
Now that we've set everything up, let's run all hafas-gtfs-rt-feed components to check if they are working!
All three components need to run in parallel, so open three terminals to run them. Remember to set the NATS_STREAMING_URL & PG* environment variables (see above) in all three of them, if necessary.
They write pino-formatted log messages to stdout, so for local development, we use pino-pretty to make them more readable.
# specify the bounding box to be monitored (required)
export BBOX='{"north": 1.1, "west": 22.2, "south": 3.3, "east": 33.3}'
# start monitor-hafas
node_modules/.bin/monitor-hafas db-hafas-client.js | npx pino-pretty
# start match-with-gtfs
node_modules/.bin/match-with-gtfs | npx pino-pretty
# start serve-as-gtfs-rt
node_modules/.bin/serve-as-gtfs-rt | npx pino-pretty
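The BBOX values in the example above are placeholders. Before starting monitor-hafas, you might want to sanity-check your own bounding box with a small helper like the following. parseBbox is hypothetical: it is not part of hafas-gtfs-rt-feed and not how monitor-hafas itself parses BBOX. Note that it would reject the placeholder values above, because their north is smaller than their south.

```javascript
// Hypothetical pre-flight check for the BBOX environment variable, to catch
// obvious mistakes early. Not monitor-hafas' own parsing logic.
const parseBbox = (raw) => {
  const bbox = JSON.parse(raw)
  for (const key of ['north', 'west', 'south', 'east']) {
    if (typeof bbox[key] !== 'number' || !Number.isFinite(bbox[key])) {
      throw new TypeError(`BBOX.${key} must be a finite number`)
    }
  }
  if (bbox.north > 90 || bbox.south < -90) throw new RangeError('latitude out of range')
  if (bbox.west < -180 || bbox.east > 180) throw new RangeError('longitude out of range')
  if (bbox.north <= bbox.south) throw new RangeError('BBOX.north must be greater than BBOX.south')
  // (this simple check ignores boxes crossing the antimeridian)
  if (bbox.east <= bbox.west) throw new RangeError('BBOX.east must be greater than BBOX.west')
  return bbox
}
```

In a small pre-flight script, you would call parseBbox(process.env.BBOX) and abort if it throws.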
inspect the feed
Your GTFS-RT feed should now be served at http://localhost:3000/, and within a few moments, it should contain data!
You can verify this using many available GTFS-RT tools. Here are two of them to quickly inspect the feed:
- print-gtfs-rt-cli is a command-line tool; use it with curl: curl 'http://localhost:3000/' -sf | print-gtfs-rt
- gtfs-rt-inspector is a web app that can inspect any CORS-enabled GTFS-RT feed; paste http://localhost:3000/ into the URL field to inspect yours.
After monitor-hafas has fetched some data from HAFAS, and after match-with-gtfs has matched it against the GTFS (or failed or timed out doing so), you should see TripUpdates & VehiclePositions.
Usage
metrics
All three components (monitor-hafas, match-with-gtfs, serve-as-gtfs-rt) expose Prometheus-compatible metrics via HTTP. You can fetch and process them using e.g. Prometheus, VictoriaMetrics or the Grafana Agent.
As an example, we're going to inspect monitor-hafas's metrics. Enable them by running it with a METRICS_SERVER_PORT=9323 environment variable, and query its metrics via HTTP:
curl 'http://localhost:9323/metrics'
# HELP nats_streaming_sent_total nr. of messages published to NATS streaming
# TYPE nats_streaming_sent_total counter
nats_streaming_sent_total{channel="movements"} 1673
nats_streaming_sent_total{channel="trips"} 1162
# HELP hafas_reqs_total nr. of HAFAS requests
# TYPE hafas_reqs_total counter
hafas_reqs_total{call="radar"} 12
hafas_reqs_total{call="trip"} 1165
# HELP hafas_response_time_seconds HAFAS response time
# TYPE hafas_response_time_seconds summary
hafas_response_time_seconds{quantile="0.05",call="radar"} 1.0396666666666665
hafas_response_time_seconds{quantile="0.5",call="radar"} 3.8535000000000004
hafas_response_time_seconds{quantile="0.95",call="radar"} 6.833
hafas_response_time_seconds_sum{call="radar"} 338.22600000000006
hafas_response_time_seconds_count{call="radar"} 90
hafas_response_time_seconds{quantile="0.05",call="trip"} 2.4385
# …
# HELP tiles_fetched_total nr. of tiles fetched from HAFAS
# TYPE tiles_fetched_total counter
tiles_fetched_total 2
# HELP movements_fetched_total nr. of movements fetched from HAFAS
# TYPE movements_fetched_total counter
movements_fetched_total 362
# HELP fetch_all_movements_total how often all movements have been fetched
# TYPE fetch_all_movements_total counter
fetch_all_movements_total 1
# HELP fetch_all_movements_duration_seconds time that fetching all movements currently takes
# TYPE fetch_all_movements_duration_seconds gauge
fetch_all_movements_duration_seconds 2.4
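If you want to post-process these metrics without a full Prometheus setup, a simplified parser for the text exposition format can extract individual samples. This is a sketch under stated assumptions (no timestamps, no escaped characters in label values), not a complete implementation of the format. For the radar summary above, the mean response time is 338.226 s / 90 ≈ 3.76 s.

```javascript
// Simplified parser for the Prometheus text exposition format, enough for the
// sample above. ASSUMPTION: no timestamps, no escaped quotes inside label values.
const parseMetrics = (text) => {
  const samples = []
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim()
    if (line === '' || line.startsWith('#')) continue // skip HELP/TYPE comments
    const m = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)/.exec(line)
    if (!m) continue
    const [, name, rawLabels = '', rawValue] = m
    const labels = {}
    for (const [, key, value] of rawLabels.matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g)) {
      labels[key] = value
    }
    samples.push({name, labels, value: Number(rawValue)})
  }
  return samples
}

// value of the first sample with this name whose labels include all of `labels`
const getValue = (samples, name, labels = {}) => {
  const sample = samples.find(s => s.name === name &&
    Object.entries(labels).every(([k, v]) => s.labels[k] === v))
  return sample === undefined ? undefined : sample.value
}

// mean response time of a summary = _sum / _count
const meanResponseTime = (samples, call) =>
  getValue(samples, 'hafas_response_time_seconds_sum', {call}) /
  getValue(samples, 'hafas_response_time_seconds_count', {call})
```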
health check
serve-as-gtfs-rt exposes a health check that checks if there are any recent entities in the feed.
# healthy
curl 'http://localhost:3000/health' -I
# HTTP/1.1 200 OK
# …
# not healthy
curl 'http://localhost:3000/health' -I
# HTTP/1.1 503 Service Unavailable
# …
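The exact criteria and threshold that serve-as-gtfs-rt uses for its health check are not spelled out here. The following hypothetical sketch shows one way a "recent entities" check could map a feed to the two status codes above, assuming a 5-minute threshold and entities shaped like a decoded GTFS-RT FeedMessage with POSIX-second timestamps.

```javascript
// Hypothetical "recent entities" health check; NOT serve-as-gtfs-rt's code.
// ASSUMPTION: a 5-minute threshold, chosen for illustration only.
const MAX_AGE_SECONDS = 5 * 60

// `feed` mimics a decoded GTFS-RT FeedMessage: {entity: [{tripUpdate: {...}} | {vehicle: {...}}]},
// where `timestamp` is a POSIX time in seconds (plain numbers assumed here).
const healthStatus = (feed, nowSeconds, maxAgeSeconds = MAX_AGE_SECONDS) => {
  const hasRecentEntity = (feed.entity || []).some((e) => {
    const ts = (e.tripUpdate && e.tripUpdate.timestamp) || (e.vehicle && e.vehicle.timestamp)
    return typeof ts === 'number' && nowSeconds - ts <= maxAgeSeconds
  })
  return hasRecentEntity ? 200 : 503 // 200 OK vs. 503 Service Unavailable
}
```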
on-demand mode
Optionally, you can run your GTFS-RT feed in a demand-responsive mode, where it will only fetch data from HAFAS as long as someone requests the GTFS-RT feed. This effectively reduces the long-term number of requests to HAFAS.
To understand how this works, remember that
- movements fetched from HAFAS are formatted as GTFS-RT VehiclePositions.
- trips fetched from HAFAS are formatted as GTFS-RT TripUpdates.
- the whole monitor-hafas, match-with-gtfs & serve-as-gtfs-rt setup works like a streaming pipeline.
The on-demand mode works like this:
- By default, monitor-hafas is either just fetching movements (if you configured it to fetch only trips on demand) or completely idle (if you configured it to fetch both movements & trips on demand). monitor-hafas also subscribes to a demand NATS Streaming channel, which serves as a communication channel for serve-as-gtfs-rt to signal demand.
- When the GTFS-RT feed is requested via HTTP, serve-as-gtfs-rt serves the current feed (which contains either VehiclePositions only, or no entities whatsoever, depending on the on-demand configuration). serve-as-gtfs-rt then signals demand via the demand channel.
- Upon receiving a demand signal, monitor-hafas will start fetching trips, or both movements & trips, depending on the on-demand configuration.
This means that, after the first request(s) for the GTFS-RT feed signalling demand, it will take a bit of time until all data is served with subsequent GTFS-RT feed requests. As long as there is constant demand for the feed, the on-demand mode will behave as if it isn't turned on.
Tell serve-as-gtfs-rt to signal demand via the --signal-demand option. You can then configure monitor-hafas's exact behaviour using the following options:
--movements-fetch-mode <mode>
    Control when movements are fetched from HAFAS.
    "on-demand":
        Only fetch movements from HAFAS when the `serve-as-gtfs-rt` component
        has signalled demand. Trips won't be fetched continuously anymore.
    "continuously" (default):
        Always fetch movements.
--movements-demand-duration <milliseconds>
    With `--movements-fetch-mode "on-demand"`, when the `serve-as-gtfs-rt` component
    has signalled demand, for how long shall movements be fetched?
    Default: movements fetching interval (60s by default) * 5
--trips-fetch-mode <mode>
    Control when trips are fetched from HAFAS.
    "never":
        Never fetch a movement's respective trip.
    "on-demand":
        Only fetch movements' respective trips from HAFAS when the `serve-as-gtfs-rt`
        component has signalled demand.
    "continuously" (default):
        Always fetch each movement's respective trip.
--trips-demand-duration <milliseconds>
    With `--trips-fetch-mode "on-demand"`, when the `serve-as-gtfs-rt` component
    has signalled demand, for how long shall trips be fetched?
    Default: movements fetching interval (60s by default) * 2
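The interplay of demand signals and the *-demand-duration options can be modelled as a time window that each demand signal extends. This is a hypothetical sketch (DemandWindow, signalDemand and isActive are illustrative names, not monitor-hafas' actual code); it also spells out the default durations: 60 s × 5 = 300 s for movements and 60 s × 2 = 120 s for trips.

```javascript
// Hypothetical model of the on-demand behaviour; not monitor-hafas' API.
const MOVEMENTS_INTERVAL_MS = 60 * 1000 // default movements fetching interval

class DemandWindow {
  constructor (durationMs) {
    this.durationMs = durationMs
    this.activeUntil = -Infinity // idle until the first demand signal
  }

  // called whenever serve-as-gtfs-rt signals demand via the `demand` channel
  signalDemand (nowMs) {
    this.activeUntil = Math.max(this.activeUntil, nowMs + this.durationMs)
  }

  // should data be fetched at time `nowMs`?
  isActive (nowMs) {
    return nowMs < this.activeUntil
  }
}

// defaults from the option descriptions: interval * 5 for movements, * 2 for trips
const movements = new DemandWindow(MOVEMENTS_INTERVAL_MS * 5) // 300000 ms
const trips = new DemandWindow(MOVEMENTS_INTERVAL_MS * 2)     // 120000 ms
```

As long as demand signals arrive more often than the window's duration, isActive() never turns false, which is why constant demand makes the on-demand mode behave as if it were turned off.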
controlling the number of requests to HAFAS
Currently, there is no mechanism to influence the total rate of requests to HAFAS directly, no prioritisation between the "find trips in a bounding box" (hafas-client's radar()) and "refresh a trip" (hafas-client's trip()) requests, and no logic to efficiently use requests up to a certain configured limit.
However, there are some dials to influence the number of requests of both types:
- By defining a smaller or larger bounding box via the BBOX environment variable, you can control the total number of monitored trips, and thus the rate of requests.
- By setting FETCH_TILES_INTERVAL, you can choose how often the bounding box (or rather, the vehicles within it) shall be refreshed, and consequently how often each trip will be fetched, if you have configured that. Note that if a refresh takes longer than the configured interval, another refresh will follow right after, but the total rate of radar() requests to HAFAS will be lower.
- You can throttle the total number of requests to HAFAS by throttling hafas-client, but depending on the rate you configure, this might cause the refresh of all monitored trips (as well as finding new trips to monitor) to take longer than configured using FETCH_TRIPS_INTERVAL, so consider it a secondary tool.
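To get a feeling for how these dials interact, here is a back-of-the-envelope model. It is a hypothetical sketch, not part of hafas-gtfs-rt-feed: it assumes each refresh issues a fixed number of radar() requests plus one trip() request per movement (if trips are fetched continuously), and that FETCH_TILES_INTERVAL is given in milliseconds. For example, with 2 radar() requests and 362 movements per refresh and a 60 s interval, it estimates 364 × 60 = 21,840 requests per hour; if a refresh takes 120 s instead, that halves to 10,920.

```javascript
// Hypothetical request-rate model (not part of hafas-gtfs-rt-feed).
const estimateRequestsPerHour = ({
  radarRequestsPerRefresh, // grows with the size of BBOX
  movements,               // vehicles found per refresh
  fetchTrips,              // false with --trips-fetch-mode "never"
  fetchTilesIntervalMs,    // FETCH_TILES_INTERVAL (unit assumed: milliseconds)
  refreshDurationMs,       // how long one refresh actually takes
}) => {
  // ASSUMPTION: one trip() request per movement when trips are fetched
  const perRefresh = radarRequestsPerRefresh + (fetchTrips ? movements : 0)
  // if a refresh takes longer than the interval, the next one starts right after it
  const periodMs = Math.max(fetchTilesIntervalMs, refreshDurationMs)
  return perRefresh * (60 * 60 * 1000) / periodMs
}
```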
exposing feed metadata
If you pass metadata about the GTFS-Static feed used, serve-as-gtfs-rt will expose it via HTTP:
serve-as-gtfs-rt \
--feed-info path/to/gtfs/files/feed_info.txt \
--feed-url https://data.ndovloket.nl/flixbus/flixbus-eu.zip
curl 'http://localhost:3000/feed_info.csv'
# feed_publisher_name,feed_publisher_url,feed_lang,feed_start_date,feed_end_date,feed_version
# openOV,http://openov.nl,en,20210108,20210221,20210108
curl 'http://localhost:3000/feed_info.csv' -I
# HTTP/1.1 302 Found
# location: https://data.ndovloket.nl/flixbus/flixbus-eu.zip
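If a consumer wants to check whether the served feed_info.csv is currently valid, a naive parser is enough for a small response like the one above. This is a sketch under the assumption that no field contains commas, quotes or line breaks; parseSimpleCsv and isWithinFeedPeriod are hypothetical helpers, and anything more complex calls for a real CSV parser.

```javascript
// Naive parser for small, unquoted CSV files such as the feed_info.csv response above.
// ASSUMPTION: no field contains commas, quotes or line breaks.
const parseSimpleCsv = (text) => {
  const [headerLine, ...rows] = text.trim().split(/\r?\n/)
  const header = headerLine.split(',')
  return rows.map((row) => {
    const values = row.split(',')
    return Object.fromEntries(header.map((key, i) => [key, values[i] ?? '']))
  })
}

// GTFS dates are YYYYMMDD strings, so plain string comparison orders them correctly.
// feed_start_date & feed_end_date are optional, hence the emptiness checks.
const isWithinFeedPeriod = (feedInfo, yyyymmdd) =>
  (!feedInfo.feed_start_date || feedInfo.feed_start_date <= yyyymmdd) &&
  (!feedInfo.feed_end_date || yyyymmdd <= feedInfo.feed_end_date)
```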
Related projects
- hafas-gtfs-rt-server-example: Using hafas-client, hafas-monitor-trips & hafas-gtfs-rt-feed as a GTFS-RT server.
- print-gtfs-rt-cli: Read a GTFS Realtime (GTFS-RT) feed from stdin, print it human-readable or as JSON.
- gtfs-rt-inspector: Web app to inspect & analyze any CORS-enabled GTFS Realtime (GTFS-RT) feed.
- match-gtfs-rt-to-gtfs: Match realtime transit data (e.g. from GTFS Realtime) with GTFS Static data, even if they don't share an ID.
- gtfs-rt-differential-to-full-dataset: Transform a continuous GTFS Realtime stream of DIFFERENTIAL incrementality data into a FULL_DATASET dump.
- transloc-to-gtfs-real-time: Transform Transloc Real Time API to the GTFS RealTime Format.
There are several projects making use of hafas-gtfs-rt-server.
License
This project is dual-licensed: My contributions are licensed under the Prosperity Public License, contributions of other people are licensed as Apache 2.0.
This license allows you to use and share this software for noncommercial purposes for free and to try this software for commercial purposes for thirty days.
Personal use for research, experiment, and testing for the benefit of public knowledge, personal study, private entertainment, hobby projects, amateur pursuits, or religious observance, without any anticipated commercial application, doesn't count as use for a commercial purpose.
Get in touch with me to buy a commercial license or read more about why I sell private licenses for my projects.
Contributing
By contributing, you agree to release your modifications under the Apache 2.0 license.