frequent-finder

A program for auto-generating frequent network maps of public transit systems, using GTFS data as input.

There are two basic pieces to the puzzle. One is taking the GTFS feed and producing a GeoJSON file of the frequent services. The other is taking that GeoJSON and creating a map. I'm interested in three options for creating maps:

The two parts of the process

See my website for links to more of my work.

<br>

Current status

FrequentFinder works!

Here's Spokane's official transit system map, with frequent services in red:

Screenshot of Spokane transit map

Here's my GeoJSON export from FrequentFinder, displayed in QGIS:

Screenshot of FrequentFinder's output for Spokane

Note that my map shows fewer frequent services because for this one I set frequency parameters that are stricter than those used by the transit authority. (Also, see those short red segments? Those are places where bus lines overlap to create a short frequent corridor. Yes, FrequentFinder handles those easily.)
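To make the overlap case concrete, here is a minimal sketch of why two infrequent routes can form a frequent corridor where they share a street. This is an illustration only, not FrequentFinder's actual code: the function name `max_gap` and both timetables are made up, and times are written as minutes after midnight.

```python
def max_gap(departures):
    """Return the longest wait, in minutes, between consecutive departures."""
    ordered = sorted(departures)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))

# Hypothetical timetables: each route runs every 30 minutes, offset by 15.
route_a = [540, 570, 600, 630]   # 9:00, 9:30, 10:00, 10:30
route_b = [555, 585, 615, 645]   # 9:15, 9:45, 10:15, 10:45

# On the shared segment a rider can board whichever route comes first.
shared_segment = route_a + route_b

print(max_gap(route_a))          # 30
print(max_gap(shared_segment))   # 15
```

Each route alone would fail a 15-minute standard, but the segment they share passes it, which is why short red segments can show up between otherwise infrequent lines.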

Currently, my main focus is on improving the core data processing component of FrequentFinder. Later, I hope to add the components for generating interactive Mapbox maps and SVGs from the GeoJSON export; those are not my first priority right now because the GeoJSON export can already be visualized using QGIS.

<br>

Usage

Note: These instructions are intended for someone who is "data literate" and feels comfortable working with things that look code-like, but who may not necessarily be familiar with Python.

There are several ways you can run frequent-finder.py, but here is one. It requires having Python 2.7 and QGIS installed. This should take about 5-10 minutes if you're doing it for the first time.

1. Download the GTFS files for your city of interest. You should be able to find them at the GTFS Data Exchange, but if you want it to feel more "official" you can go to your transit agency's website and see if they have the data there.

2. Create a folder and copy frequent-finder.py to the folder. Rename it to ff.py. (If you don't have GitHub installed on your computer, you can also just go to the file, copy its contents, paste them into a blank text document using a simple text editor, and name it ff.py. Make sure there's no .txt extension at the end of the name after you save it.)

3. Extract all the GTFS files you downloaded. I recommend placing them in a folder with your city as its name, as a sub-folder of a "data" folder, as a sub-folder of the original folder (containing ff.py). See the directory structure of this repo for an example.

4. Create a ff_config.json file in your folder that has the GTFS data. The basic format should look like this. Actually, I suggest copying the contents of that file to use as a template. This file is structured as one large array, with each object in the array being a frequency category. The sample values listed should give you a pretty good idea of what kind of input is expected. Specifically:

5. Go to your favorite place to run Python. This could be the terminal, IDLE, or wherever. (For me, it's SublimeREPL.) Type the following commands:

import os
os.chdir("C:/your-folder")

Of course, your-folder should be the path to your folder that has ff.py in it. If you want to verify that you're in the right place now, type os.getcwd() and you should see the correct folder path. Now type:

import ff
freq = ff.System("data/your-city/", 20151001)

Note that in the second line, the string argument is the path for the folder containing your GTFS data, with data/your-city/ being the suggested location, replacing your-city with the city name. The second argument to System is a number representing a date in YYYYMMDD format. The program will take this date and find the appropriate set of schedules for the time period the date falls within.

After you enter the second line, you should see a bunch of messages indicating files were opened and closed. Then enter:

freq.saveGeoJSON("data/your-city/frequency.geojson")

The argument to saveGeoJSON should be the location/name of your output GeoJSON file.

6. Open up QGIS. Go to add a vector layer (Layer > Add Layer > Add Vector Layer). In the popup box, click Browse and find the GeoJSON file. Click Open. You should now see a map that looks like your transit system.

7. In the list of layers (probably on the left side of the screen), double-click the layer name. On the left side of the popup box, click Style. At the top of the main panel, click the drop-down that says Single Symbol and change it to Categorized. Toward the bottom of the box, click the Classify button. The list of frequency categories (that applied to this data set) should be above. The blank category is for corridors that didn't meet any frequency standard. If you double-click on a symbol, you can change its appearance. I recommend thicker lines for more frequent services and the color red for your highest frequency category. When you're done, click OK at the bottom of the layer properties box. Voila! Your frequent network map has arrived.
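If you'd like to check the export before (or instead of) styling it in QGIS, a few lines of Python can count how many features landed in each category. This is a rough sketch under assumptions: the property name "category" and the label "15 min" are placeholders I made up, so open your own GeoJSON file to see what the properties are actually called.

```python
import json
from collections import Counter

def load_geojson(path):
    """Read a GeoJSON file from disk into a Python dict."""
    with open(path) as geojson_file:
        return json.load(geojson_file)

def count_by_property(geojson, prop):
    """Count features in a FeatureCollection by the value of one property.

    Features where the property is missing or empty are counted under "".
    """
    counts = Counter()
    for feature in geojson.get("features", []):
        value = (feature.get("properties") or {}).get(prop)
        counts[value if value else ""] += 1
    return dict(counts)

# Tiny hand-made sample standing in for a real export.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"category": "15 min"}, "geometry": None},
        {"type": "Feature", "properties": {"category": "15 min"}, "geometry": None},
        {"type": "Feature", "properties": {"category": ""}, "geometry": None},
    ],
}

print(count_by_property(sample, "category"))

# With a real export, something like:
# print(count_by_property(load_geojson("data/your-city/frequency.geojson"), "category"))
```

A large count under the blank key would correspond to the blank category in QGIS: corridors that didn't meet any frequency standard.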

[I will add more in-depth instructions with screenshots for people with little coding/QGIS experience.]
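Two small things tend to trip people up in the steps above: pointing at a folder that doesn't actually hold the extracted GTFS files, and mistyping the date. The sketch below is a hypothetical pre-flight check, not part of FrequentFinder; the helper names are made up, and the file list is just what a typical GTFS feed contains (your feed may include more files, or use only one of the two calendar files).

```python
import datetime
import os

# Files a typical GTFS feed contains.
CORE_GTFS_FILES = ["agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt"]
# Service dates live in one or both of these.
CALENDAR_FILES = ["calendar.txt", "calendar_dates.txt"]

def missing_gtfs_files(folder):
    """Return the names of expected GTFS files that are not in the folder."""
    present = set(os.listdir(folder))
    missing = [name for name in CORE_GTFS_FILES if name not in present]
    if not present & set(CALENDAR_FILES):
        missing.append("calendar.txt or calendar_dates.txt")
    return missing

def to_yyyymmdd(day):
    """Turn a datetime.date into the integer date format, e.g. 20151001."""
    return int(day.strftime("%Y%m%d"))

print(to_yyyymmdd(datetime.date(2015, 10, 1)))   # 20151001

# With a real folder, something like:
# print(missing_gtfs_files("data/your-city/"))
```

An empty list from missing_gtfs_files means the folder looks complete, and the integer from to_yyyymmdd can be passed as the second argument to ff.System.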

<br>

Unexpected results

If your resulting frequent network map is not what you envisioned, there could be several reasons.

One is that there's a problem with the data. Remember the good old saying, "garbage in, garbage out." FrequentFinder can only work its magic on what it was given. Hopefully, though, this isn't the issue.

A more likely culprit is start and end times. FrequentFinder applies the same frequency standard all day, from the given start_time to the given end_time. It does not care whether a violation of the standard happens in the middle of the day or at the beginning or end; standards are standards. Try setting a later start_time and an earlier end_time and see if your results change. Also check whether your transit agency holds Sunday schedules to the same frequency standards.
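Here is a small sketch of why the window matters so much. It is an illustration with a made-up timetable and a made-up function name, and it sidesteps details the text above doesn't specify (such as how the edges of the window are treated). Times are minutes after midnight.

```python
def worst_gap_in_window(departures, start_time, end_time):
    """Longest gap (minutes) between departures inside [start_time, end_time].

    Returns None if fewer than two departures fall inside the window.
    """
    in_window = sorted(t for t in departures if start_time <= t <= end_time)
    if len(in_window) < 2:
        return None
    return max(later - earlier for earlier, later in zip(in_window, in_window[1:]))

# Hypothetical route: hourly early and late, every 15 minutes from 7:00 to 19:00.
early = [300, 360]                   # 5:00, 6:00
core = list(range(420, 1141, 15))    # 7:00, 7:15, ..., 19:00
late = [1200, 1260]                  # 20:00, 21:00
departures = early + core + late

print(worst_gap_in_window(departures, 300, 1260))   # 5:00-21:00 window: 60
print(worst_gap_in_window(departures, 420, 1140))   # 7:00-19:00 window: 15
```

The same route fails a 15-minute standard over the long window and passes it over the shorter one, purely because of a few sparse trips at the edges of the day.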

It could also be that your transit agency is exaggerating its claims. For example, a supposed "15-minute map" might really be a map of services that come four times an hour (e.g. departures at 9:00, 9:10, 9:30, and 9:40). FrequentFinder will sniff this out, even if your agency isn't being upfront about it. Alternatively, it could be that a route/segment often adheres to a frequency standard, but there are more "errors" than you have allowed for in your ff_config.json file. And note that a single violation of the error_mins allowance will disqualify a service from a given frequency category.
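The "four times an hour" example is easy to check by hand, and the sketch below does the arithmetic. It is a simplified illustration rather than FrequentFinder's actual logic: the function names are made up, and treating error_mins as extra minutes tolerated on each gap is my assumption about what the setting means. I've added a 10:00 departure to show the pattern repeating.

```python
def headways(departures):
    """Gaps, in minutes, between consecutive departures."""
    ordered = sorted(departures)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def meets_standard(departures, headway_mins, error_mins=0):
    """True only if every gap is within headway_mins plus the allowance.

    A single gap over the limit is enough to fail.
    """
    limit = headway_mins + error_mins
    return all(gap <= limit for gap in headways(departures))

# 9:00, 9:10, 9:30, 9:40, 10:00 as minutes after midnight.
departures = [540, 550, 570, 580, 600]

gaps = headways(departures)
print(gaps)                                           # [10, 20, 10, 20]
print(sum(gaps) / float(len(gaps)))                   # 15.0
print(meets_standard(departures, 15))                 # False
print(meets_standard(departures, 15, error_mins=5))   # True
```

The average gap is 15 minutes, which is how an agency can call this "15-minute service," but a rider who just misses the 9:10 waits 20. With no allowance the route fails the 15-minute category; with a 5-minute allowance it passes.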

Finally, there could be an issue with FrequentFinder. Please let me know if you think this is the case! Due to the complexity of the program and the transit data that's being fed to it, it's hard to say for sure whether your results are accurate or not. I think they should be, but I make no promises. I haven't yet written any test cases of complex systems because that would require going through all the schedules by hand to try to figure out what the results "should" be, and that's not a high priority right now, given how long that would take due to FrequentFinder's precision.

Note: Most cities I have tried so far have worked fine, but for some reason San Francisco seems to not be classifying anything in either of my two highest frequency categories. That said, it is still separating all segments into one of two categories, and the lowest-frequency services are correctly being put into the lower category. I need to investigate whether this is accurate (meaning, the problem is with the data itself or the schedule) or if there's a problem with FrequentFinder. I think/hope it's the former, but I'm not sure.

<br>

Issues and challenges

One of the big challenges of this project is its relationship with the source data. GTFS data is great for "specific" transit information: figuring out what service is available at an exact time and place on an exact day in time. What I'm trying to do is the opposite: create a generalized picture of the transit system. This sparks many questions about how exactly to do this:

All of these issues are present in real-world transit systems. They present serious challenges to any attempt at automating frequent network determination. FrequentFinder needs to be able to deal with them. And it does. That's part of what makes it different from a much simpler frequency-determination program.

<br>

Motivation

Jarrett Walker, author of the blog and book Human Transit, is 100% responsible for inspiring this. He is probably the world's most famous promoter of frequent network mapping, and more generally is the best voice in urban transit I know of. If you don't already read his blog, you really, really, really should.

<br>

FAQ

Hasn't someone done this before?

Yes and no.

Others have certainly thought of the idea. I've seen plenty of people express a desire for something like this. But so far, my web searches have not been very fruitful.

Conveyal has many tools that are useful for analyzing transit, and I haven't yet had time to dig through them all, but at a glance it doesn't seem there's anything that quite does this.

If you search GitHub for "transit frequent network map," this repo is the only result.

The most promising example seems to be one made by David Marcus at Routefriend. But it appears that not much has been done since this map. The Routefriend website has a page with a drop-down menu for selecting one of several cities, but I haven't had any luck with it; the page says "Loading (this may take a sec)" but no lines ever appear on the map. Perhaps it once worked but no longer does... or maybe it takes a really long time.

So, no, I haven't found any example that does exactly what I want. Hence my motivation for making FrequentFinder!

There are a few key features about my approach:

Is this supposed to be a replacement for "hand-drawn" frequent network maps?

Absolutely not. I feel strongly that custom-drawn, abstract transit maps are superior to auto-generated ones. (Relevant Human Transit blog post.) But making the former takes much more time and effort than the latter. FrequentFinder is intended to do a few things that custom-drawn maps can't. Thanks to the fact that they're auto-generated, these maps can:

I want to highlight a Human Transit post comment by Alon Levy: "When I made my New York maps, the main challenge was to collect the set of frequent buses. Drawing was tedious, and I had to make some simplifications for routes that have short one-way pair segments, but it only has to be done once. Ideally, a transit agency would have a customizable map, allowing you to check boxes for all buses you want to see. That would also take care of questions like 10 versus 15 minutes, or [8am-7pm] versus [6am-9pm]." Alon, say hello to FrequentFinder! (Alon's blog, Pedestrian Observations, is a must-read.)

So no, I don't see this as an end in itself.

That said, as I continue to develop FrequentFinder, there are certainly features I could add that would enhance its smartness and abstractness:

Why is there no graphical user interface (GUI)?

Because developing a GUI takes time, and for now I'd rather devote that time and energy toward improving the core capabilities of FrequentFinder. While a GUI would be nice, I think it's far from necessary as the directions for how to use FrequentFinder aren't that complicated.

As it stands, FrequentFinder is not a web app or smartphone app. The reason is simple: it doesn't yet offer the speed for those kinds of uses, and frankly those uses don't matter that much right now. The point of a frequent network map is that it's widely usable. Unlike getting in-the-moment transit directions, this one map is useful for the span of service it applies to (i.e. what hours of the day it's true for). So it wouldn't make any sense to keep generating a new one for the same city unless there's a service change. The point isn't so that you can whip out your smartphone and see the map instantly, though you could certainly do that if you've already generated it; instead, the focus is on building a map that's broadly applicable to many situations, times of day, and days of the week.

In any case, when it comes to navigating a transit system (or just staring at a transit map so you can absorb its information), I would strongly suggest looking at a more abstract, custom-drawn one, not these auto-generated ones (though they're much better than nothing!).

How's the speed?

This really depends. I've gotten results for Spokane in around 10 seconds, while San Francisco's (SFMTA) took about a minute and a half. The key issues here are how much data there is (particularly in the stop_times.txt file) and what the patterns in the data are.
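If you want a rough sense of how big a feed is before running it, counting the rows in stop_times.txt is a quick proxy. The sketch below is just that, a row counter with a made-up function name; it says nothing about the patterns in the data, which also affect runtime. The sample rows use standard GTFS column names but invented values.

```python
import csv

def count_rows(lines):
    """Count data rows (excluding the header) in CSV text.

    Accepts any iterable of lines, such as an open file or a list of strings.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return sum(1 for _ in reader)

# Tiny stand-in for a real stop_times.txt.
sample = [
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence",
    "T1,09:00:00,09:00:00,S1,1",
    "T1,09:05:00,09:05:00,S2,2",
    "T2,09:15:00,09:15:00,S1,1",
]

print(count_rows(sample))   # 3

# With a real feed, something like:
# with open("data/your-city/stop_times.txt") as stop_times:
#     print(count_rows(stop_times))
```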

Regardless, this doesn't seem like a big deal right now. For starters, FrequentFinder isn't really about blazing speed, since it's not intended to be an in-the-moment service. The current runtime is fine.

That said, after I've added more core functionality, I will look at possibilities for optimization.

Why did you choose Python and JavaScript?

Initially I was actually going to use JavaScript for everything, taking advantage of some of Underscore's features to help out. The idea was to use "one language for everything" (similar to part of why Node.js is gaining popularity), and I knew I was going to need JavaScript for Mapbox and D3.

Then... I realized that I'm much more used to doing these kinds of operations (file I/O, heavy computation, graph/network modeling) in Python. So I switched to Python.

Are Python and JavaScript the right tools? Well, JavaScript is the right tool for Mapbox and D3. But what about the Python part? Could that be in another language? Sure. I'm actually quite interested in the idea of trying to write that part in another language, mostly for fun. But for now, the point is to write it in a language that I know and that is suitable for the task, and Python meets those criteria. (Also, since so many people know Python, it makes it easier for others to comment/contribute.) Rewriting it in another language is not even on my radar screen right now.