Nate Wessel edited this page May 13, 2019 · 7 revisions

What it is

The goal of this project is to use archived automatic vehicle location (AVL) data to create a 'retrospective' GTFS 'schedule' describing an agency's observed (as opposed to scheduled) transit service. AVL data comes from real-time APIs such as the NextBus API or GTFS Realtime, both of which are currently used by many agencies.

This project was developed as part of my dissertation on transit accessibility modelling.

How the code works

The API (NextBus or GTFS Realtime) reports the locations of all fleet vehicles that have updated since a given time. We poll the API about every 10 seconds, requesting at time t2 all updates since time t1. These reports are stored in a PostGIS database. Vehicle location reports are partitioned into distinct, ordered trips and blocks. A trip is an ordered sequence of vehicle reports, essentially a GPS trace. These sequences would get very long and go back and forth a lot, except that we start a new trip whenever a vehicle does one of the following:

  • Goes off the radar for more than some amount of time
  • Changes its headsign
  • Changes its route_id
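
The three trip-breaking rules above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the `Report` class, the `split_into_trips` function, and the 180-second `MAX_GAP_SECONDS` threshold are all assumptions made for the example (the project itself only specifies "some amount of time").

```python
from dataclasses import dataclass

# Hypothetical threshold: the longest silence tolerated within a single trip.
MAX_GAP_SECONDS = 180


@dataclass
class Report:
    """One vehicle location report (field names are illustrative)."""
    vehicle_id: str
    route_id: str
    headsign: str
    timestamp: float  # seconds since the epoch
    lon: float
    lat: float


def split_into_trips(reports, max_gap=MAX_GAP_SECONDS):
    """Split one vehicle's time-ordered reports into a list of trips."""
    trips = []
    current = []
    for report in reports:
        if current:
            prev = current[-1]
            gap_too_long = report.timestamp - prev.timestamp > max_gap
            headsign_changed = report.headsign != prev.headsign
            route_changed = report.route_id != prev.route_id
            if gap_too_long or headsign_changed or route_changed:
                # Any one of the three conditions closes the current trip.
                trips.append(current)
                current = []
        current.append(report)
    if current:
        trips.append(current)
    return trips
```

Each returned trip is a list of reports sharing one route_id and headsign, with no gap longer than the threshold between consecutive reports.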

Blocks are sequences of consecutive trips. A new block is started only if a vehicle fails to report its location in a timely manner.

Once a trip ends, it is sent off for processing. First, the trip is cleaned and simplified by removing redundant, co-located points at the start or end. These mostly come about because of long in-station dwell times. Next, the trip is map-matched using OSRM and data from OpenStreetMap. The data we've been working with have a 20-second delay between location updates, and map-matching lets us estimate a more realistic route geometry.
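
As a rough illustration of the cleaning step, the sketch below trims runs of co-located points from both ends of a trace. It assumes points are `(x, y, t)` tuples in a projected coordinate system measured in meters; the function name `trim_stationary_ends` and the 5-meter tolerance are hypothetical, and the project's own cleaning logic may differ.

```python
import math


def trim_stationary_ends(points, tolerance=5.0):
    """Drop redundant co-located points at the start and end of a trace.

    points: time-ordered list of (x, y, t) tuples, x and y in meters.
    tolerance: hypothetical distance below which two points count as
    co-located.
    """
    def close(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1]) <= tolerance

    # Advance past leading points while the next one is still in the same
    # place, so that only the last report before departure is kept.
    start = 0
    while start + 1 < len(points) and close(points[start], points[start + 1]):
        start += 1

    # Walk back from the end while the previous point is in the same place,
    # so that only the first report on arrival is kept.
    end = len(points) - 1
    while end - 1 > start and close(points[end], points[end - 1]):
        end -= 1

    return points[start:end + 1]
```

Keeping the last point before departure and the first point on arrival preserves the most useful timestamps at each end of the trip.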

We get schedule data for each route from the NextBus API. From this we find the set of stops which are expected for the given route_id and headsign/direction. We now have a trip path with points in space and in time (from the location report time), and a set of stops. Any stop within a set distance (x meters) of the trip geometry is 'snapped' to the nearest point on the line, and the time at that point is interpolated from the nearest usable vehicle location timestamps.
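
The snapping and interpolation described above might look something like the sketch below. It is a simplification under stated assumptions: it works on the timestamped points directly rather than on a map-matched geometry, it assumes planar coordinates in meters, and the function name `interpolate_stop_time` and the 30-meter default for the snapping distance are made up for the example.

```python
import math


def interpolate_stop_time(trace, stop, max_dist=30.0):
    """Estimate when a vehicle passed a stop, or return None if too far.

    trace: time-ordered list of (x, y, t) tuples, x and y in meters.
    stop: (x, y) tuple in the same coordinate system.
    max_dist: hypothetical snapping distance (the 'x meters' in the text).
    """
    best = None  # (distance from stop to line, interpolated time)
    for (x1, y1, t1), (x2, y2, t2) in zip(trace, trace[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            frac = 0.0  # zero-length segment: use its start point
        else:
            # Project the stop onto the segment, clamped to its endpoints.
            frac = ((stop[0] - x1) * dx + (stop[1] - y1) * dy) / seg_len_sq
            frac = max(0.0, min(1.0, frac))
        px, py = x1 + frac * dx, y1 + frac * dy
        dist = math.hypot(stop[0] - px, stop[1] - py)
        if best is None or dist < best[0]:
            # Linear interpolation between the two bounding timestamps.
            best = (dist, t1 + frac * (t2 - t1))
    if best is None or best[0] > max_dist:
        return None
    return best[1]
```

For a trace `[(0, 0, 0), (100, 0, 20)]` and a stop at `(50, 10)`, the stop snaps to the midpoint of the segment, so the interpolated time is halfway between the two timestamps.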

Stop times and trips are stored in the PostGIS DB, from which they can be extracted as a regular GTFS dataset using the included scripts.
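
To give a sense of what the extraction involves, here is a minimal sketch of writing interpolated stop times as a GTFS stop_times.txt table. It is not one of the included scripts: the function names and the input row layout are assumptions, and the database query is left out entirely. GTFS expresses times as HH:MM:SS relative to the start of the service day, and hours may exceed 24 for trips running past midnight.

```python
import csv


def gtfs_time(seconds):
    """Format seconds since the start of the service day as HH:MM:SS."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def write_stop_times(rows, out):
    """Write rows of (trip_id, stop_id, stop_sequence, seconds) as CSV.

    out: any writable text file object.
    The single observed time is used for both arrival and departure,
    which is a simplification made for this sketch.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(
        ["trip_id", "arrival_time", "departure_time", "stop_id", "stop_sequence"]
    )
    for trip_id, stop_id, stop_sequence, seconds in rows:
        t = gtfs_time(seconds)
        writer.writerow([trip_id, t, t, stop_id, stop_sequence])
```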
