Skip to content
abyrd edited this page Jul 27, 2012 · 10 revisions

Introduction

The OpenTripPlanner Analyst web services produce map tiles or rasters derived from a single shortest path tree, which is to say that they reveal information (often travel time) about the geographic area covered by a graph from the perspective of a single (but freely movable) origin point.

The OpenTripPlanner Analyst batch framework covers a larger range of use cases, which may involve accumulating or aggregating results derived from many independently built single-source shortest path trees. Rather than a web service that serves map tiles, the batch framework includes a command-line program that is configured via Spring dependency injection XML. The locations and other attributes of path search endpoints (origins and destinations) may be loaded from Shapefile, CSV, or raster formats. Results may be saved back to a raster or a CSV file for further manipulation or analysis in a desktop GIS package like QGIS or a statistics package like R.

Batch Analyst is not a web servlet, but a plain Java main class: org.opentripplanner.analyst.batch.BatchProcessor. The simplest way to work with it is to configure your IDE to run it (e.g. using the "run configurations" dialog in Eclipse). It would be equally possible to invoke it from the command line but you'd have to include all the Maven dependencies on the classpath. As with the OTP web services, be sure to give the Java virtual machine plenty of heap space, especially when using large graphs. Maximum heap size is controlled with the -Xmx parameter, and typically needs to be on the order of a few gigabytes for smooth operation with medium-sized graphs.

The BatchProcessor will read load its Spring application context from the XML configuration in src/main/resources/batch-context.xml, or alternatively a configuration file specified on the command line. The XML in this file declaratively describes the configuration and instantiation of Spring "beans", which will both provide the processing logic and represent the data sets you are working with. The processing logic components will generally remain the same from one execution to another, but you will need to edit and/or rearrange the beans that furnish the origin set, destination set, and aggregate function to fit your specific use case.

Origin and destination populations

The origin and destination set objects must implement the Population interface, which is to say that they must be iterable collections of Individuals that retain some information about their structure (grid or scattered), source format, and CRS. Each Individual has a location (latitude and longitude) and a user-defined input value, all of which are double-precision floating point values. Population implementations are provided that load individuals from a flat comma-separated file, an ESRI Shapefile, or a georeferenced raster (image) file. It is also possible to manually build up a Population element by element via Spring properties, or to generate a Population of Individuals on a regular grid by specifying its width, height, CRS, and envelope.

A Batch Analyst run is always carried out as a loop over the source population, building a shortest path tree for each Individual. At each iteration, it finds States in the shortest path tree near each destination Individual and records some derived information such as travel time or number of transfers in a result set. This result set has the same size and ordering as the destination Population, and can therefore be saved in a format similar to that of the destination Population. Associations between destination Individuals and graph vertices are cached (as in Analyst template tiles) to speed up the production of result sets.

This is effectively finding a shortest path for every pair in the Cartesian product of the origin and destination sets and recording some information about each of those paths. By exploiting the fact that a single shortest path tree can be reused for all paths with the same origin, run time is greatly reduced relative to a naïve iteration over all O/D pairs, since we need only build |O| instead of |O||D| shortest path trees.

Aggregation

Additional processing may occur depending on whether an Aggregator or Accumulator has been configured. There are three possibilities:

  1. No Aggregator or Accumulator was specified. In this case, one set of results is produced for each Individual in the origin set (i.e. one per shortest path tree). The number of output files will be equal to the number of origin Individuals; the format and dimensions of each output file will correspond to the destination Population. Each line, feature, or pixel in the output files will correspond to an Individual in the destination Population, and will have a value derived from the dominant States near that Individual, as well as the input attribute loaded into that Individual. For O origins and D destinations, O output objects will be produced, each containing D data elements.

  2. An Aggregator was specified. For each element of the origin Population, the Aggregator examines the result set and reduces it to a single double-precision floating point value, generally by combining it with the corresponding destination Individuals' input attributes. This aggregate value is stored in a result set whose characteristics and size correspond to the origin set, and is associated with the Individual whose shortest path tree it summarizes. The aggregate result set persists throughout the run, and is progressively filled in by the series of SSSP searches and aggregate function evaluations. When one aggregate value has been calculated for each origin Individual, the aggregate result set is saved in an appropriate format. Thus, there is a single output file which corresponds in type and dimension to the origin Population. For example, if the origins were loaded from a raster or arranged on a grid, the output would be a single GeoTIFF image file with a single float band. Each pixel in this raster would contain the aggregate value for the shortest path tree built from that location, evaluated with respect to the (unchanging) destination Population. For O origins and D destinations, 1 output object will be produced, containing O data elements.

  3. An Accumulator was specified. For each element of the origin Population, the Accumulator examines the result set and the destination Individuals, and may then modify another result set that persists over all iterations. The persistant, accumulated result set has the same size and structure as each iteration's result set, which is to say it has the same size and structure as the destination Population. Once the entire origin Population has been visited, the Accumulator can optionally perform a some operations on the accumulated result set (e.g. a final division step when calculating a mean). Again, there is a single output file, but here it corresponds in size, order, and type to the destination Population. Most metrics that can be expressed with an Accumulator could also be expressed with an Aggregator, but one of the two formulations may be significantly faster than the other where the origin and destination populations are of different sizes. By expressing the problem such that the origin Population is the smaller one, the number of time-consuming SSSP searches is reduced. As an example, consider the case where one of the populations is a high-resolution raster. For O origins and D destinations, 1 output object will be produced, containing D data elements.

XML-based Configuration

Properties of the BatchProcessor itself allow the user to customize the search process, and a PrototypeRoutingRequest may be provided to set OTP routing parameters such as mode of transport or maximum walk distance (which would be specified in the OTP query string when using the REST routing service).

Basic Example

Here is an example batch-context.xml for a bare-bones Batch Analyst run:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context" xmlns:aop="http://www.springframework.org/schema/aop"
    xmlns:tx="http://www.springframework.org/schema/tx" xmlns:sec="http://www.springframework.org/schema/security"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-2.5.xsd
           http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-2.0.xsd
           http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-2.0.xsd
           http://www.springframework.org/schema/security http://www.springframework.org/schema/security/spring-security-2.0.xsd">

    <context:annotation-config />
    
    <bean class="org.opentripplanner.analyst.request.SampleFactory" />
    <bean class="org.opentripplanner.routing.impl.DefaultRemainingWeightHeuristicFactoryImpl"/>
    <bean class="org.opentripplanner.routing.algorithm.GenericAStar"/>
    <bean class="org.opentripplanner.analyst.batch.IndividualFactory" />
    <bean class="org.opentripplanner.analyst.core.GeometryIndex" />
        
    <!-- specify a GraphService, configuring the path to the serialized Graphs -->
    <bean id="graphService" class="org.opentripplanner.routing.impl.GraphServiceImpl">
        <property name="path" value="/var/otp/graphs/{}/" />
        <property name="defaultRouterId" value="arbor" />
    </bean>


<!-- this creates a population directly from of a list of individuals. in this case there is only one. -->
	<bean id="origins" class="org.opentripplanner.analyst.batch.BasicPopulation">
        <property name="individuals">
            <list>
			    <bean class="org.opentripplanner.analyst.batch.Individual">
				    <property name="label" value="UMich" />
			        <property name="lon" value="-83.73820" />
			        <property name="lat" value="42.27490" />
			    </bean>
            </list>
        </property>
    </bean>

<!-- this loads a population from a comma-separated flat text file -->
	<bean id="origins" class="org.opentripplanner.analyst.batch.CSVPopulation">
        <property name="sourceFilename" value="/home/abyrd/access/annarbor.csv" />
        <property name="latCol" value="1" />
        <property name="lonCol" value="2" />
        <property name="labelCol" value="0" />
        <property name="inputCol" value="3" />        
	</bean>

<!-- define the main batch processor, which will build one shortest path tree from each origin to all destinations -->
	
	<bean id="batchProcessor" class="org.opentripplanner.analyst.batch.BatchProcessor"> 
        <property name="outputPath" value="/home/abyrd/access/out1834_clamp.tiff" />
        <property name="routerId" value="arbor" />
        <property name="date" value="2012-07-12" />
        <property name="time" value="08:00 AM" />
        <property name="timeZone" value="America/New_York" />
        <property name="prototypeRoutingRequest">
			<bean class="org.opentripplanner.routing.core.PrototypeRoutingRequest">
				<!-- Set default routing parameters here -->
		        <property name="maxWalkDistance" value="400000" />
		        <property name="clampInitialWait" value="1800" />
		        <property name="arriveBy" value="false" />
			</bean>
		</property>		
	</bean>

</beans>

Aggregator Example

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context" xmlns:aop="http://www.springframework.org/schema/aop"
    xmlns:tx="http://www.springframework.org/schema/tx" xmlns:sec="http://www.springframework.org/schema/security"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-2.5.xsd
           http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-2.0.xsd
           http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-2.0.xsd
           http://www.springframework.org/schema/security http://www.springframework.org/schema/security/spring-security-2.0.xsd">

    <context:annotation-config />
    
    <bean class="org.opentripplanner.analyst.request.SampleFactory" />
    <bean class="org.opentripplanner.routing.impl.DefaultRemainingWeightHeuristicFactoryImpl"/>
    <bean class="org.opentripplanner.routing.algorithm.GenericAStar"/>
    <bean class="org.opentripplanner.analyst.batch.IndividualFactory" />
    <bean class="org.opentripplanner.analyst.core.GeometryIndex" />
    
    
    <!-- specify a GraphService, configuring the path to the serialized Graphs -->
    <bean id="graphService" class="org.opentripplanner.routing.impl.GraphServiceImpl">
        <property name="path" value="/var/otp/graphs/{}/" />
        <property name="defaultRouterId" value="arbor" />
    </bean>

<!-- this creates a population arranged on a regular grid that can later be saved as an image -->
    <bean id="origins" class="org.opentripplanner.analyst.batch.SyntheticRasterPopulation">
        <property name="left" value="-84.14" />
        <property name="right" value="-83.41" />
        <property name="bottom" value="42.07" />
        <property name="top" value="42.45" />        
        <property name="crsCode" value="epsg:4326" />
        <property name="cols" value="100" />
        <property name="rows" value="100" />        
    </bean>
 
<!-- this loads a population from a comma-separated flat text file -->

    <bean id="destinations" class="org.opentripplanner.analyst.batch.CSVPopulation">
        <property name="sourceFilename" value="/home/abyrd/access/annarbor.csv" />
        <property name="latCol" value="1" />
        <property name="lonCol" value="2" />
        <property name="labelCol" value="0" />
        <property name="inputCol" value="3" />        
    </bean>
	
<!-- define the main batch processor, which will build one shortest path tree from each origin to all destinations -->
	
	<bean id="batchProcessor" class="org.opentripplanner.analyst.batch.BatchProcessor"> 
        <property name="outputPath" value="/home/abyrd/access/out1834_clamp.tiff" />
        <property name="routerId" value="arbor" />
        <property name="date" value="2012-07-12" />
        <property name="time" value="08:00 AM" />
        <property name="timeZone" value="America/New_York" />
        <property name="prototypeRoutingRequest">
           <bean class="org.opentripplanner.routing.core.PrototypeRoutingRequest">
                        <!-- Set default routing parameters here -->
		        <property name="maxWalkDistance" value="2000" />
		        <property name="clampInitialWait" value="1800" />
		        <property name="arriveBy" value="true" />
            </bean>
        </property>		
        <property name="aggregator"> 
            <bean class="org.opentripplanner.analyst.batch.aggregator.ThresholdSumAggregator">
                <property name="threshold" value="3600" />
            </bean>
        </property> 
    </bean>
</beans>

Accumulator Example

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context" xmlns:aop="http://www.springframework.org/schema/aop"
    xmlns:tx="http://www.springframework.org/schema/tx" xmlns:sec="http://www.springframework.org/schema/security"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-2.5.xsd
           http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-2.0.xsd
           http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-2.0.xsd
           http://www.springframework.org/schema/security http://www.springframework.org/schema/security/spring-security-2.0.xsd">

    <context:annotation-config />
    
    <bean class="org.opentripplanner.analyst.request.SampleFactory" />
    <bean class="org.opentripplanner.routing.impl.DefaultRemainingWeightHeuristicFactoryImpl"/>
    <bean class="org.opentripplanner.routing.algorithm.GenericAStar"/>
    <bean class="org.opentripplanner.analyst.batch.IndividualFactory" />
    <bean class="org.opentripplanner.analyst.core.GeometryIndex" />
    
    
    <!-- specify a GraphService, configuring the path to the serialized Graphs -->
    <bean id="graphService" class="org.opentripplanner.routing.impl.GraphServiceImpl">
        <property name="path" value="/var/otp/graphs/{}/" />
        <property name="defaultRouterId" value="arbor" />
    </bean>

<!-- this loads a population from a comma-separated flat text file -->
    <bean id="origins" class="org.opentripplanner.analyst.batch.CSVPopulation">
        <property name="sourceFilename" value="/home/abyrd/access/annarbor.csv" />
        <property name="latCol" value="1" />
        <property name="lonCol" value="2" />
        <property name="labelCol" value="0" />
        <property name="inputCol" value="3" />        
    </bean>

<!-- this creates a population arranged on a regular grid that can later be saved as an image -->
    <bean id="destinations" class="org.opentripplanner.analyst.batch.SyntheticRasterPopulation">
        <property name="left" value="-84.14" />
        <property name="right" value="-83.41" />
        <property name="bottom" value="42.07" />
        <property name="top" value="42.45" />        
        <property name="crsCode" value="epsg:4326" />
        <property name="cols" value="1280" />
        <property name="rows" value="1024" />        
    </bean>
 
<!-- define the main batch processor, which will build one shortest path tree from each origin to all destinations -->
	<bean id="batchProcessor" class="org.opentripplanner.analyst.batch.BatchProcessor"> 
        <property name="outputPath" value="/home/abyrd/access/out1834_clamp.tiff" />
        <property name="routerId" value="arbor" />
        <property name="date" value="2012-07-12" />
        <property name="time" value="08:00 AM" />
        <property name="timeZone" value="America/New_York" />
        <property name="prototypeRoutingRequest">
			<bean class="org.opentripplanner.routing.core.PrototypeRoutingRequest">
				<!-- Set default routing parameters here -->
		        <property name="maxWalkDistance" value="400000" />
		        <property name="clampInitialWait" value="1800" />
		        <property name="arriveBy" value="false" />
			</bean>
		</property>		
		<!-- 
        <property name="aggregator"> 
            <bean class="org.opentripplanner.analyst.batch.aggregator.ThresholdSumAggregator">
                <property name="threshold" value="3600" />
            </bean>
        </property> 
        -->
        <property name="accumulator"> 
            <bean class="org.opentripplanner.analyst.batch.ThresholdAccumulator">
                <property name="threshold" value="3600" />
            </bean>
        </property> 
	</bean>

</beans>

The number of 18-35 year old residents in the Ann Arbor, Michigan area that can reach each raster cell in less than 1 hour of transit + walking (blue=5000, yellow=100000, green=200000).

Data Filter Chains

Clone this wiki locally