Skip to content
abyrd edited this page Jul 27, 2012 · 12 revisions

The OpenTripPlanner Analyst web services produce map tiles or rasters derived from a single shortest path tree, which is to say that they reveal information (often travel time) about the geographic area covered by a graph from the perspective of a single (but freely movable) origin point.

The OpenTripPlanner Analyst batch framework covers a larger range of use cases, which may involve accumulating or aggregating results derived from many independently built single-source shortest path trees. Rather than a web service that serves map tiles, the batch framework includes a command-line program that is configured via Spring dependency injection XML. The locations and other attributes of path search endpoints (origins and destinations) may be loaded from Shapefile, CSV, or raster formats. Results may be saved back to a raster or a CSV file for further manipulation or analysis in a desktop GIS package like QGIS or a statistics package like R.

Batch Analyst is not a web servlet, but a plain Java main class: org.opentripplanner.analyst.batch.BatchProcessor. The simplest way to work with it is to configure your IDE to run it (e.g. using the "run configurations" dialog in Eclipse). It would be equally possible to invoke it from the command line but you'd have to include all the Maven dependencies on the classpath. As with the OTP web services, be sure to give the Java virtual machine plenty of heap space, especially when using large graphs. Maximum heap size is controlled with the -Xmx parameter, and typically needs to be on the order of a few gigabytes for smooth operation with medium-sized graphs.

The BatchProcessor will read load its Spring application context from the XML configuration in src/main/resources/batch-context.xml, or alternatively a configuration file specified on the command line. The XML in this file declaratively describes the configuration and instantiation of Spring "beans", which will both provide the processing logic and represent the data sets you are working with. The processing logic components will generally remain the same from one execution to another, but you will need to edit and/or rearrange the beans that furnish the origin set, destination set, and aggregate function to fit your specific use case.

The origin and destination set objects must implement the Population interface, which is to say that they must be iterable collections of Individuals that retain some information about their structure (grid or scattered), source format, and CRS. Each Individual has a location (latitude and longitude) and a user-defined input value, all of which are double-precision floating point values. Population implementations are provided that load individuals from a flat comma-separated file, an ESRI Shapefile, or a georeferenced raster (image) file. It is also possible to manually build up a Population element by element via Spring properties, or to generate a Population of Individuals on a regular grid by specifying the width, height, CRS, and envelope.

Batch Analyst execution always follows the same general pattern: the program iterates over the source population, building a shortest path tree for each Individual. At each iteration, it finds states in the shortest path tree near each destination Individual and records some information such as travel time or number of transfers in a result set. This result set has the same size and structure as the destination Population, and can therefore be saved in a format similar to the one the destinations were loaded from.

Additional processing may occur depending on whether an Aggregator or Accumulator has been configured. There are three possibilities:

  1. No Aggregator or Accumulator was specified. In this case, one set of results is produced for each Individual in the origin set (i.e. one per shortest path tree). The number of output files will be equal to the number of origin Individuals; the format and dimensions of each output file will correspond to the destination set. Each line, feature, or pixel in the output files will correspond to an Individual in the destination Population, and will have a value derived from the dominant States near that Individual, as well as the input attribute loaded into that Individual.

  2. An Aggregator was specified. For each element of the origin Population, the Aggregator examines the result set and reduces it to a single double-precision floating point value, generally by combining it with the corresponding destination Individuals' input attributes. This aggregate value is stored in a result set whose characteristics and size correspond to the origin set, and is associated with the Individual whose shortest path tree it summarizes. The aggregate result set persists throughout the run, and is progressively filled in by the series of SSSP searches and aggregate function evaluations. When one aggregate value has been calculated for each origin Individual, the aggregate result set is saved in an appropriate format. Thus, there is a single output file which corresponds in type and dimension to the origin Population. For example, if the origins were loaded from a raster or arranged on a grid, the output would be a single GeoTIFF image file with a single float band. Each pixel in this raster would contain the aggregate value for the shortest path tree built from that location, evaluated with respect to the (unchanging) destination Population.

  3. An Accumulator was specified. For each element of the origin Population, the Accumulator examines the result set and may modify a single result set that persists over all iterations. The accumulated result set has the same size and structure as each iteration's result set, which in turn corresponds to the destination Population.

The associations between destination individuals and vertices in the graph are cached to speed up the process.

This is in effect finding a shortest path for every pair in the Cartesian product of the origin and destination sets and recording some information about each of those paths, but exploiting the fact that a single shortest path tree can be reused for all paths with the same origin. This greatly reduces run time relative to a naïve iteration over all O/D pairs, since we need only build |O| instead of |O||D| shortest path trees.

Properties of the BatchProcessor itself allow the user to customize the search process, and a PrototypeRoutingRequest may be provided to set OTP routing parameters such as mode of transport or maximum walk distance (which would be specified in the OTP query string when using the REST routing service).

To cover several kinds of batch requests, there are two modes: agg and non-agg. The batch processor chooses a mode based on whether the aggregator property has been set or not.

In either mode, both the source and target population properties must be set. The batch analysis is always carried out as a loop over the source set. In aggregate mode, the supplied aggregate function is evaluated over the target set for every element of the source set. The resulting aggregate value is associated with the origin individual that produced it, and the entire set of aggregates are saved together in a format appropriate for that population type. Thus, aggregate mode produces a single output object/stream/buffer, containing one unit of output (tuple/line/pixel) per individual in the source set. In non-aggregate mode, one output object/stream/buffer is produced per source location. Thus, for S sources and D destinations, S output objects will be produced, each containing D data items.

Here is an example batch-context.xml:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context" xmlns:aop="http://www.springframework.org/schema/aop"
    xmlns:tx="http://www.springframework.org/schema/tx" xmlns:sec="http://www.springframework.org/schema/security"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-2.5.xsd
           http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-2.0.xsd
           http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-2.0.xsd
           http://www.springframework.org/schema/security http://www.springframework.org/schema/security/spring-security-2.0.xsd">

    <context:annotation-config />
    
    <bean class="org.opentripplanner.analyst.request.SampleFactory" />
    <bean class="org.opentripplanner.routing.impl.DefaultRemainingWeightHeuristicFactoryImpl"/>
    <bean class="org.opentripplanner.routing.algorithm.GenericAStar"/>
    <bean class="org.opentripplanner.analyst.batch.IndividualFactory" />
    <bean class="org.opentripplanner.analyst.core.GeometryIndex" />
    
    
    <!-- specify a GraphService, configuring the path to the serialized Graphs -->
    <bean id="graphService" class="org.opentripplanner.routing.impl.GraphServiceImpl">
        <property name="path" value="/var/otp/graphs/{}/" />
        <property name="defaultRouterId" value="arbor" />
    </bean>


<!-- this creates a population directly from of a list of individuals -->
<!--     
	<bean id="origins" class="org.opentripplanner.analyst.batch.BasicPopulation">
        <property name="individuals">
            <list>
			    <bean class="org.opentripplanner.analyst.batch.Individual">
				    <property name="label" value="UMich" />
			        <property name="lon" value="-83.73820" />
			        <property name="lat" value="42.27490" />
			    </bean>
            </list>
        </property>
    </bean>
-->

<!-- this creates a population arranged on a regular grid that can later be saved as an image -->
	<bean id="destinations" class="org.opentripplanner.analyst.batch.SyntheticRasterPopulation">
        <property name="left" value="-84.14" />
        <property name="right" value="-83.41" />
        <property name="bottom" value="42.07" />
        <property name="top" value="42.45" />        
        <property name="crsCode" value="epsg:4326" />
        <property name="cols" value="1280" />
        <property name="rows" value="1024" />        
	</bean>
 
<!-- this loads a population from a comma-separated flat text file -->

	<bean id="origins" class="org.opentripplanner.analyst.batch.CSVPopulation">
        <property name="sourceFilename" value="/home/abyrd/access/annarbor.csv" />
        <property name="latCol" value="1" />
        <property name="lonCol" value="2" />
        <property name="labelCol" value="0" />
        <property name="inputCol" value="3" />        
	</bean>

<!-- aggregate results are no longer stored in individuals so populations can be reused -->
<!-- <alias name="destinations" alias="origins"/> -->
	
<!-- define the main batch processor, which will build one shortest path tree from each origin to all destinations -->
	
	<bean id="batchProcessor" class="org.opentripplanner.analyst.batch.BatchProcessor"> 
        <property name="outputPath" value="/home/abyrd/access/out1834_clamp.tiff" />
        <property name="routerId" value="arbor" />
        <property name="date" value="2012-07-12" />
        <property name="time" value="08:00 AM" />
        <property name="timeZone" value="America/New_York" />
        <property name="prototypeRoutingRequest">
			<bean class="org.opentripplanner.routing.core.PrototypeRoutingRequest">
				<!-- Set default routing parameters here -->
		        <property name="maxWalkDistance" value="400000" />
		        <property name="clampInitialWait" value="1800" />
		        <property name="arriveBy" value="false" />
			</bean>
		</property>		
		<!-- 
        <property name="aggregator"> 
            <bean class="org.opentripplanner.analyst.batch.aggregator.ThresholdSumAggregator">
                <property name="threshold" value="3600" />
            </bean>
        </property> 
        -->
        <property name="accumulator"> 
            <bean class="org.opentripplanner.analyst.batch.ThresholdAccumulator">
                <property name="threshold" value="3600" />
            </bean>
        </property> 
	</bean>

</beans>

The number of 18-35 year old residents in the Ann Arbor, Michigan area that can reach each raster cell in less than 1 hour of transit + walking (blue=5000, yellow=100000, green=200000).

Clone this wiki locally