Utility Calculation using Cheval

Peter Kucirek edited this page Sep 25, 2017 · 4 revisions

This page is still a work in progress.

This page details how to use Cheval to compute utilities in a discrete choice model. It is divided into three sections: the first details the syntax rules for utility expressions, the second covers how to pass utility expressions into ChoiceModels, and the third details how to use ChoiceModel.scope to prepare data for utility computation. When building a model program, it is generally assumed that the utility expressions are read in from a file, while setting up the scope is a fixed part of the program code.

For the examples in this page, a simple nesting structure is used, with choices labelled A, B, C, and D. Internally, Cheval represents this structure as a DataFrame storing the utility for each alternative for each record. This utility table is visualised below:
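As a rough illustration of that layout, the sketch below builds a table of the same shape in plain pandas. The record IDs and the zero starting values are assumptions made for the example; this is not Cheval's internal code.

```python
import pandas as pd

# Hypothetical record IDs; the alternatives A-D come from the text above.
records = pd.Index([101, 102, 103], name="person_id")
alternatives = ["A", "B", "C", "D"]

# One row per record, one column per alternative, with utilities starting at zero.
utilities = pd.DataFrame(0.0, index=records, columns=alternatives)
print(utilities)
```

Each utility expression can then be thought of as adding a term to every cell of this table.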

Expression Syntax

Cheval uses Python's built-in ast module to preprocess expressions before passing them to the NumExpr engine. This means that:

  1. Cheval expressions use Python syntax for variables and operators, etc.; and
  2. Cheval expressions support all NumExpr functions, such as where, sum, log, etc.
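To give a feel for what preprocessing with the ast module can look like, here is a minimal sketch that parses an expression and collects the names it refers to. The helper `collect_symbols` and the variables `x` and `y` are hypothetical; Cheval's own preprocessing is not shown here and may work differently.

```python
import ast

def collect_symbols(expression):
    """Return the sorted names an expression refers to, excluding called functions."""
    tree = ast.parse(expression, mode="eval")
    # Names used as functions, e.g. log(...) or sqrt(...).
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    # Every bare name that appears anywhere in the expression.
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(names - called)

print(collect_symbols("log(x) * 2.7 + sqrt(y) * 0.9"))  # ['x', 'y']
```

Because the expression is only parsed and never executed, the names can be discovered before any data is attached to them.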

A very simple expression (to the point of being nearly useless) is

log(2.5) * 2.7 + sqrt(12.9) * 0.9

Of course, in the context of a discrete choice model, this would result in each choice having the same utility, and since a Logit model depends on differences in utilities, this expression has no effect. What would be far more helpful is a different value for each choice. Fortunately, this is such a common occurrence in utility expressions that Cheval defines a special syntax to make things simpler. Let's replace the 2.7 with an array of values, one for each of our alternatives:

log(2.5) * {A: 5.8, B: 0, C: 1.2, D: -0.9} + sqrt(12.9) * 0.9

This dictionary literal stands in as a shortcut for an array of alternative-based coefficients. For convenience, if the coefficient for a particular alternative is 0 (like B is, above), it can be omitted from the dictionary and Cheval will assume that it's 0:

log(2.5) * {A: 5.8, C: 1.2, D: -0.9} + sqrt(12.9) * 0.9

If an alternative ID is a number or contains spaces, you should surround the key with quotes: {"choice 1": 0.9, "2": 0.43}.
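The rules above (bare or quoted keys, omitted alternatives defaulting to 0) can be sketched with the ast module as follows. The helper `expand_coefficients` is hypothetical and only illustrates the idea; it is not taken from Cheval.

```python
import ast

ALTERNATIVES = ["A", "B", "C", "D"]

def expand_coefficients(literal, alternatives):
    """Turn a dict literal such as '{A: 5.8, C: 1.2}' into one coefficient per alternative."""
    node = ast.parse(literal, mode="eval").body
    if not isinstance(node, ast.Dict):
        raise ValueError("expected a dictionary literal")
    coefficients = dict.fromkeys(alternatives, 0.0)  # omitted alternatives default to 0
    for key, value in zip(node.keys, node.values):
        # Bare keys parse as names; quoted keys parse as string constants.
        if isinstance(key, ast.Name):
            name = key.id
        elif isinstance(key, ast.Constant) and isinstance(key.value, str):
            name = key.value
        else:
            raise ValueError("unsupported key in coefficient dictionary")
        if name not in coefficients:
            raise KeyError(name)
        coefficients[name] = float(ast.literal_eval(value))
    return [coefficients[alt] for alt in alternatives]

print(expand_coefficients("{A: 5.8, C: 1.2, D: -0.9}", ALTERNATIVES))
# [5.8, 0.0, 1.2, -0.9]
```

Quoting matters because a key such as `2` or `choice 1` is not a valid Python name, so only the quoted form can be read back as an alternative ID.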

So far, so good. But what if the utility needs to change for each record? For example, you might want your utility expression to apply a set of coefficients to the log of a person's age. In this case, Cheval allows you to reference a symbol during evaluation:

log(person.age) * {A: 5.8, B: 0, C: 1.2, D: -0.9} + sqrt(12.9) * 0.9
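Numerically, this expression yields one utility per record and alternative. A minimal sketch of the equivalent arithmetic in NumPy and pandas, assuming a hypothetical `person` table with three records (Cheval evaluates the expression through its own engine, which is not reproduced here):

```python
import numpy as np
import pandas as pd

alternatives = ["A", "B", "C", "D"]
coefficients = np.array([5.8, 0.0, 1.2, -0.9])  # from the dictionary in the expression

# Hypothetical person table with an `age` column, one row per record.
person = pd.DataFrame(
    {"age": [25, 40, 67]},
    index=pd.Index([101, 102, 103], name="person_id"),
)

# log(person.age) has one value per record; multiplying by the coefficient
# row broadcasts it to one value per (record, alternative) pair.
record_term = np.log(person["age"].to_numpy())[:, np.newaxis] * coefficients
constant_term = np.sqrt(12.9) * 0.9  # identical for every record and alternative

utilities = pd.DataFrame(record_term + constant_term, index=person.index, columns=alternatives)
print(utilities.round(3))
```

Column B receives only the constant term, since its coefficient is 0.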

In this example, person is recognised by Cheval as a symbol with an age attribute. Later, it will be up to the programmer to associate this symbol with data, but for now it is enough to know that it has been recognised. Simpler symbol substitutions are also allowed. For example, if you have a complex relationship between each person and each alternative (for example, distance to zone in a location choice model) you would need to pass this "distance" table into the utility expression like so:

log(person.age) * {A: 5.8, B: 0, C: 1.2, D: -0.9} + sqrt(distance) * 0.9
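The "distance" symbol differs from person.age in that it already holds one value per record and alternative. A small sketch of that term in pandas, with made-up distances for two hypothetical records:

```python
import numpy as np
import pandas as pd

alternatives = ["A", "B", "C", "D"]
records = pd.Index([101, 102], name="person_id")

# Hypothetical distances from each person to each alternative.
distance = pd.DataFrame(
    [[1.0, 4.0, 9.0, 16.0],
     [25.0, 36.0, 49.0, 64.0]],
    index=records,
    columns=alternatives,
)

# Because `distance` already has the shape of the utility table,
# sqrt(distance) * 0.9 is a plain element-wise operation.
distance_term = np.sqrt(distance) * 0.9
print(distance_term)
```

No coefficient dictionary is needed for this term because the variation across alternatives comes from the data itself.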

Finally, it is possible to use LinkedDataFrames to compute even more complex variables based on linkages. For example, let's add a dummy variable to Choice A, for persons living in households with at least 3 persons. If person is a LinkedDataFrame, linked to household which is itself linked back to persons (circular relationships are permitted and quite useful), then this could be written as:

where(person.household.persons.count() >= 3, {A: -0.75}, 0)
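For readers unfamiliar with LinkedDataFrames, the following sketch reproduces the same calculation in plain pandas and NumPy. The `household_id` column and the sample records are assumptions for illustration; the groupby is only a stand-in for what the linkage does, not the LinkedDataFrame API.

```python
import numpy as np
import pandas as pd

alternatives = ["A", "B", "C", "D"]

# Hypothetical person table; each person points to a household.
person = pd.DataFrame(
    {"household_id": [1, 1, 1, 2, 3, 3]},
    index=pd.Index([101, 102, 103, 104, 105, 106], name="person_id"),
)

# Plain-pandas stand-in for person.household.persons.count():
# the number of persons sharing each person's household.
household_size = person.groupby("household_id")["household_id"].transform("count")

# Coefficient row for {A: -0.75}; omitted alternatives are 0.
coefficients = np.array([-0.75, 0.0, 0.0, 0.0])

# where(condition, {A: -0.75}, 0), broadcast to records x alternatives.
condition = (household_size >= 3).to_numpy()[:, np.newaxis]
dummy = pd.DataFrame(
    np.where(condition, coefficients, 0.0),
    index=person.index,
    columns=alternatives,
)
print(dummy)
```

Only the three persons in household 1 receive the -0.75 term, and only for alternative A.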

This is a concise way of representing this variable in the model specification.

Preparing the model Scope for evaluation

The scope property is used to associate variables in the utility expressions with numerical data. The set of variables is collected during expression parsing; it is not possible to evaluate utility expressions until all variables have been "filled" with data. These variables are referred to as "symbols" for the rest of this document.

Before utilities can be evaluated, the record index must be set. This pandas.Index object refers to the records being processed. For example, if the model is being run for a subset of a synthetic population, then the record index could be the person ID. If the model is being run for an OD matrix of trips, then the record index is a 2-level MultiIndex in the format [Origin, Destination]. cheval.scope uses the record index to validate inputs to the utility expressions as they are filled.
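The two kinds of record index mentioned above can be built with standard pandas calls, as sketched below. The IDs and zone numbers are made up, and `check_alignment` is a hypothetical helper showing one way such validation could work; how Cheval itself validates inputs is not documented here.

```python
import pandas as pd

# Record index for a subset of a synthetic population (hypothetical person IDs).
person_index = pd.Index([101, 102, 103], name="person_id")

# Record index for an OD matrix: a 2-level MultiIndex of [Origin, Destination].
zones = [1, 2, 3]  # hypothetical zone numbers
od_index = pd.MultiIndex.from_product([zones, zones], names=["Origin", "Destination"])

def check_alignment(data, record_index):
    """Raise if `data` is not indexed by exactly the expected records."""
    if not data.index.equals(record_index):
        raise ValueError("data index does not match the record index")
    return data

# A trip-level variable indexed by origin and destination passes the check.
trips = pd.Series(1.0, index=od_index)
check_alignment(trips, od_index)
```

Checking alignment when a symbol is filled catches mismatched inputs early, before any utilities are computed.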

The record index only needs to be set before symbols are filled; it is entirely fine to pre-parse expressions before setting the record index.

The Scope API

Under construction