Utility Calculation using Cheval
This page is still a work in progress.
This page details how to use Cheval to compute utilities in a discrete choice model. It is divided into three sections: the first details the syntax rules for utility expressions, the second covers how to pass utility expressions into ChoiceModels, and the third details how to use ChoiceModel.scope to prepare data for utility computation. When building a model program, it is generally assumed that the utility expressions are read in from a file, while setting up the scope is a fixed part of the program code.
For the examples in this page, a simple nesting structure is used, with choices labelled A, B, C, and D. Internally, Cheval represents this structure as a DataFrame storing the utility for each alternative for each record. This utility table is visualised below:
Cheval uses Python's built-in ast module to preprocess expressions before passing them to the NumExpr engine. This means that:
- Cheval expressions use Python syntax for variables, operators, etc.; and
- Cheval expressions support all NumExpr functions, such as where, sum, log, etc.
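As a rough illustration of what preprocessing with ast makes possible (this is not Cheval's actual implementation), the sketch below parses a Python-syntax expression and separates the function names it calls from the variable names it references. The helper name split_names and the sample expression are made up for this example.

```python
import ast


def split_names(expression):
    """Return (function names, variable names) used in a Python-syntax expression."""
    tree = ast.parse(expression, mode="eval")
    functions, variables = set(), set()
    # First pass: names that appear in call position, e.g. log(...)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            functions.add(node.func.id)
    # Second pass: every other bare name is treated as a variable
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id not in functions:
            variables.add(node.id)
    return functions, variables


functions, variables = split_names("where(x > 0, log(x), 0) * beta")
```

Here where and log end up in functions, while x and beta end up in variables. Anything that is not valid Python syntax would raise a SyntaxError at the ast.parse step.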
A very simple expression (to the point of being nearly useless) is
log(2.5) * 2.7 + sqrt(12.9) * 0.9
Of course, in the context of a discrete choice model, this would give every choice the same utility, and since a logit model depends only on differences in utilities, the expression has no effect. A different value for each choice would be far more helpful. This is so common in utility expressions that Cheval defines a special syntax for it. Let's replace the 2.7 with an array of values, one for each of our alternatives:
log(2.5) * {A: 5.8, B: 0, C: 1.2, D: -0.9} + sqrt(12.9) * 0.9
This dictionary literal is shorthand for an array of alternative-specific coefficients. For convenience, if the coefficient for a particular alternative is 0 (like B above), it can be omitted from the dictionary and Cheval will assume that it is 0:
log(2.5) * {A: 5.8, C: 1.2, D: -0.9} + sqrt(12.9) * 0.9
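The effect of the shortened dictionary can be sketched in plain Python (no Cheval involved). The snippet assumes the four alternatives A to D used on this page, expands the dictionary into one coefficient per alternative with 0 for any omitted key, and evaluates the expression above once per alternative.

```python
import math

alternatives = ["A", "B", "C", "D"]
coefficients = {"A": 5.8, "C": 1.2, "D": -0.9}  # B is omitted on purpose

# Omitted alternatives fall back to a coefficient of 0
expanded = [coefficients.get(alt, 0.0) for alt in alternatives]

# log(2.5) * {A: 5.8, C: 1.2, D: -0.9} + sqrt(12.9) * 0.9, one value per alternative
utilities = {
    alt: math.log(2.5) * coef + math.sqrt(12.9) * 0.9
    for alt, coef in zip(alternatives, expanded)
}
```

Each alternative now gets its own utility, and B is left with only the constant term sqrt(12.9) * 0.9.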
If an alternative ID is a number or contains spaces, you should surround the key with quotes:
{"choice 1": 0.9, "2": 0.43}
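The need for quotes follows from Python's own syntax rules, which can be checked with the ast module directly. The sketch below only shows how Python's parser classifies dictionary keys; how Cheval treats each kind of key internally is not covered here, and the helper names key_kinds and parses are made up for this example.

```python
import ast


def key_kinds(dict_source):
    """Describe how Python's parser classifies each key of a dict literal."""
    node = ast.parse(dict_source, mode="eval").body
    kinds = []
    for key in node.keys:
        if isinstance(key, ast.Name):
            kinds.append(("name", key.id))
        elif isinstance(key, ast.Constant):
            kinds.append((type(key.value).__name__, key.value))
    return kinds


def parses(source):
    """Return True if the source is a syntactically valid Python expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False


bare = key_kinds("{A: 5.8, C: 1.2}")                    # keys parse as names
quoted = key_kinds('{"choice 1": 0.9, "2": 0.43}')      # keys parse as strings
numeric = key_kinds("{2: 0.43}")                        # key parses as an integer
unquoted_space_ok = parses("{choice 1: 0.9}")           # not valid Python syntax
```

A bare key containing a space cannot be parsed at all, and a bare number is read as a numeric constant instead of a name, which is why quoting is the safe way to write such alternative IDs.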
So far, so good. But what if the utility needs to vary from record to record, based on some attribute? For example, suppose you want your utility expression to apply a set of coefficients to the log of a person's age. In this case, Cheval allows you to reference a symbol during evaluation:
log(person.age) * {A: 5.8, B: 0, C: 1.2, D: -0.9} + sqrt(12.9) * 0.9
In this example, person is recognised by Cheval as a symbol with an age attribute. Later, it will be up to the programmer to associate this symbol with data, but for now it is enough to know that it has been recognised. Simpler symbol substitutions are also allowed. For example, if you have a complex relationship between each person and each alternative (such as distance to zone in a location choice model), you would pass this "distance" table into the utility expression like so:
log(person.age) * {A: 5.8, B: 0, C: 1.2, D: -0.9} + sqrt(distance) * 0.9
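To make the shapes involved concrete, here is a plain-Python sketch of what the expression above would compute, assuming that person.age supplies one value per record, the dictionary supplies one coefficient per alternative, and distance supplies one value per record and alternative. The person IDs, ages, and distances are invented for illustration, and nothing here uses Cheval itself.

```python
import math

alternatives = ["A", "B", "C", "D"]
coefficients = {"A": 5.8, "B": 0.0, "C": 1.2, "D": -0.9}

# Hypothetical data: one age per person, one distance per person and alternative
age = {101: 25, 102: 40}
distance = {
    101: {"A": 4.0, "B": 9.0, "C": 1.0, "D": 16.0},
    102: {"A": 1.0, "B": 4.0, "C": 25.0, "D": 0.0},
}

# log(person.age) * {A: 5.8, B: 0, C: 1.2, D: -0.9} + sqrt(distance) * 0.9
utilities = {
    pid: {
        alt: math.log(age[pid]) * coefficients[alt]
        + math.sqrt(distance[pid][alt]) * 0.9
        for alt in alternatives
    }
    for pid in age
}
```

The result has one utility per record and alternative, matching the layout of the utility table described at the top of this page.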
Finally, it is possible to use LinkedDataFrames to compute even more complex variables based on linkages. For example, let's add a dummy variable to Choice A for persons living in households with at least 3 persons. If person is a LinkedDataFrame linked to household, which is itself linked back to persons (circular relationships are permitted and quite useful), then this could be written as:
where(person.household.persons.count() >= 3, {A: -0.75}, 0)
This is a concise way of representing this variable in the model specification.
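For comparison, the same variable can be spelled out by hand. The sketch below uses only the Python standard library, not LinkedDataFrame, and the person and household IDs are invented: it counts the persons in each household, then applies the coefficient for A only to persons whose household has at least 3 members.

```python
from collections import Counter

alternatives = ["A", "B", "C", "D"]
dummy = {"A": -0.75}  # alternatives not listed get 0

# Hypothetical persons table: household 10 has three members, household 20 has one
persons = [
    {"person_id": 1, "household_id": 10},
    {"person_id": 2, "household_id": 10},
    {"person_id": 3, "household_id": 10},
    {"person_id": 4, "household_id": 20},
]

# Equivalent of person.household.persons.count(): household size per person
household_size = Counter(p["household_id"] for p in persons)

# where(count >= 3, {A: -0.75}, 0), evaluated per person and alternative
utilities = {
    p["person_id"]: {
        alt: dummy.get(alt, 0.0) if household_size[p["household_id"]] >= 3 else 0.0
        for alt in alternatives
    }
    for p in persons
}
```

Persons 1 to 3 receive -0.75 on alternative A and 0 elsewhere, while person 4 receives 0 on every alternative.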
The scope property is used to associate variables in the utility expressions with numerical data. The set of variables is collected during expression parsing; it is not possible to evaluate utility expressions until all variables have been "filled" with data. These variables are referred to as "symbols" for the rest of this document.
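One way to picture symbol collection is with a small ast visitor. This is a sketch under stated assumptions and not Cheval's implementation: it assumes that dictionary keys name alternatives and are therefore not symbols, that names in call position are functions, and that an attribute chain such as person.age contributes only its root name. SymbolCollector, collect_symbols, and the filled set are hypothetical names.

```python
import ast


class SymbolCollector(ast.NodeVisitor):
    """Collect the root names that would need data before evaluation."""

    def __init__(self):
        self.symbols = set()

    def visit_Dict(self, node):
        # Keys are alternative names, so only the values are inspected
        for value in node.values:
            self.visit(value)

    def visit_Call(self, node):
        # A bare function name such as log is not a symbol
        if not isinstance(node.func, ast.Name):
            self.visit(node.func)
        for arg in node.args:
            self.visit(arg)
        for keyword in node.keywords:
            self.visit(keyword.value)

    def visit_Attribute(self, node):
        # Walk down a chain like person.household.persons to its root
        root = node
        while isinstance(root, ast.Attribute):
            root = root.value
        if isinstance(root, ast.Name):
            self.symbols.add(root.id)
        else:
            self.visit(root)

    def visit_Name(self, node):
        self.symbols.add(node.id)


def collect_symbols(expressions):
    collector = SymbolCollector()
    for expression in expressions:
        collector.visit(ast.parse(expression, mode="eval"))
    return collector.symbols


expressions = [
    "log(person.age) * {A: 5.8, B: 0, C: 1.2, D: -0.9} + sqrt(distance) * 0.9",
    "where(person.household.persons.count() >= 3, {A: -0.75}, 0)",
]
symbols = collect_symbols(expressions)

filled = {"distance"}        # pretend only distance has been given data
unfilled = symbols - filled  # evaluation would have to wait for these
```

With these two expressions, the collected symbols are person and distance, and person remains unfilled.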
Before utilities can be evaluated, the record index must be set. This pandas.Index object refers to the records being processed. For example, if the model is being run for a subset of a synthetic population, then the record index could be the person ID. If the model is being run for an OD matrix of trips, then the record index is a 2-level MultiIndex in the format [Origin, Destination]. cheval.scope uses the record index to validate inputs to the utility expressions as they are filled.
The record index only needs to be set before symbols are filled; it is entirely fine to pre-parse expressions before setting the record index.
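The two kinds of record index described above can be built with pandas alone. The sketch below also includes a hypothetical check_matches_record_index helper to illustrate the kind of validation a record index enables; it is an assumption made for this example and not the check Cheval performs. The IDs and values are invented.

```python
import pandas as pd

# Record index for a set of persons
person_index = pd.Index([101, 102, 103], name="person_id")

# Record index for an OD matrix of trips: every Origin paired with every Destination
od_index = pd.MultiIndex.from_product(
    [[1, 2], [1, 2]], names=["Origin", "Destination"]
)


def check_matches_record_index(data, record_index):
    """Hypothetical validation: the data must be indexed exactly like the records."""
    if not data.index.equals(record_index):
        raise ValueError("Data index does not match the record index")
    return data


# One age per person, aligned with the person record index
age = pd.Series([25, 40, 63], index=person_index)
check_matches_record_index(age, person_index)
```

Passing data with missing or reordered records, such as age.iloc[:2], would raise a ValueError in this sketch.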
Under construction