Phase 3 Scope of Work
#DRAFT 01/23/17
Phase 3 of ActivitySim development is focused on improving the data pipelining procedures and implementing remaining sub-models in order to more easily add additional contributors to the effort. Once the core features of the system are in place, with the exception of multi-processing, the plan is for a couple of the AMPO partner staff to assist in implementing the first release in 2018.
Task 1: Project Management
Task 2: Data Pipelining
Task 3: Informed Sampling Procedure
Task 4: Shadow Pricing
Task 5: Logsums in Utility Functions
Task 6: At-Work Subtour Models
Task 7: Stop-Level Models
Task 8: Joint Tour Models
Task 9: Fix Random Number Sequences
Task 10: Completing Phase 1 Models
###Task 1: Project Management
The purpose of this first task is to manage the overall project, including invoicing and conference calls with the project team, and coordination with the AMPO agency partners. All deliverables, including meeting notes, software, tests, documentation, and issue tracking will be managed through GitHub. UrbanLabs will twice review project progress and QA/QC select project deliverables, as identified by the AMPO partners.
Deliverable(s): (Due 52 weeks from NTP)
- Management of Bi-Weekly Meetings
- Pre- and Post-Meeting Notes
- Invoicing and Progress Reports
- Client Coordination
- QA/QC of Select Deliverables
Comments
- xxxx
###Task 2: Data Pipelining
The goal of this task is to better manage the movement and transformation of data within ActivitySim through the development of a consistent, comprehensive, and efficient approach to data pipelining. Currently, ActivitySim uses Orca to define sub-model inputs and to set up and run the sub-models. The outputs of sub-models, which are often the inputs to later sub-models, are not explicitly defined; they are simply stored in memory and available if needed. Nothing, including fundamental outputs such as trip matrices, is currently written to disk.
The purpose of this task is to develop software that makes the user interface for data inputs and outputs easier and improves the orchestration of model setup and running. This will include the development of ActivitySim-specific code for reading and writing data, getting and setting inputs and outputs of model steps, and the ability to (re)start and stop model runs midstream for debugging and testing. We will begin this task by evaluating alternatives to Orca, such as Luigi or Airflow, since restarting within a model run is likely required and Orca wasn't built with this in mind. We will then prototype data pipelining for the first couple of sub-models and share our findings with the AMPO partners. Upon selection of an approach, we will revise the example to use the new data pipelining framework, and the updated software will be described in the online documentation.
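To make the restart requirement concrete, here is a minimal sketch of a step runner that checkpoints its state after every step and can resume after any completed step. It is not Orca, Luigi, or Airflow code; the `Pipeline` class, the step functions, and the in-memory `checkpoints` dict (a stand-in for an on-disk data store) are all hypothetical names chosen for illustration.

```python
import copy


class Pipeline:
    """Runs named steps in order and checkpoints the state after each one."""

    def __init__(self, steps):
        # steps: ordered list of (name, function) pairs; each function takes
        # the current state dict and returns an updated state dict.
        self.steps = steps
        self.checkpoints = {}  # stand-in for an on-disk data store

    def run(self, initial_state=None, resume_after=None):
        names = [name for name, _ in self.steps]
        if resume_after is None:
            state = dict(initial_state or {})
            start = 0
        else:
            # Reload the saved output of an earlier step and skip past it.
            state = copy.deepcopy(self.checkpoints[resume_after])
            start = names.index(resume_after) + 1
        for name, step in self.steps[start:]:
            state = step(state)
            self.checkpoints[name] = copy.deepcopy(state)
        return state


def load_households(state):
    return {**state, "households": [{"id": 1, "size": 2}, {"id": 2, "size": 4}]}


def auto_ownership(state):
    # Toy rule purely for illustration, not a real model.
    autos = {hh["id"]: min(hh["size"], 2) for hh in state["households"]}
    return {**state, "autos": autos}


pipeline = Pipeline([("load_households", load_households),
                     ("auto_ownership", auto_ownership)])
full_run = pipeline.run()
restarted = pipeline.run(resume_after="load_households")
```

In this sketch a restarted run reproduces the full run because each step reads only from the checkpointed state, which is the property an explicit definition of sub-model inputs and outputs is meant to provide.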
A note about multi-threading/processing. ActivitySim is currently single threaded and
iterates through chunks of pandas table records when solving a model. It is planned
to revise ActivitySim to concurrently operate on the chunks via multi-processing.
This means creating cross-process-safe shared data structures, dispatching chunks of
choosers to different Python processes, accumulating the results in a process-safe
manner, and waiting for all the processes to complete before moving on. We will keep
this approach to multi-threading/processing in mind when building the data pipelining
procedures.
Deliverable(s): (Due 8 weeks from NTP)
- Prototype Data Pipelining Procedures
- Data Pipelining Procedures
- Updated Example
- Updated Documentation and Tests
Comments
- We can continue to use orca's computed column capability even if we don't use orca as our overall data pipelining technology. It is fairly powerful and concise - but also potentially baffling to newcomers unversed in functional programming and dependency injection. We should compare that approach with a couple of alternatives to assess the tradeoffs between elegance and comprehensibility. (jwd)
- (Daniels, Clint) This task implements a more well-documented transition flow between sub-models. In coordination with that, I would like to see a more detailed data dictionary and formatting considered here. I think that is what is meant by updating documentation, but I would like for it to be more clear.
###Task 3: Informed Sampling Procedure
The goal of this task is to revise the location choice sampling procedures so they intelligently sample destinations instead of simply randomly selecting a sample. The procedure will specify the sampling expressions in pandas expression files, similar to other sub-models, and write the results into the local data store for use in downstream models. Transit subzones will not be implemented. The new procedures will be described in the online documentation.
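As a rough illustration of informed sampling, the sketch below draws destinations with replacement, weighting each zone by a size term and a distance decay, and records the sampling probability and pick count so that a downstream choice model can correct for unequal selection probabilities. The weighting formula, the `decay` parameter, and the `ln(pick_count / prob)` correction are assumptions made for this example (the correction is one common form), not the expressions this task will specify.

```python
import math
import random
from collections import Counter


def sample_destinations(size_terms, distances, sample_size, rng, decay=0.1):
    """Draw destinations with replacement, weighted by size and proximity."""
    zones = list(size_terms)
    weights = [size_terms[z] * math.exp(-decay * distances[z]) for z in zones]
    total = sum(weights)
    probs = {z: w / total for z, w in zip(zones, weights)}
    picks = Counter(rng.choices(zones, weights=weights, k=sample_size))
    # One row per distinct sampled zone, carrying what a downstream model
    # needs to correct for the unequal selection probabilities.
    return [
        {"zone": z, "prob": probs[z], "pick_count": n,
         "correction": math.log(n / probs[z])}
        for z, n in sorted(picks.items())
    ]


# Hypothetical zone data for a single chooser.
size_terms = {101: 500.0, 102: 50.0, 103: 2000.0, 104: 10.0}
distances = {101: 2.0, 102: 1.0, 103: 15.0, 104: 30.0}
sample = sample_destinations(size_terms, distances, sample_size=10,
                             rng=random.Random(0))
```

A uniform random sample would spend many draws on small or distant zones; weighting the draw concentrates the sample on plausible destinations, and the correction term keeps the downstream model estimates consistent.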
Deliverable(s): (Due 12 weeks from NTP)
- Destination Choice Sampling Procedures
- Updated Example
- Updated Documentation and Tests
Comments
- xxx
###Task 4: Shadow Pricing
The goal of this task is to develop a shadow pricing module for the mandatory destination choice models. Shadow pricing is an iterative mechanism that works to match estimated workers and students in each zone to input employment and enrollment totals. The calculated shadow prices are part of the destination choice model utilities and are also written to a file so they can seed subsequent model runs in order to reduce run time. Shadow pricing in MTC travel model one is done by destination choice size variable segment (work low income, work medium income, work high income, work very high income, university, school grade, school high), zone, and walk transit sub-zone market (no walk %, short walk %, long walk %).
The ActivitySim Shadow Pricing module will operate on the user-defined set of destination choice size term segments by zone. It will not include walk transit sub-zone markets since these are currently not in ActivitySim. The procedure will be implemented before multi-threading/processing is added to ActivitySim since shadow pricing requires summing data across threads/processes (assuming, for example, that threads/processes each handle batches of households). The module will include the ability to save results for input to subsequent model runs. We will revise the existing ActivitySim mandatory destination choice sub-models to use the new shadow pricing code. The new procedures will be described in the online documentation.
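The iterative mechanism can be sketched as follows, assuming a simple multinomial logit destination choice with one additive shadow price per zone and the update rule `price += ln(target / modeled)`. The update rule, function names, and toy data are assumptions for illustration only; the actual module may use a different adjustment and will operate on size term segments.

```python
import math


def modeled_totals(workers_by_origin, utilities, prices):
    """Expected workers per destination zone under a multinomial logit."""
    zones = list(prices)
    totals = {z: 0.0 for z in zones}
    for origin, workers in workers_by_origin.items():
        expu = {z: math.exp(utilities[origin][z] + prices[z]) for z in zones}
        denom = sum(expu.values())
        for z in zones:
            totals[z] += workers * expu[z] / denom
    return totals


def solve_shadow_prices(workers_by_origin, utilities, targets,
                        prices=None, max_iter=200, tol=1e-8):
    """Adjust one additive price per zone until modeled totals match targets."""
    prices = dict(prices) if prices else {z: 0.0 for z in targets}
    for _ in range(max_iter):
        modeled = modeled_totals(workers_by_origin, utilities, prices)
        if max(abs(modeled[z] - targets[z]) / targets[z] for z in targets) < tol:
            break
        for z in targets:
            prices[z] += math.log(targets[z] / modeled[z])
    return prices


# Hypothetical inputs: workers by home zone, base utilities, employment targets.
workers_by_origin = {"A": 100.0, "B": 200.0}
utilities = {"A": {1: 0.0, 2: 0.5, 3: 1.0}, "B": {1: 1.0, 2: 0.2, 3: 0.0}}
targets = {1: 120.0, 2: 90.0, 3: 90.0}

prices = solve_shadow_prices(workers_by_origin, utilities, targets)
# Seeding a later run with saved prices lets it start from a converged state.
reseeded = solve_shadow_prices(workers_by_origin, utilities, targets, prices=prices)
```

Note that `modeled_totals` sums over all origins before prices are updated, which is why the text above places this work ahead of multi-processing: with choosers split across processes, that sum has to be accumulated across them.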
Deliverable(s): (Due 16 weeks from NTP)
- Shadow Pricing Calculation Procedure and Interface
- Updated Mandatory Destination Choice Sub-Models to Use Shadow Pricing Module
- Updated Documentation and Tests
Comments
- JEF: It always sort of bugged me that the shadow prices in CT-RAMP 1 were segmented by income and walk market. I am not sure why the code was designed this way but it seems like overkill, and could result in some very different prices for each segment, which isn't intuitively obvious. So I would suggest that you start simply, by calculating prices across zones. If there are convergence problems one can always add the segmentation later.
###Task 5: Logsums in Utility Functions
The goal of this task is to develop the functionality to calculate discrete choice model logsums and to then make them accessible to upstream model calculations. The most common application of this is calculating mode choice model logsums (i.e. multimodal accessibility) for each potential destination in a destination choice model. Since ActivitySim is solving each sub-model for all choosers at once, it also needs to solve logsums for all tours/trips/etc. at once, and then store the results in-memory and/or the data store for later use in pandas expressions in other sub-models.
We will begin this task by implementing the MTC travel model one tour mode choice model and logsum calculation engine. We will likely simplify the mode choice model by including only a handful of utility expressions in order to focus on the functionality and its correctness, rather than on the actual model design. The simplified implementation will include at least one variable from each type of data object in order to ensure all the required data connections/interfaces are implemented. We will revise the existing ActivitySim sub-models to use the new logsums interface. The new procedures will be described in the online documentation.
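For reference, the logsum of a choice model with alternative utilities $V_m$ is $\ln \sum_m \exp(V_m)$. The sketch below computes it in a numerically stable way and feeds it into a toy destination utility; the function names, the `logsum_coef` parameter, and the utility values are hypothetical and stand in for whatever the logsums interface ends up exposing.

```python
import math


def logsum(utilities):
    """Log of the sum of exponentiated utilities, computed stably."""
    top = max(utilities)
    return top + math.log(sum(math.exp(u - top) for u in utilities))


def destination_utilities(size_terms, mode_utilities, logsum_coef=1.0):
    """Add each destination's mode choice logsum to its size term utility."""
    return {
        zone: math.log(size_terms[zone]) + logsum_coef * logsum(mode_utilities[zone])
        for zone in size_terms
    }


# Hypothetical mode utilities (e.g. drive, transit, walk) to each destination.
mode_utilities = {
    1: [-1.0, -2.5, -3.0],
    2: [-2.0, -1.5, -6.0],
}
size_terms = {1: 100.0, 2: 400.0}
dest_util = destination_utilities(size_terms, mode_utilities)
```

Subtracting the maximum utility before exponentiating avoids overflow for large utilities, which matters when the calculation is repeated for every chooser and sampled destination.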
Deliverable(s): (Due 24 weeks from NTP)
- Simplified Mode Choice Logsums Calculation Procedure and Interface
- Updated Sub-Models to Use Logsums Interface
- Updated Documentation and Tests
Comments
- JEF: Should be ok with this task so long as you are sampling destinations, since it will be prohibitively expensive to calculate a logsum for each tour and person. An alternative, iterative approach would be to pre-calculate logsums for segments, and cache them, rather than calculate them on-the-fly. The on-the-fly logsums could then be added later. This would not work in the MAZ world though. And though you are testing this with the tour mode choice model, it should be designed in such a way that we can obtain logsums from any choice model.
- (Daniels, Clint) I am a little worried this task will lead to memory bloat. Memory usage is one of the biggest problems in the current implementations of CT-RAMP. In coordination with the informed sampling procedures above, I'd like to see if there are ways we can get smarter about what needs to be available all the time and what can be pulled in and out without causing huge runtime performance problems.
###Task 6: At-Work Subtour Models
The goal of this task is to implement the at-work subtour frequency, scheduling, and destination sub-models. These sub-models are similar in form to the existing (partially) implemented non-mandatory tour frequency, departure and duration, and destination models. However, a few key missing features of the existing models are the processing of each tour purpose, the calculation of time windowing variables, and logsums. These missing expressions and underlying software will be as faithfully implemented as possible within the available budget (in addition to what is done in other tasks). The new procedures will be described in the online documentation.
Deliverable(s): (Due 28 weeks from NTP)
- At-Work Subtour Models
- Updated Example, including Expression and Config Files
- Updated Documentation and Tests
Comments
- xxxx
###Task 7: Stop-Level Models
The goal of this task is to implement the stop frequency, purpose, scheduling, destination sampling, and destination sub-models. The stop frequency model is similar to the tour frequency model but also requires information about the traveler's skeleton schedule (such as number of tours scheduled) and tour attributes such as purpose and start time. The MTC travel model one stop purpose model simply samples from a probability distribution given tour purpose, direction, departure time, and person type. The MTC travel model one stop scheduling model also samples from a probability distribution given tour purpose, direction, departure time, and trip number.
The stop destination sampling model is similar to the tour destination sampling model but sequentially processes the stops from tour origin to destination when outbound and then destination to origin when inbound. When doing so, the model calculates out-of-direction network costs in two parts: origin to alternative stop + alternative stop to next destination. We expect to vectorize these sequential models by segmenting the model by number of stops on the tour leg in order to apply the same calculation to each record. This is essentially the same vectorization plan as was done for the CDAP re-write. The new procedures will be described in the online documentation.
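The two-part out-of-direction cost and the sequential processing of stops can be sketched as below. The coordinate-based skim, the function names, and the least-cost `choose` rule (a stand-in for the actual destination choice model) are assumptions for illustration.

```python
import math


def out_of_direction_costs(skim, origin, destination, candidate_stops):
    """Cost of origin -> stop plus stop -> next destination, per candidate."""
    return {
        stop: skim[(origin, stop)] + skim[(stop, destination)]
        for stop in candidate_stops
    }


def place_stops(skim, tour_origin, tour_destination, n_stops, candidates, choose):
    """Place stops one at a time, moving the origin forward after each choice."""
    origin, chosen = tour_origin, []
    for _ in range(n_stops):
        costs = out_of_direction_costs(skim, origin, tour_destination, candidates)
        stop = choose(costs)
        chosen.append(stop)
        origin = stop  # the next stop is evaluated from the one just chosen
    return chosen


# Hypothetical zone coordinates used to build a straight-line distance skim.
coords = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (6.0, 1.0), 4: (10.0, 0.0)}
skim = {(a, b): math.dist(coords[a], coords[b]) for a in coords for b in coords}

first_leg_costs = out_of_direction_costs(skim, origin=1, destination=4,
                                         candidate_stops=[2, 3])
# Stand-in for the choice model: always take the least-cost candidate.
outbound_stops = place_stops(skim, 1, 4, n_stops=1, candidates=[2, 3],
                             choose=lambda costs: min(costs, key=costs.get))
```

Because every tour leg with the same number of stops runs the loop body the same number of times, segmenting records by stop count allows each pass of the loop to be applied to a whole table of records at once.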
Deliverable(s): (Due 36 weeks from NTP)
- Stop-level Models
- Updated Example, including Expression and Config Files
- Updated Documentation and Tests
Comments
- xxxx
###Task 8: Joint Tour Models
The goal of this task is to implement the joint tour frequency, party composition, person participation, scheduling, and destination sub-models. The joint tour frequency and party composition models are relatively straightforward and should be easily vectorized. The joint tour person participation model currently loops through household persons until a valid tour party is formed. Within the loop, a switch statement based on the party composition sets up and solves the relevant expression file for a) adults only, b) children only, or c) mixed, adults and children. After solving the relevant model, it checks for a valid party and if not, then it repeats the participation choice with new random numbers.
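The loop described above can be sketched roughly as follows. This is not the CT-RAMP code: the per-person `participation_prob` stands in for solving the relevant expression file, and the validity rule (at least two participants, with at least one adult and one child for a mixed party) is an assumption made for this example.

```python
import random


def is_valid_party(party, composition):
    # Assumed validity rule for illustration only.
    if len(party) < 2:
        return False
    adults = sum(1 for p in party if p["is_adult"])
    children = len(party) - adults
    if composition == "mixed":
        return adults > 0 and children > 0
    return True


def choose_participants(persons, composition, rng, max_tries=100):
    """Redraw participation choices until the party is valid."""
    # The "switch" on party composition: pick who is eligible to participate.
    eligible = {
        "adults": [p for p in persons if p["is_adult"]],
        "children": [p for p in persons if not p["is_adult"]],
        "mixed": persons,
    }[composition]
    for _ in range(max_tries):
        # New random numbers on every pass through the loop.
        party = [p for p in eligible if rng.random() < p["participation_prob"]]
        if is_valid_party(party, composition):
            return party
    raise RuntimeError("no valid party found")


# Hypothetical household of two adults and two children.
household = [
    {"id": 1, "is_adult": True, "participation_prob": 0.6},
    {"id": 2, "is_adult": True, "participation_prob": 0.6},
    {"id": 3, "is_adult": False, "participation_prob": 0.6},
    {"id": 4, "is_adult": False, "participation_prob": 0.6},
]
mixed_party = choose_participants(household, "mixed", random.Random(1))
adult_party = choose_participants(household, "adults", random.Random(1))
```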
Vectorizing the joint tour person participation model will require re-structuring the problem into a series of pandas tables with either rows or columns being persons or joint tours, probably segmented by party composition. It will also require the calculation of available person time windows, which is required for the partially implemented duration models as well. These calculations will likely be added to the household and person tables as methods to calculate person time availability. The method will accept a vector of households or persons and return a vector of time window availability by operating on a vectorized representation of person time use throughout the simulation day.
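One possible representation of person time use, sketched here with plain Python lists (pandas or NumPy arrays would play this role in practice), is a per-person list of free/busy flags by time period. The function names and the 24-period day are hypothetical.

```python
def blank_day(n_periods=24):
    """A day in which every period is still free (True means free)."""
    return [True] * n_periods


def schedule(day, start, end):
    """Return a copy of the day with periods start..end (inclusive) marked busy."""
    return [free and not (start <= t <= end) for t, free in enumerate(day)]


def longest_free_window(day):
    """Length of the longest run of consecutive free periods."""
    best = run = 0
    for free in day:
        run = run + 1 if free else 0
        best = max(best, run)
    return best


def joint_availability(days):
    """Periods in which every person in the party is free."""
    return [all(period) for period in zip(*days)]


# Hypothetical schedules: a work tour from 8 to 17 and a school tour from 7 to 15.
worker = schedule(blank_day(), 8, 17)
student = schedule(blank_day(), 7, 15)
windows = [longest_free_window(day) for day in (worker, student)]
joint_window = longest_free_window(joint_availability([worker, student]))
```

Here `windows` holds one availability value per person, mirroring the vector-in, vector-out method described above, and `joint_availability` shows how a joint tour would be limited to periods that are free for the whole party.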
The joint tour scheduling model is similar to the other existing, partially implemented scheduling models. It will be implemented under this task and depends on some of the key missing features noted earlier - calculation of time windows and logsums. These missing expressions and underlying software will be as faithfully implemented as possible within the available budget (in addition to what is done in other tasks).
The joint tour destination models are similar to the other partially implemented destination models. The new procedures will be described in the online documentation.
Deliverable(s): (Due 44 weeks from NTP)
- Joint Tour Models
- Updated Example, including Expression and Config Files
- Updated Documentation and Tests
Comments
- xxxx
###Task 9: Fix Random Number Sequences
The goal of this task is to add the ability to fix random number sequences during the model run in order to replicate results under various setups. Some example setups include: a) restarting the model run in the middle of the run and getting the same results as before, b) running the same household under different sampling plans and getting the same results (assuming there is no interaction between households, e.g. through shadow pricing), and c) helping to ensure stable results across alternative network scenarios so that differences in results are due primarily to changes in inputs and not random number sequencing. MTC travel model one has a separate random number generator object for each household and then generates random numbers in sequence from the start of the run. The model also has some additional functionality to keep track of numbers drawn before and after each sub-model in order to be restarted, but the functions were often not called when required.
For ActivitySim, we may implement an improved version with one random number generator object and each household by person by sub-model having different random seed offsets. The random number offsets could be pandas attributes and be stored in the local data store so they will be available for downstream sub-models. Requesting at once a vector of random numbers given a vector of offsets for each household, person, tour, etc. is a requirement. The random number sequencing procedures and attributes will be described in the online documentation.
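A minimal sketch of the idea, assuming a single generator object that is re-seeded from a hash of the base seed, the household id, and a sub-model offset before each set of draws. The hashing scheme and all names are assumptions for illustration; the paragraph above does not fix how offsets are turned into seeds.

```python
import hashlib
import random


def seeded_draws(base_seed, household_ids, submodel_offset, n=1):
    """Return n reproducible uniform draws for each household and sub-model."""
    rng = random.Random()  # one generator object, re-seeded per household
    draws = {}
    for hh_id in household_ids:
        key = f"{base_seed}:{hh_id}:{submodel_offset}".encode()
        # Derive a stable integer seed from the (seed, household, sub-model) key.
        rng.seed(int.from_bytes(hashlib.sha256(key).digest()[:8], "big"))
        draws[hh_id] = [rng.random() for _ in range(n)]
    return draws


all_households = seeded_draws(42, [1, 2, 3], submodel_offset=5, n=2)
sampled_only = seeded_draws(42, [3], submodel_offset=5, n=2)
other_submodel = seeded_draws(42, [3], submodel_offset=6, n=2)
```

Because a household's draws depend only on its own key, household 3 gets the same numbers whether it is run alone or with others, and a restarted run does not need to replay earlier sub-models to reach the right point in a sequence.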
Deliverable(s): (Due 50 weeks from NTP)
- Improved Random Number Management
- Updated Sub-Models
- Updated Documentation and Tests
Comments
- I would expect this only to be an issue with multithreading/multiprocessing or when restarting a pipeline midway? The latter case has some easy trivial solutions. (jwd)
###Task 10: Completing Phase 1 Models
The goal of this task is to continue to verify, and correct as needed, sub-models implemented in phase 1. Consultant will work through the sub-models in order and fix as many unresolved issues, such as implementing missing utility variables, logsums information, time windows, additional time periods, etc., as budget allows. Consultant may also re-factor/organize the file/folder setup in order to improve the separation of concerns. Consultant will update the source code, documentation, and tests as a result of any revisions to the framework.
Deliverable(s): (Due 52 weeks from NTP)
- Full Model Run with All Zones, Skims, and Households of the Sub-Models Implemented
- Comparisons of Model Results to Expected Results
- Updated Source Code, Configuration Files, Documentation, and Tests
Comments
- (Daniels, Clint) Does this get us a fully functional replica of Travel Model One?