Project Meeting 2020.10.13

Technical Call

Update on TVPB
- Now have a Marin County example work tour mode choice model up and running
  - Have run 170k work tours successfully
  - Needed to clean up the draft expression files I created, add more tracing to the TVPB, and make some chunking improvements
  - Adaptive chunking appears to be working well
- Next steps
  - Pre-computing/caching: there are lots of redundant calculations because the utilities are not specific to individual people
  - There's a lot of redundancy in the new TVPB expression files too, so we may be able to find some efficiencies there as well
  - There's an interesting problem with how Boolean expressions are evaluated in numexpr versus regular Python when they are on the left side of the equation
  - Python throws a warning, and we should catch it and notify the user since this increases runtime
  - Plan to discuss pre-computing/caching more next week
- After Jeff gets something working, I'll run the full example to compare against the original Marin TM2 example
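
The pre-computing/caching idea in the next steps above can be sketched as follows. This is a minimal illustration, not the TVPB implementation: the column names (`orig`, `dest`, `period`) and the `compute_utilities` function are hypothetical stand-ins for whatever the expression files actually evaluate. The point is only that when utilities depend on a small set of key columns rather than on the person, they can be evaluated once per unique key combination and broadcast back to every tour.

```python
import pandas as pd


def compute_utilities(df):
    """Hypothetical stand-in for an expensive utility expression.

    It depends only on the key columns, not on any person attribute.
    """
    distance_term = -0.01 * (df["dest"] - df["orig"]).abs()
    peak_term = -0.5 * (df["period"] == "AM").astype(float)
    return distance_term + peak_term


def cached_utilities(tours, key_cols):
    """Evaluate utilities once per unique key combination, then broadcast."""
    unique_rows = tours[key_cols].drop_duplicates().reset_index(drop=True)
    unique_rows["utility"] = compute_utilities(unique_rows)
    # A left merge keeps the rows in the same order as `tours`.
    merged = tours[key_cols].merge(unique_rows, on=key_cols, how="left")
    utilities = pd.Series(merged["utility"].to_numpy(), index=tours.index)
    return utilities, len(unique_rows)


# Five made-up tours that share only two distinct key combinations.
tours = pd.DataFrame(
    {
        "orig": [1, 1, 2, 1, 2],
        "dest": [5, 5, 9, 5, 9],
        "period": ["AM", "AM", "PM", "AM", "PM"],
    },
    index=[101, 102, 103, 104, 105],
)
utilities, n_evaluated = cached_utilities(tours, ["orig", "dest", "period"])
```

Here the expression is evaluated for two rows instead of five. Whether this pays off in practice depends on how many distinct key combinations the real tours contain.
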

Scaled integer skims
- We should think about what ActivitySim publishes as its expected input skim formats/assumptions
- For example, if we go to storing skims as 16-bit unsigned ints, then the range is 0-65,535
- We'd need to scale float values by 100 too
- Can we do time in seconds with this data type? 60 * 60 * 18 = 64,800, so we can only handle about 18 hours' worth of seconds
- DaySim does this (see scaling here) and so that means it should be acceptable for travel models in general
- Reducing from float32 to 16-bit integer storage would mean half the memory usage for skims
- OMX project is discussing using Apache Arrow for faster disk-based I/O
- Maybe we could do something similar for skims in ActivitySim? Maybe we don't need to load them into RAM.
- The existing ActivitySim feature that reloads skims from disk using memmap is similar
- Maybe ActivitySim adds a float_32 versus scaled_unsigned_int_16 skim setting?
- We want to remain unit agnostic though
- And maybe we could specify different internal data types by skim
- Freeing up more RAM by using leaner data types means more RAM for chunking and multiprocessing
- It's a requirement that ActivitySim can run the PSRC model: 12 time periods * 60 skims * 4000 zones
- This is a good chance to better define and publish the data model
- How much RAM for skims does the existing example use? We'll check
- We may want to use more efficient data types in expressions as well - say float32 instead of float64
- We'll review this as part of the performance task as well
- Jeff to work on this after he gets TVPB to a good place
- We want to do all our arithmetic in full precision though since some models have lots of choices with small utilities and probabilities
- The scaling/unscaling of skims will happen only in the skims API/class and not in the rest of the system so it is only a storage/RAM issue
- Here are some good relevant links on memory and performance from Stefan:
  - Info on why Python uses more memory than some other languages
  - A new feature in Python 3.8: multiprocessing.shared_memory
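
The storage arithmetic discussed in this section can be sketched as below. This is an illustration under stated assumptions, not ActivitySim's skim API: the function names are hypothetical, the scale factor of 100 is the one floated in the notes, and the memory estimate assumes every skim is a full square zone-to-zone matrix. Values are scaled to `uint16` only for storage and converted back to floats on read, so downstream arithmetic still runs in full precision.

```python
import numpy as np

SCALE = 100  # scale factor suggested in the notes: store value * 100
UINT16_MAX = int(np.iinfo(np.uint16).max)  # 65535


def encode_skim(values, scale=SCALE):
    """Scale float skim values to uint16, refusing data that is out of range."""
    scaled = np.round(np.asarray(values, dtype=np.float64) * scale)
    if scaled.min() < 0 or scaled.max() > UINT16_MAX:
        raise ValueError("skim values do not fit in uint16 at this scale")
    return scaled.astype(np.uint16)


def decode_skim(stored, scale=SCALE):
    """Unscale on read, returning float64 for full-precision arithmetic."""
    return stored.astype(np.float64) / scale


def skim_bytes(n_periods, n_skims, n_zones, dtype):
    """Bytes needed if every skim is a square n_zones x n_zones matrix."""
    return n_periods * n_skims * n_zones * n_zones * np.dtype(dtype).itemsize


# Round trip on made-up travel times (minutes); 655.35 is the largest
# value representable at a scale of 100.
times = np.array([[0.0, 12.34], [12.3, 655.35]])
stored = encode_skim(times)
restored = decode_skim(stored)

# Unscaled seconds in uint16 top out a little above 18 hours.
max_hours_in_seconds = UINT16_MAX / 3600

# PSRC-sized problem from the notes: 12 periods * 60 skims * 4000 zones.
psrc_float32_gb = skim_bytes(12, 60, 4000, np.float32) / 1e9
psrc_uint16_gb = skim_bytes(12, 60, 4000, np.uint16) / 1e9
```

Under these assumptions the PSRC-sized skim set works out to about 46 GB as float32 and 23 GB as uint16, which is the halving mentioned above. At a scale of 100 the largest storable value is 655.35, so the scale factor would need to be chosen per skim or per unit.
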

Welcome Jeff Newman and CDAP larch integration
- Jeff to start on the CDAP larch integration
- Person types are encoded as single digits 1 to 8, so at most 10 person types are possible (0-9)
- Could be stored as strings (A, B, C) to expand the set of possible values
- This is fine for this version of ActivitySim; later versions may support more person types, but let's not solve a problem we don't have
- SEMCOG HH IDs were very long, 14 digits, and ActivitySim couldn't read them. ActivitySim needs to do a better job of publishing expected data types/ranges
- Jeff to work off asim/develop in a new branch within the repo so everyone can more easily participate
- Clint can too for ARC if he wants
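
As an illustration of publishing expected data types/ranges, the sketch below checks which integer type an ID value needs. The notes do not say why the 14-digit household IDs could not be read; a 32-bit integer limit is one plausible cause and is an assumption here. The helper name and the example ID are made up.

```python
import numpy as np


def smallest_int_dtype(max_value):
    """Return the narrowest signed integer dtype that can hold max_value."""
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        if max_value <= np.iinfo(dtype).max:
            return np.dtype(dtype)
    raise OverflowError("value does not fit in a 64-bit signed integer")


household_id = 12345678901234  # made-up 14-digit ID

# A 32-bit signed integer tops out at 2,147,483,647 (10 digits).
fits_int32 = household_id <= np.iinfo(np.int32).max
fits_int64 = household_id <= np.iinfo(np.int64).max

id_dtype = smallest_int_dtype(household_id)
person_type_dtype = smallest_int_dtype(8)  # person types 1 to 8
```

A check like this could run when inputs are loaded, so that an out-of-range ID produces a clear message instead of a read failure.
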
