Caching Agent Populations (Cholera)

Caching

It might be valuable in LASER to be able to easily save and reload a full population. In EMOD we do this under the names serialization and deserialization. Since the entire agent population in LASER is just a dataframe (admittedly a big one), it should not be that hard.
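Because the population is a set of equal-length columns, a save/reload round trip can be as simple as one HDF5 dataset per column. The sketch below uses h5py and assumes the population is held as a plain dict of NumPy arrays. `save_columns` and `load_columns` are hypothetical names, not the LaserFrame save function.

```python
import os
import tempfile

import h5py
import numpy as np


def save_columns(path, columns):
    """Write each population column (a 1-D NumPy array) as its own HDF5 dataset."""
    with h5py.File(path, "w") as f:
        for name, values in columns.items():
            f.create_dataset(name, data=values)


def load_columns(path):
    """Read every top-level dataset back into a dict of NumPy arrays."""
    with h5py.File(path, "r") as f:
        return {name: f[name][()] for name in f.keys()}


# Usage: round-trip a tiny three-agent population through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "population.h5")
    population = {
        "age": np.array([1.5, 30.0, 72.25], dtype=np.float32),
        "nodeid": np.array([0, 0, 1], dtype=np.int32),
    }
    save_columns(path, population)
    restored = load_columns(path)
    assert all(np.array_equal(population[k], restored[k]) for k in population)
```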

This page documents an initial approach, checked in here, as part of the idmlaser_cholera prototype.

Summary

- The full population of 100M agents is saved in an HDF5 file in the directory "./laser_cache".
- Saving is done via the save function in the LaserFrame class.
- Currently, the user manually edits the code to save at a desired point, in an ad hoc manner.
- The population is saved with associated metadata, stored as HDF5 attributes, that lets the discovery-and-loading process tell whether a saved file matches the current set of inputs. Those inputs are:
  - input population (total initial population for each node)
  - age distribution
  - cumulative deaths (natural mortality)
  - EULA age (not applicable to Cholera)
- Input config params are not currently saved in the population metadata.
- Discovery and reload work by scanning all files with the .h5 suffix in the laser_cache directory and loading the first one whose metadata match those of the current simulation.
- Caching is famously one of the "hard things" in software, so we will ultimately want to revisit this to make sure a user never loads a cached population file unintentionally.
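The metadata check and first-match discovery described in the summary might look roughly like the sketch below. It assumes h5py, stores each input as an attribute on the file root, and compares values with `np.array_equal`, since inputs such as per-node populations are arrays. The helper names (`write_metadata`, `metadata_matches`, `find_cached_population`) and attribute keys are hypothetical. Sorting the file list is also an assumption: the page only says "the first one", so sorting just makes "first" deterministic.

```python
import glob
import os

import h5py
import numpy as np


def write_metadata(path, metadata):
    """Attach the inputs a population was built from as root-level HDF5 attributes."""
    # Mode "a" opens an existing file (or creates one) without truncating it.
    with h5py.File(path, "a") as f:
        for key, value in metadata.items():
            f.attrs[key] = value


def metadata_matches(attrs, metadata):
    """True only if every current input is stored in the file with an identical value."""
    for key, value in metadata.items():
        if key not in attrs:
            return False
        # Exact comparison: arrays must have the same shape and the same elements.
        if not np.array_equal(np.asarray(attrs[key]), np.asarray(value)):
            return False
    return True


def find_cached_population(metadata, cache_dir="./laser_cache"):
    """Return the path of the first .h5 file whose metadata match, or None."""
    for path in sorted(glob.glob(os.path.join(cache_dir, "*.h5"))):
        with h5py.File(path, "r") as f:
            if metadata_matches(f.attrs, metadata):
                return path
    return None
```

Exact equality is deliberately strict: a population built from even slightly different inputs counts as a miss rather than being silently reused, which leans toward the safe side of the caching concern noted in the summary.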

Performance

Caching lets a simulation skip some of the startup steps, which we have seen shave 15-30 seconds off a run. Reading and writing the 2 GB HDF5 files themselves is remarkably fast.

COMPS

It is an open question whether saved population files should be COMPS assets or be stored on a fileshare.

Code

The startup code had to be refactored somewhat, because some setup steps do not apply when a full population file is loaded.
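One way to picture the refactored startup is a single branch: try the cache first, and run the full setup only on a miss. The sketch below takes the lookup, load, and build steps as callables so it stays self-contained. All names are hypothetical and it does not reflect the actual idmlaser_cholera code layout. It also does not save on a miss, since saving is currently a manual step.

```python
from typing import Callable, Optional, Tuple


def initialize_population(
    find_cached: Callable[[], Optional[str]],
    load: Callable[[str], dict],
    build: Callable[[], dict],
) -> Tuple[dict, bool]:
    """Return (population, loaded_from_cache).

    On a cache hit, the setup steps inside `build` are skipped entirely;
    they run only when no matching cached file is found.
    """
    path = find_cached()
    if path is not None:
        return load(path), True
    return build(), False
```

Keeping the "build from scratch" steps behind one callable is what makes them skippable; anything that must run in both cases belongs after this function returns.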

A LASER agent population has a "capacity" that leaves room for agents added by future births. We do not save or load these not-yet-born pseudo-agents; instead, we reset the capacity after reloading.
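Resetting capacity after a reload could amount to allocating each column at the new capacity and copying the reloaded agents into the front. This minimal NumPy sketch assumes the population is a dict of equal-length arrays and that capacity is the current count plus headroom for expected births. `reset_capacity` and `expected_births` are hypothetical; how LASER actually sizes capacity is not specified here.

```python
import numpy as np


def reset_capacity(columns, expected_births):
    """Re-allocate every column with room for future births.

    Returns (new_columns, count, capacity): the first `count` slots hold the
    reloaded agents, and the remaining slots are zero-filled placeholders.
    """
    count = len(next(iter(columns.values())))
    capacity = count + int(expected_births)
    resized = {}
    for name, values in columns.items():
        # Preserve each column's dtype so the reloaded data is not converted.
        buffer = np.zeros(capacity, dtype=values.dtype)
        buffer[:count] = values
        resized[name] = buffer
    return resized, count, capacity
```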

We also do not save the node LaserFrames (e.g., those used for reporting); instead, we recreate them for each new simulation.

Design Choices

HDF5

This file format seems to work very nicely for our purposes: it is fast, makes it easy to add metadata, converts easily to CSV, and ChatGPT readily produces working code for it.
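As an illustration of the CSV point, a population file with one 1-D dataset per column can be dumped to CSV with only h5py and the standard library. This sketch assumes that flat layout (no groups) and equal-length columns; `h5_to_csv` is a hypothetical helper. For a 100M-agent file you would likely want to read and write in slices rather than load every column at once as done here.

```python
import csv

import h5py


def h5_to_csv(h5_path, csv_path):
    """Write every top-level 1-D dataset as a CSV column, one row per agent."""
    with h5py.File(h5_path, "r") as f:
        names = sorted(f.keys())
        columns = [f[name][()] for name in names]
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(names)
        for row in zip(*columns):
            # .item() turns NumPy scalars into plain Python numbers for csv.
            writer.writerow([value.item() for value in row])
```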
