
Calibration: Early Experiments

Jonathan Bloedow edited this page Jun 6, 2024 · 6 revisions

Purpose

This post captures some details and experiences from calibrating Jonathan's prototype model with Optuna. This work is entirely "Dev Work", not Research: it is a technology demonstrator showing how the model can be integrated into a popular calibration tool and run against our primary reference scenario.

Experiments

We used the famous England & Wales measles dataset from the 1950s as our scenario. The software model was the late-May version of Jonathan's prototype, and Optuna was the calibration tool throughout. We ran two calibrations: a single-node "London-ish" model, to explore CCS (Critical Community Size) and seasonality, including output periodicity; and a full 954-node spatial model.

Setup

The model was run as a webservice -- initially running locally, and then deployed to k8s using Tanzu. So far it's just been a single simulation "trial" at a time. We've configured Optuna to store results in a local SQLite db.

The webservice is built using Flask and runs inside a Docker container. The container is run locally using docker-compose or remotely using Kubernetes. The remote machine available to me had 16 cores, whereas my local machine had fewer.
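To illustrate the shape of such a service, here is a minimal Flask sketch. The `/run` route, the JSON payload, and the `run_model` placeholder are assumptions for illustration; the real service lives in the repository linked on this page and may look quite different.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def run_model(params: dict) -> dict:
    """Hypothetical placeholder: the real service would run one simulation here."""
    # Echo back which parameters arrived, plus a dummy metric value.
    return {
        "mean_new_infections_per_year": 0.0,
        "received_params": sorted(params),
    }


@app.route("/run", methods=["POST"])
def run():
    # One request corresponds to one simulation "trial".
    params = request.get_json(force=True)
    return jsonify(run_model(params))
```

A client would POST a JSON object of parameter values to `/run` and read the metrics back from the JSON response.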

The conversion of raw output to calibration metrics is done in a post-processing step at the end of the sim, inside the container, and only the set of metrics (keys and values) is returned to Optuna. A very simple objective function is used to measure the error or delta.
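The exact form of the objective function is not given here, so the following is only one plausible sketch: a sum of squared relative errors between the returned metrics and their targets. The function name and the metric keys are hypothetical.

```python
def calibration_error(metrics: dict, targets: dict) -> float:
    """Sum of squared relative errors over every targeted metric.

    Assumes every target is non-zero and every target key is present in metrics.
    """
    total = 0.0
    for key, target in targets.items():
        relative_error = (metrics[key] - target) / target
        total += relative_error * relative_error
    return total


# Example with made-up metric values; the key name is an assumption.
targets = {"mean_new_infections_per_year": 52_000.0}
metrics = {"mean_new_infections_per_year": 49_400.0}
error = calibration_error(metrics, targets)
```

Using relative rather than absolute errors keeps targets of very different magnitudes (case counts versus fractions) on a comparable scale, which matters once several targets are combined into one value.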

Client code: https://github.com/jonathanhhb/laser/blob/as-a-service/sql_modeling/client/calibrate.py

Experiment 1: Single Node (London-ish)

Summary

In this calibration, we just modeled London, by far the largest city in the model -- with ~2.5e6 souls. This let us remove migration entirely from the problem domain, focus on fewer inputs and outputs, and run faster simulations. Our simulations ran for 20 years. The first few years were deemed burn-in.

This scenario is fairly interesting because for combinations of values that are too low, we don't get endemicity (post burn-in). And when we do get endemicity, some input value sets lead to annual periodicity, some biennial. It's hard to know a priori what this surface is going to look like.

Note that the goal in these experiments was to find "some input values that worked", not a set of values or any probability distributions.

Parameters

  • Base Infectivity
  • Seasonal Multiplier

Targets

  • mean new infections per year (London): 52,000
  • max wavelet power per period: 1 and 2 years
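The periodicity target above is based on wavelet power. As a simpler illustration of the same idea (deciding whether a case series is mainly annual or biennial), the sketch below swaps in plain Fourier power at two fixed periods. This is not the wavelet computation used in the experiments, and the weekly time step with 52 weeks per year is an assumption.

```python
import cmath
import math

WEEKS_PER_YEAR = 52  # assumption: weekly case counts, 52 weeks per year


def power_at_period(cases, period_weeks):
    """Fourier power of the mean-removed series at one period (in weeks)."""
    n = len(cases)
    mean = sum(cases) / n
    acc = sum(
        (x - mean) * cmath.exp(-2j * math.pi * t / period_weeks)
        for t, x in enumerate(cases)
    )
    return abs(acc) ** 2 / n


def dominant_period_years(cases, candidates=(1, 2)):
    """Return whichever candidate period (in years) carries the most power."""
    return max(
        candidates,
        key=lambda years: power_at_period(cases, years * WEEKS_PER_YEAR),
    )
```

A wavelet transform additionally shows how the dominant period changes over time, which a single Fourier projection over the whole run cannot.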

Results

We actually ran this twice: once calibrating to annual periodicity and once to biennial. Even though this was run back in early/mid May with a slower version of the prototype, the single-node sims ran in 60 seconds, so we were able to run 550 overnight. Both targets seemed to be achievable, with different input values. The results seemed to point to a Base Infectivity of ~3.19 and a Seasonal Multiplier of ~0.93.

Experiment 2: Full Spatial (954 nodes)

Summary

Here we move from single node to all 954 nodes. We keep the values we got from the London-ish calibration, though still allowing a narrow range for them, and add a single new migration parameter and several new spatial-related targets.

Parameters

  • Base Infectivity
  • Seasonal Multiplier
  • Migration Fraction (daily fraction of infectious people who migrate)

Targets

  • mean new infections per year (all): 666,200
  • mean new infections per year (London): 52,000
  • mean CCS fraction for big cities: 0.01
  • CCS median fraction: 0.644
  • slope of sigmoid fit to plot of "fraction of weeks without cases vs pop size": -1.78

The last 3 values are all related to the scatter plot of "fraction of weeks without cases vs log pop size" (TBD).
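As a rough sketch of the data behind that scatter plot, the code below computes the fraction of weeks without cases for each node and pairs it with log10 of the population. The input layout (a mapping of node name to population and weekly case counts) is an assumption, and the sigmoid fit itself is not attempted here.

```python
import math


def fraction_of_weeks_without_cases(weekly_cases):
    """Share of weeks in which a node reported zero new cases."""
    if not weekly_cases:
        raise ValueError("weekly_cases must not be empty")
    return sum(1 for c in weekly_cases if c == 0) / len(weekly_cases)


def scatter_points(nodes):
    """Build (log10 population, zero-week fraction) pairs, sorted by population.

    nodes: hypothetical mapping of node name -> (population, weekly_cases).
    """
    return sorted(
        (math.log10(population), fraction_of_weeks_without_cases(cases))
        for population, cases in nodes.values()
    )


# Made-up toy data: a small village with frequent fade-outs, a large city with none.
example_nodes = {
    "village": (1_000, [0, 0, 3, 0]),
    "city": (1_000_000, [12, 40, 25, 9]),
}
points = scatter_points(example_nodes)
```

Small populations should sit near the top of such a plot (many weeks without cases) and large cities near the bottom.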

Results

Trial 100 finished with value: 0.11900160945919191 and parameters: {'base_infectivity': 2.397512282717682, 'migration_fraction': 0.029265491294019837, 'seasonal_multiplier': 0.5683085044755576}. Best is trial 100 with value: 0.11900160945919191.

(Sometimes you get your answer in the last place you look!)

Each simulation completes in about 3 minutes. We ran 100 trials. It seemed to produce acceptable results. Unfortunately I haven't figured out how to get plots from Optuna even though it's supposed to be easily available, so no eye candy yet.
