Add support for count data? #23

Open
seabbs opened this issue Sep 12, 2024 · 9 comments

Comments

@seabbs

seabbs commented Sep 12, 2024

Any interest in doing this, and any view as to the complexity? I am thinking of a simple time-varying ascertainment model from the global infections time series, with a more complex version supporting n count time series at once.
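For concreteness, here is a minimal sketch of what such an observation model could look like. It is written in plain Python rather than as part of EpiSewer, and everything in it is an assumption made for illustration: the function names, the logit-scale random walk for ascertainment, and the example delay distribution are all hypothetical.

```python
import math
import random


def delayed_infections(infections, delay_pmf):
    """Convolve infections with an infection-to-report delay distribution."""
    out = []
    for t in range(len(infections)):
        total = 0.0
        for d, p in enumerate(delay_pmf):
            if t - d >= 0:
                total += infections[t - d] * p
        out.append(total)
    return out


def expected_counts(infections, delay_pmf, ascertainment):
    """Expected reported counts: delayed infections scaled by ascertainment."""
    delayed = delayed_infections(infections, delay_pmf)
    return [a * x for a, x in zip(ascertainment, delayed)]


def random_walk_ascertainment(n, start_logit, step_sd, rng):
    """Ascertainment as a random walk on the logit scale, mapped to (0, 1)."""
    logit = start_logit
    out = []
    for t in range(n):
        if t > 0:
            logit += rng.gauss(0.0, step_sd)
        out.append(1.0 / (1.0 + math.exp(-logit)))
    return out


# Illustrative inputs only (not real data).
rng = random.Random(1)
infections = [100.0, 120.0, 150.0, 180.0, 200.0, 190.0]
delay_pmf = [0.2, 0.5, 0.3]
ascertainment = random_walk_ascertainment(len(infections), -1.0, 0.1, rng)
mu = expected_counts(infections, delay_pmf, ascertainment)
```

In a full model, `mu` would be the mean of a count likelihood (for example a negative binomial), and the step size of the random walk would control how much of the count signal is absorbed by ascertainment rather than by the infection trajectory.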

@seabbs changed the title from "Add support for a count data?" to "Add support for count data?" on Sep 12, 2024
@adrian-lison
Owner

I wonder if a "simple" version would do any good. If we don't account for right truncation, it would only be okay for date-of-report data. Maybe the question is also how much potential there is for merging the model used in EpiSewer --> epinowcast in the long-term...

General downside I see is that while it would mostly be a separate module, having the case data increases the complexity in terms of what signals influence the estimated infection trajectory. Not a problem per se, but can make it harder to detect and diagnose problems.

@seabbs
Author

seabbs commented Sep 16, 2024

Maybe the question is also how much potential there is for merging the model used in EpiSewer --> epinowcast in the long-term...

Well yes, precisely: this would be some action along those lines.

If we don't account for right truncation, it would only be okay for date-of-report data.

I agree, but in the first instance we can just allow for missing data and treat partial reports as NA, I think.

having the case data increases the complexity in terms of what signals influence the estimated infection trajectory. Not a problem per se, but can make it harder to detect and diagnose problems.

Yes, agree. I think the way this is set up it makes sense to declare WW the ground truth and try and capture as much of the difference between underlying infections and the count target as ascertainment.

I don't have a direct use for this, but it could be interesting as a deployable forecasting model, e.g. in US FluSight, to compare with the increasingly mechanistic approach @kaitejohnson and co are taking.

@kaitejohnson

I think for this to work for a deployable forecasting model, what you would probably need is support for n wastewater data streams rather than n count data streams (this being a maybe US biased assumption that most of the time forecasting targets are larger geographic granularities than wastewater catchment areas).

Yes, agree. I think the way this is set up it makes sense to declare WW the ground truth and try and capture as much of the difference between underlying infections and the count target as ascertainment.

Interesting, so here you would propose something where the ascertainment over time is weak enough that the R(t) time series is largely driven by the trend in wastewater. I think it would be interesting to tune that parameter based on forecast evaluation, in part because my intuition is trends in wastewater are more variable than trends in ascertainment.

I think in the first pass, it would still be really useful to be able to support one count time series, one wastewater time series, assuming the same source population.

@seabbs
Author

seabbs commented Sep 16, 2024

what you would probably need is support for n wastewater data streams rather than n count data streams

See #22

this being a maybe US biased assumption that most of the time forecasting targets are larger geographic granularities than wastewater catchment areas

I think this is probably a bit context-specific. Here, for example, we might have one WW site but multiple NHS hospitals, all with data streams.

Interesting, so here you would propose something where the ascertainment over time is weak enough that the R(t) time series is largely driven by the trend in wastewater.

Yes, as this model is really a ground-truth WW model IMO.

I think in the first pass, it would still be really useful to be able to support one count time series, one wastewater time series, assuming the same source population.

I agree, though I would push back on any attempt to do much with the idea of an overlapping population, as I think it is better not to enforce that mechanism here.

@kaitejohnson

I agree, though I would push back on any attempt to do much with the idea of an overlapping population, as I think it is better not to enforce that mechanism here.

Wouldn't generating count data from the infection time series enforce that assumption?

@seabbs
Author

seabbs commented Sep 16, 2024

Not if you allow it to be a subset

@adrian-lison
Owner

Not if you allow it to be a subset

Yes, what is needed anyway is basic support for letting the catchment population be a subset of the infection population (currently they are assumed to be identical). The same logic would apply for case ascertainment.

@adrian-lison
Owner

Interesting, so here you would propose something where the ascertainment over time is weak enough that the R(t) time series is largely driven by the trend in wastewater. I think it would be interesting to tune that parameter based on forecast evaluation, in part because my intuition is trends in wastewater are more variable than trends in ascertainment.

@kaitejohnson Interesting point. By parameter, do you mean something like putting a strong prior on the variance of observations, or something more complex like a time-varying reporting error process that serves as an "isolation layer" separating the case data from the Rt trend?

I was also wondering if there would be any value in having a stepwise procedure which first estimates Rt and the infection trajectory solely from wastewater, and then estimates ascertainment parameters from the case data and predicts future cases using a fixed infection trajectory (e.g. the median trajectory, but it is also possible to do this over several posterior samples via likelihood averaging).
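A rough sketch of the two-stage idea, under heavy simplifying assumptions: stage 1 (the wastewater-only fit) is taken as given and represented by hypothetical posterior samples of the infection trajectory, ascertainment is constant rather than time-varying, counts are Poisson, and reporting delays are ignored. The per-sample loop is a simple stand-in and is not an implementation of likelihood averaging. All names and numbers are illustrative.

```python
import statistics


def fit_ascertainment(cases, infections):
    """Poisson MLE of a constant ascertainment rate for a fixed trajectory."""
    # Days with missing counts (None) are skipped.
    pairs = [(c, i) for c, i in zip(cases, infections) if c is not None]
    return sum(c for c, _ in pairs) / sum(i for _, i in pairs)


def forecast_cases(cases, trajectory, horizon):
    """Fit on the observed days, then scale the next `horizon` trajectory days."""
    n_obs = len(cases)
    rate = fit_ascertainment(cases, trajectory[:n_obs])
    return [rate * x for x in trajectory[n_obs:n_obs + horizon]]


# Hypothetical stage 1 output: posterior samples of infections,
# covering 4 observed days plus 2 forecast days.
samples = [
    [100, 110, 120, 130, 140, 150],
    [90, 105, 125, 135, 150, 165],
    [110, 115, 118, 125, 130, 138],
]
cases = [20, 22, None, 26]  # one missing report

# Option A: condition on the pointwise median trajectory.
median_traj = [statistics.median(day) for day in zip(*samples)]
forecast_median = forecast_cases(cases, median_traj, horizon=2)

# Option B: refit per posterior sample and keep all forecasts,
# so that uncertainty in the infection trajectory carries through.
forecast_by_sample = [forecast_cases(cases, s, horizon=2) for s in samples]
```

The point of the design is that case data never feeds back into the infection trajectory: it only informs the ascertainment parameters, which keeps wastewater as the sole driver of the Rt estimate.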

@kaitejohnson

I think I mean the latter: you would allow a time-varying ascertainment rate that could vary widely from day to day, with very few constraints on its magnitude, which would allow any infection trend to generate the observed count data... I don't think this really makes sense though, because why not then just estimate it post hoc (after estimating the infection trend from WW in the current model)?
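A toy illustration of that identifiability point, with made-up numbers and hypothetical function names: if the daily ascertainment rate is unconstrained, a rising and a falling infection trajectory both reproduce the same counts exactly, so the counts carry no information about the trend.

```python
def implied_ascertainment(cases, infections):
    """Daily ascertainment rate that makes infections match cases exactly."""
    return [c / i for c, i in zip(cases, infections)]


def reproduces_cases(cases, infections, tol=1e-9):
    """Check that the implied rates map the trajectory back onto the counts."""
    rates = implied_ascertainment(cases, infections)
    fitted = [r * i for r, i in zip(rates, infections)]
    return all(abs(f - c) < tol for f, c in zip(fitted, cases))


cases = [20, 25, 30, 28]
rising = [100, 150, 225, 340]   # illustrative infection trajectories
falling = [400, 300, 200, 100]

fits_rising = reproduces_cases(cases, rising)
fits_falling = reproduces_cases(cases, falling)
```

Both checks pass, which is why such a model would look like joint inference while in practice leaving the infection trend to be determined by wastewater alone.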

I was also wondering if there would be any value in having a stepwise procedure which first estimates Rt and the infection trajectory solely from wastewater, and then estimates ascertainment parameters from the case data and predicts future cases using a fixed infection trajectory (e.g. the median trajectory, but it is also possible to do this over several posterior samples via likelihood averaging).

Right, I think this would make more sense than doing something that looks like joint inference but, as programmed with the priors, is not. One could predict future cases by propagating the computed variability in the ascertainment rate or something (though what you're suggesting sounds more sophisticated).

3 participants