[1/2]: Read in case data #17

zsusswein · 2024-08-31T19:03:49Z

Important

Do not merge. Using a stacked PR workflow.

Read the case data from a local parquet file in our standard ETL output schema. The various errors & warnings are linked to the tests via classes.
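As a rough illustration of what "linked to the tests via classes" means, here is a minimal sketch in Python rather than the package's R code. The class and function names are hypothetical and do not come from this PR. The point is only that a test can assert on the class of an error instead of matching its message text.

```python
import os


class DataFileNotFoundError(FileNotFoundError):
    """Hypothetical error class: the input file does not exist."""


class EmptyCaseDataError(ValueError):
    """Hypothetical error class: the file was read but yielded no rows."""


def check_file_exists(path):
    # Raise a classed error so a test can check the class, not the message.
    if not os.path.isfile(path):
        raise DataFileNotFoundError(f"File not found: {path}")
    return True


def check_rows(rows, path):
    # An empty result is treated as an error with its own class.
    if len(rows) == 0:
        raise EmptyCaseDataError(f"No rows returned from {path}")
    return rows
```

A test would then call `check_file_exists()` on a missing path and assert that the raised error is a `DataFileNotFoundError`, which keeps the test stable if the message wording changes.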

This approach preserves our existing handling of the data with respect to exclusions: it aggregates before any exclusions have been applied. Applying exclusions to the aggregated data will be handled in the second PR in this stack.

I also add the data from Gostic, 2020 here. I use it in the unit tests to practice reading data from tests/testthat/data/, but I also add it as package data (with documentation). We use this dataset in our end-to-end testing, so adding it here also anticipates that future use.

codecov · 2024-08-31T23:39:13Z

Codecov Report

All modified and coverable lines are covered by tests ✅

natemcintosh · 2024-09-03T20:10:04Z

So calling /document causes it to run the documentation CI for you?

zsusswein · 2024-09-03T20:11:19Z

Yes!

DESCRIPTION

R/read_data.R

data-raw/convert_gostic_toy_rt_to_test_dataset.R

athowes

Just superficial things! (I think reading this is helping me get up to speed nonetheless)

R/data.R

R/read_data.R

tests/testthat/test-read_data.R

athowes · 2024-09-05T08:35:57Z

This approach preserves our existing handling of the data with respect to exclusions: it aggregates before any exclusions have been applied. Applying exclusions to the aggregated data will be handled in the second PR in this stack.

Just to note, without context here, that this sounds odd to me. If you aggregate data, it seems harder to then retroactively remove data points from it than to remove the data first and then aggregate. I guess you must have a reason for doing this, since you'd now have to somehow track in the code all of the entries that went into a particular aggregate, so that removing data from them also removes it from the aggregate. (As in, this seems more challenging than it need be.)

zsusswein · 2024-09-05T12:16:48Z

Yeah this is a good flag. I don't know if this repo is the right place to document it, but here's the gist of our ongoing discussion around data exclusions:

1. We receive and store facility-level data. These facilities are tagged with their corresponding state.
2. We want to produce state- and national-level estimates, so we aggregate our stored data on the fly up to the desired level (this PR!).
3. We include right-truncation in reporting in the model. We provide a right-truncation PMF based on historical geographic-aggregate-level right-truncation.
4. Sometimes one geographic aggregate or another will have a one-off week where right-truncation empirically deviates from the historical trend (e.g. 0 cases reported when we normally see ~some).
5. In those cases, we consult with the dataset expert and sometimes drop the single day with anomalous reporting for that particular geographic aggregate by converting the aggregate point to NA.
6. Sometimes the anomalous reporting is widespread enough that we judge that the modeling is not reflective of disease trends and exclude the whole state's model fit for the week.
7. This exclusion does not propagate to other geographic aggregates (e.g. the US overall aggregate) or future report dates. All data are included, even if a different aggregate using a portion of the same data is excluded.

The key question is: should the NA propagate? When we exclude a state's point, should we also exclude it from the US overall? What about when we exclude a whole state's time series?

We've landed on "No, we should make judgement calls about each aggregated time series individually." But this discussion is still live and we're trying to get to a better place from both the data review/cleaning side and the modeling side.
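To make the current "no propagation" behavior concrete, here is a minimal sketch in Python (not the package's R code). The facility rows are made up, the function names are hypothetical, and `None` stands in for NA. Each aggregate is built from all of the facility-level data, and an exclusion only blanks the matching point in the aggregate it names.

```python
from collections import defaultdict

# Made-up facility-level rows: (facility, state, date, count)
facility_rows = [
    ("fac_a", "WA", "2024-08-01", 3),
    ("fac_b", "WA", "2024-08-01", 2),
    ("fac_c", "OR", "2024-08-01", 4),
    ("fac_a", "WA", "2024-08-02", 0),
    ("fac_c", "OR", "2024-08-02", 5),
]


def aggregate(rows, level):
    """Sum counts by (geography, date); level is "state" or "US"."""
    totals = defaultdict(int)
    for _facility, state, date, count in rows:
        geo = state if level == "state" else "US"
        totals[(geo, date)] += count
    return dict(totals)


def apply_exclusions(series, exclusions):
    """Replace excluded (geography, date) points with None (standing in for NA)."""
    return {
        key: (None if key in exclusions else value)
        for key, value in series.items()
    }


# Aggregate first, using all facility data for every aggregate.
state_series = aggregate(facility_rows, "state")
us_series = aggregate(facility_rows, "US")

# Then exclude one anomalous point for one aggregate only.
exclusions = {("WA", "2024-08-02")}
state_clean = apply_exclusions(state_series, exclusions)
us_clean = apply_exclusions(us_series, exclusions)
```

Here the WA point for 2024-08-02 becomes `None` in `state_clean`, while the US total for the same date in `us_clean` still includes the WA facility's count, because the exclusion names only the WA aggregate.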

R/data.R

kgostic

Overall looks good!

R/data.R

R/read_data.R

R/utils.R

R/data.R

Co-authored-by: Adam Howes <[email protected]>

Co-authored-by: Katie Gostic (she/her) <[email protected]>

@athowes

* Add simulated data from Gostic, 2020 for benchmarking

This commit re-uses the data processing and documentation from CDCgov/cfa-epinow2-pipeline#17. That repo is open source and public domain by nature of being USG property. I think it's convenient to re-use, but @athowes and @kgostic there's a little text in there that you both suggested, so I'd appreciate it if you could give your permission for the re-use here! I've made sure to credit you both in the commit as it's partially your writing.

It's probably best practice to make the data prep and processing here fully reproducible in `data-raw/` but it's quite a pain to do, with a mixture of shell scripting and R scripting needed. I skipped it out of convenience, but if you have a really strong feeling that it's required @seabbs, let me know and we can revisit.

Closes #9

Co-authored-by: Adam Howes <[email protected]>
Co-authored-by: Katie Gostic <[email protected]>

* Bump NEWS
* Coerce new `obs_incidence` col to integer from double
* Generate GI PMF in `data-raw/`
* Point to specific CFAEpiNow2Pipeline commit
* Remove copied & pasted unrelated text
* Re-render roxygen docs
* Add primarycensoreddist to remotes
* Typo
* Rename synthetic dataset and GI dist
* Pin primarycensoreddist to R-universe not github

---------

Co-authored-by: Adam Howes <[email protected]>
Co-authored-by: Katie Gostic <[email protected]>

zsusswein force-pushed the zs-read-data branch from d53ba6e to ee26378 Compare August 31, 2024 23:37

zsusswein force-pushed the zs-read-data branch 3 times, most recently from 602ed84 to 767db01 Compare September 2, 2024 11:40

zsusswein force-pushed the zs-read-data branch from 7c5d755 to e01a94d Compare September 3, 2024 21:28

zsusswein changed the title ~~Read in case data~~ [1/2]: Read in case data Sep 3, 2024

zsusswein requested review from kgostic, athowes and natemcintosh September 3, 2024 21:37

zsusswein added the v0.1.0 label Sep 3, 2024

zsusswein marked this pull request as ready for review September 3, 2024 21:42

zsusswein force-pushed the zs-read-data branch from a26030d to 0844ae0 Compare September 3, 2024 21:55

zsusswein force-pushed the zs-read-data branch from a1da73b to 5fc4349 Compare September 4, 2024 15:45

natemcintosh approved these changes Sep 4, 2024

zsusswein force-pushed the zs-read-data branch from 2a8b870 to f7f8567 Compare September 4, 2024 20:45

athowes reviewed Sep 5, 2024

zsusswein requested a review from athowes September 5, 2024 13:58

zsusswein commented Sep 5, 2024

kgostic approved these changes Sep 5, 2024

zsusswein commented Sep 5, 2024

zsusswein and others added 26 commits September 10, 2024 11:34

Update data.R

60b25f2

Co-authored-by: Adam Howes <[email protected]>

Update data.R

1617ca0

Co-authored-by: Adam Howes <[email protected]>

Update data.R

5eee1fe

Co-authored-by: Adam Howes <[email protected]>

Update data.R

4533a15

Co-authored-by: Adam Howes <[email protected]>

Update test-read_data.R

54851dc

Co-authored-by: Adam Howes <[email protected]>

Update test-read_data.R

cc171b0

Co-authored-by: Adam Howes <[email protected]>

Update R/data.R

f573163

Syntax and formatting of roxygen string

434b193

Reformat list to render as a formatted list

d1e09c3

Typo

08e357d

Document

6e9f4d2

Update R/data.R

b994606

Co-authored-by: Katie Gostic (she/her) <[email protected]>

Apply suggestions from code review

74f5af1

Co-authored-by: Katie Gostic (she/her) <[email protected]>

Delete duplicate definition of stringify_date()

39d7678

Fix lint failures from added text

40c76e4

Update R/data.R

a2e2e0f

Style and render edits from review

a3c3fc1

Add disease to case data df for join w/ exclusions

b257239

Move file existence check to a utility

bc58d9a

Exclusion file reader and application to cases df

8f7f1f6

Bump NEWS

be895df

Document

3fe1822

Swap and to where

6f62fd0

func with parens

362add1

Clarify language

a1e90b6

Add some flavor text on exclusions

5f8cd56

zsusswein force-pushed the zs-read-data branch from 717858a to 5f8cd56 Compare September 10, 2024 15:34

zsusswein merged commit a212f50 into main Sep 10, 2024
6 checks passed

zsusswein deleted the zs-read-data branch September 10, 2024 15:37
