add initial 'prioritize_dt' helper function #32

chacalle · 2021-03-29T18:26:06Z

Describe changes

Adds initial prioritization/deduplication function that we can use internally to prioritize different reports of values from the same data collection origin.

This is based off of the population data prioritization code. Are there other scenarios that wouldn't be handled by this function? See function example and tests for example scenarios.

What issues are related

Fixes #1

Checklist

Packages Repositories

Have you read the contributing guidelines for ihmeuw-demographics R packages?
Have you successfully run devtools::check() locally?
Have you updated or added function (and vignette if applicable) documentation? Did you update the 'man' and 'NAMESPACE' files with devtools::document()?
Have you added in tests for the changes included in the PR?
Do the changes follow the ihmeuw-demographics code style?
Do the changes need to be immediately included in a new build of docker-base or docker-internal? If so follow directions in those repositories to rebuild and redeploy the images.
Do the changes require updates to other repositories which use this package? If yes, make the necessary updates in those repos, and consider integration tests for those repositories.

R/prioritize_dt.R

hcomfo95 · 2021-03-30T17:39:28Z

tests/testthat/test-prioritize_dt.R

+    quiet = TRUE
+  )
+  testthat::expect_identical(output_dt, expected_dt)
+})


Similar to Katie's suggestions above, would it be helpful to add a test where there are two sources with the same method for the same year to make sure it errors out how we would expect?

Agreed, I think what you're describing is what I added on line 40. Including the details below and let me know if you think this doesn't test that scenario.

where the test input has two separate reports with two methods each but the rank_order only includes prioritization of method which leads to duplicate prioritization of

> input_dt[year == 2000 & age_start == 0] location year age_start report method value 1: USA 2000 0 2015 A 1 2: USA 2000 0 2015 B 1 3: USA 2000 0 2020 A 1 4: USA 2000 0 2020 B 1 > rank_order["method"] $method [1] "B" "A"

The error looks like

> prioritize_dt( + dt = input_dt, + rank_by_cols = c("location", "year"), + unique_id_cols = c("location", "year", "age_start"), + rank_order = rank_order["method"] + ) Error in prioritize_dt(dt = input_dt, rank_by_cols = c("location", "year"), : Specified `rank_by_cols`, `rank_order` & returned `priority` do not uniquely identify each row of `dt`. - use `warn_non_unique_priority=TRUE` to return `dt` and run demUtils::identify_non_unique_dt with `id_cols = c('location', 'year', 'age_start', 'priority')` location year age_start priority 1: USA 2000 0 1 2: USA 2000 0 2 3: USA 2000 5 1 4: USA 2000 5 2

If we do the same call with warn_non_unique_priority = TRUE, the output looks below where the same priority is assigned to the different reports of the same method.

location year age_start report method value priority 1: USA 2000 0 2015 A 1 2 2: USA 2000 0 2015 B 1 1 3: USA 2000 0 2020 A 1 2 4: USA 2000 0 2020 B 1 1

add initial 'prioritize_dt' helper function

8193456

chacalle added the enhancement New feature or request label Mar 29, 2021

chacalle requested review from meghanfrisch, krpaulson and hcomfo95 March 29, 2021 18:26

chacalle self-assigned this Mar 29, 2021

krpaulson previously approved these changes Mar 29, 2021

View reviewed changes

R/prioritize_dt.R Outdated Show resolved Hide resolved

R/prioritize_dt.R Show resolved Hide resolved

R/prioritize_dt.R Show resolved Hide resolved

R/prioritize_dt.R Show resolved Hide resolved

clarify 'rank_order' documentation

2e8a93d

chacalle dismissed krpaulson’s stale review via 2e8a93d March 29, 2021 20:57

Add 'unique_id_cols' argument to help with assertions

ff96453

hcomfo95 reviewed Mar 30, 2021

View reviewed changes

chacalle merged commit 86b7903 into master Apr 2, 2021

chacalle deleted the feature/prioritize_dt branch April 2, 2021 17:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add initial 'prioritize_dt' helper function #32

add initial 'prioritize_dt' helper function #32

chacalle commented Mar 29, 2021

hcomfo95 Mar 30, 2021

chacalle Mar 31, 2021

add initial 'prioritize_dt' helper function #32

add initial 'prioritize_dt' helper function #32

Conversation

chacalle commented Mar 29, 2021

Describe changes

What issues are related

Checklist

Packages Repositories

hcomfo95 Mar 30, 2021

Choose a reason for hiding this comment

chacalle Mar 31, 2021

Choose a reason for hiding this comment