Function to iterate over categorical variable values and make new data sets #36

Open
atredennick opened this issue Dec 8, 2023 · 7 comments

Comments

@atredennick

Not sure if this is too "inside baseball" to be relevant to a wider community. Also not sure if the best place for this is pmforest or yspec. For making forest plots, we generally simulate from a fitted model using new data sets in which one variable is changed per iteration. A function that takes a spec and a vector of variable names, scans all possible values of each variable, and generates new data sets for the reference case and for each univariate perturbation would be extremely helpful.
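To make the request concrete, here is a minimal sketch of the kind of output being asked for. It is not an existing pmforest or yspec function: `make_univariate_sets()`, the `toy_levels` list, and the variable names `SEX` and `RF` are all hypothetical, a plain named list stands in for a real spec object, and the first value of each variable is assumed to be the reference level.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for a spec: each categorical variable maps to its
# allowed values. A real yspec object would need its own accessor.
toy_levels <- list(SEX = c(0, 1), RF = c(0, 1, 2))

# Return one reference row plus one row per single-variable perturbation.
# Assumption: the first value of each variable is the reference level.
make_univariate_sets <- function(.levels, .vars = names(.levels)) {
  # One-row tibble holding the reference value of every variable
  ref <- as_tibble(map(.levels[.vars], ~ .x[1]))
  ref_row <- mutate(ref, perturbed = "reference", .before = 1)

  # For each variable, copy the reference row once per non-reference value
  # and overwrite only that variable
  pert_rows <- map(.vars, function(v) {
    alt <- .levels[[v]][-1]
    out <- ref[rep(1, length(alt)), ]
    out[[v]] <- alt
    mutate(out, perturbed = v, .before = 1)
  })

  bind_rows(ref_row, bind_rows(pert_rows))
}

new_data <- make_univariate_sets(toy_levels)
```

Each row of `new_data` differs from the reference row in at most one column, which is the shape needed to simulate one forest-plot entry per row.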

@barrettk
Collaborator

Hey @atredennick, just wondering if you have any scripts or illustrations of what you're asking for? We could definitely do something like that, but I'll talk to some other TS folks to see where something like this should live.

@atredennick
Author

atredennick commented Dec 12, 2023

Not sure if this is super helpful out of context, but here is what I cobbled together recently:

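library(dplyr)
library(tidyr)
library(purrr)

# Build one row per (variable, perturbation level), each paired with the
# reference levels of all categorical variables.
# Note: get_values() is a project helper that is not shown here; it needs to
# return a decode table with `value` and `code` columns for a variable.
# `.conts` and `.spec` are accepted but not used in this version.
process_cats <- function(.cats, .conts, .spec = spec) {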
  cat_codes <- tibble(variable = .cats) %>%
    mutate(decodes = map(.x = variable, .f = ~ get_values(.var = .x))) %>%
    unnest(cols = decodes) %>%
    filter(code != "Missing") %>%
    group_by(variable) %>%
    arrange(variable, value) %>%
    mutate(case = case_when(
      value == min(value) ~ "reference",
      TRUE ~ "perturbation"
    )) %>%
    ungroup()
  
  ref_cats <- cat_codes %>%
    filter(case == "reference") %>%
    dplyr::select(-case) %>%
    nest(ref_df = c(variable, value, code))
  
  pert_cats <- cat_codes %>%
    filter(case == "perturbation") %>%
    dplyr::select(-case) %>%
    mutate(pert_value = value) %>%
    nest(pert_df = c(value, code))
  
  cat_dfs <- pert_cats %>%
    crossing(ref_cats)
  
  # need to remove rows where variable is TRT2 and pert_value is 1
  # because this is never seen in the training data
  cat_dfs <- cat_dfs %>%
    filter(!(variable == "TRT2" & pert_value == 1))
  
  return(cat_dfs)
}

Followed by this function:

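# Swap the reference level of `.var` for the perturbed level in `.pert`,
# then return a one-row wide data frame with one column per variable.
set_perturbations <- function(.var, .ref, .pert) {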
  new_row <- tibble(variable = .var, value = .pert$value, code = .pert$code)
  out <- .ref %>%
    filter(variable != .var) %>%
    bind_rows(new_row) %>%
    arrange(variable) %>%
    dplyr::select(-code) %>%
    pivot_wider(names_from = variable, values_from = value)
  return(out)
}
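As an illustration of how the second function might be applied, the sketch below runs `set_perturbations()` on hand-built toy inputs. The variable names (`RF`, `SEX`), the decode labels, and the hand-built `cat_dfs` are made up for the example; in practice `cat_dfs` would come from `process_cats()`, which depends on the `get_values()` helper that is not shown in this thread.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Copied from the comment above so the example runs on its own.
set_perturbations <- function(.var, .ref, .pert) {
  new_row <- tibble(variable = .var, value = .pert$value, code = .pert$code)
  out <- .ref %>%
    filter(variable != .var) %>%
    bind_rows(new_row) %>%
    arrange(variable) %>%
    dplyr::select(-code) %>%
    pivot_wider(names_from = variable, values_from = value)
  return(out)
}

# Toy reference table: one row per variable at its reference level.
ref_df <- tibble(
  variable = c("RF", "SEX"),
  value = c(0, 0),
  code = c("Normal", "Male")
)

# Toy stand-in for the output of process_cats(): one row per perturbation,
# with the perturbed level and the full reference table as list-columns.
cat_dfs <- tibble(
  variable = c("RF", "SEX"),
  pert_df = list(
    tibble(value = 2, code = "Severe"),
    tibble(value = 1, code = "Female")
  ),
  ref_df = list(ref_df, ref_df)
)

# Apply set_perturbations() row by row and stack the one-row results.
new_sets <- pmap(
  list(cat_dfs$variable, cat_dfs$ref_df, cat_dfs$pert_df),
  set_perturbations
) %>%
  bind_rows()
```

Each row of `new_sets` holds the reference values with exactly one variable moved to a perturbed level, ready to be passed to a simulation step.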

@barrettk
Collaborator

@atredennick thanks so much for putting that together! Seth proposed the idea of making a PR on the example project, to show how it could look there first. If you're able to do that let me know!

@atredennick
Author

Sounds good! Will do. (Might take a few days).

@barrettk
Collaborator

barrettk commented Jan 4, 2024

Hey @atredennick, just wanted to check back in to see if you had any status updates?

@atredennick
Author

Actually, for a recent project, Todd has developed some functions for this.

@barrettk
Collaborator

@atredennick Is this internal function sufficient, or do you think a package function would still be ideal? Would love to look at it if it can be ported over easily!
