Capturing vector transformation parameters #127

Open
realauggieheschmeyer opened this issue Jul 27, 2022 · 2 comments

Comments

@realauggieheschmeyer
Contributor

Both `log_interval_vec()` and `standardize_vec()` print the auto-detected parameters they use to scale the target variable.

For example:

```
log_interval_vec():
 Using limit_lower: 0
 Using limit_upper: 12
 Using offset: 1

Standardization Parameters
mean: -3.0500341071016
standard deviation: 1.22764358571979
```

However, there is currently no native way to capture these parameters other than reading the printed text and saving the information by hand. That isn't a problem for one-off analyses, but it prevents these functions from being used in an automated forecasting workflow: the target variable can be scaled automatically, but without a way to store and later access the parameters, predictions on the transformed variable cannot be mapped back to the original scale without human intervention.
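To illustrate what the workflow needs, here is a minimal sketch of the back-transformation, assuming the five parameters from the printed output above have been captured somewhere. The function name and argument names are my own for illustration; this is not timetk's implementation.

```r
# Sketch only: invert standardization, then invert the log-interval map.
# Parameter names mirror the printed output (limit_lower, limit_upper,
# offset, mean, sd); the function itself is a hypothetical helper.
inv_log_interval <- function(x_scaled, limit_lower, limit_upper, offset, mu, sigma) {
  x_std <- x_scaled * sigma + mu                      # undo standardize_vec()
  e     <- exp(x_std)
  z     <- (limit_upper * e + limit_lower) / (1 + e)  # undo log((z - lower) / (upper - z))
  z - offset                                          # remove the offset
}
```

With the parameters stored per group, this could be mapped over new predictions to recover the original scale automatically.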

It would be nice to have a helper function that could be run before mutating the target variable, extracting the relevant parameters and saving them for later use in the workflow.

Below is the code I wrote to capture these parameters manually:

```r
library(dplyr)

# Reproduce the limits that log_interval_vec() auto-detects
log_params <- ticket_volume_pad_tbl %>%
  group_by(department, ticket_type) %>%
  summarize(
    limit_lower = 0,
    limit_upper = (max(tickets) * 1.1) + 1,
    .groups = "drop"
  )

# Apply the log-interval transformation manually (offset = 1), then
# compute the standardization parameters on the transformed values
standardization_params <- ticket_volume_pad_tbl %>%
  left_join(log_params, by = c("department", "ticket_type")) %>%
  mutate(
    tickets_scaled = log(((tickets + 1) - limit_lower) / (limit_upper - (tickets + 1)))
  ) %>%
  group_by(department, ticket_type) %>%
  summarize(
    mean = mean(tickets_scaled),
    standard_deviation = sd(tickets_scaled),
    .groups = "drop"
  )

# One row of transformation parameters per group
log_params %>%
  left_join(standardization_params, by = c("department", "ticket_type"))
```

If it's helpful, I can try my hand at converting the above into a function, but I'd love some guidance on how to style it to fit in with the existing timetk functions.
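For discussion, here is one hypothetical shape such a helper might take for a single vector. The name `log_interval_params_vec()`, modeled on timetk's `*_vec()` naming, and the returned list are my inventions, not existing API; the `limit_upper = "auto"` default mimics the auto-detection described above.

```r
# Hypothetical sketch: capture the parameters that log_interval_vec() and
# standardize_vec() would auto-detect, without transforming the data itself.
log_interval_params_vec <- function(x, limit_lower = 0,
                                    limit_upper = "auto", offset = 0) {
  if (identical(limit_upper, "auto")) {
    limit_upper <- max(x, na.rm = TRUE) * 1.1 + offset  # assumed auto rule
  }
  x_trans <- log(((x + offset) - limit_lower) / (limit_upper - (x + offset)))
  list(
    limit_lower = limit_lower,
    limit_upper = limit_upper,
    offset      = offset,
    mean        = mean(x_trans, na.rm = TRUE),
    sd          = sd(x_trans, na.rm = TRUE)
  )
}
```

A grouped version could then `summarize()` this into one parameter row per group, which is exactly what the manual code above produces.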

@spsanderson
Copy link

I did something similar, but it was strictly for my own use; see here:

https://github.com/spsanderson/healthyverse_tsa/blob/master/00_scripts/data_manipulation_functions.R

@realauggieheschmeyer
Contributor Author

In addition to automated workflows, the manual nature of this process would also be problematic if you had a large number of groups in your data. Just imagine trying to forecast retail SKUs and having to manually log hundreds or thousands of parameters 😰
