tune_grid() fails if parameter is not marked as tunable() despite matching grid #660

Open
hfrick opened this issue Apr 4, 2023 · 4 comments
Labels
feature a feature request or enhancement

Comments

@hfrick
Member

hfrick commented Apr 4, 2023

In #633, we enabled tune to deal with list-columns in the grid, in order to allow people to use a custom grid to tune lower-level arguments.

The example there was based on wanting to tune parameters to a custom loss function that was passed to the objective engine argument for xgboost. When applying the same approach to a recipe step, it fails:

library(tidymodels)
data("credit_data", package = "modeldata")

set.seed(342)
credit_resamples <- vfold_cv(credit_data, v = 5)

rec <- recipe(Price ~ ., data = credit_data) %>%
  step_impute_bag(
    Status, Home, Marital, Job, Income, Assets, Debt,
    # instead of 
    # options = list(nbagg = tune())
    # do
    options = tune()
  )

credit_wflow <- workflow(rec, linear_reg())

# make grid manually
grid_options <- tibble(nbagg = seq(15, 30, by = 5)) %>% 
  mutate(options = map(nbagg, ~ list(nbagg = .))) %>% 
  select(options)

credit_res <- tune_grid(credit_wflow, credit_resamples, grid = grid_options)
#> Warning: No tuning parameters have been detected, performance will be evaluated
#> using the resamples with no tuning. Did you want to [tune()] parameters?
#> → A | error:   You cannot `prep()` a tuneable recipe. Argument(s) with `tune()`: 'options'. Do you want to use a tuning function such as `tune_grid()`?
#> There were issues with some computations   A: x1
#> There were issues with some computations   A: x5
#> 
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.

Created on 2023-04-04 with reprex v2.0.2

More details on why this fails are below but the higher-level question is: If you tag something for tuning and bring your own grid for it, shouldn't it work?

Currently, this traces back to tune_args() for recipes limiting the scope of checking for tune() tags to arguments which were designated for tuning via tunable().

https://github.com/tidymodels/recipes/blob/567fe17cddb2cb880bcd30daa7737effca065ed9/R/tune_args.R#L29-L32
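
As a rough illustration of that restriction (a sketch, not part of the original report): `trees` is an argument that step_impute_bag() registers through its tunable() method, while `options` is not. The object name `rec_both` is made up for this example. With the recipes version linked above, only `trees` would be expected to show up in tune_args(); other versions may behave differently, so no output is shown.

```r
library(tidymodels)
data("credit_data", package = "modeldata")

# Tag two arguments of the same step: one with a tunable() entry, one without
rec_both <- recipe(Price ~ ., data = credit_data) %>%
  step_impute_bag(
    Status, Home, Marital, Job, Income, Assets, Debt,
    trees = tune(),    # registered via tunable()
    options = tune()   # tagged with tune(), but no tunable() entry
  )

# Arguments that have dials parameter information attached
tunable(rec_both)

# Arguments detected as tagged with tune()
tune_args(rec_both)
```

Comparing the `name` column of the two results shows which tagged arguments tune can see.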

We can bypass some previous checks by giving tune_grid() a decoy parameter set but because tune_args(), called here:

tune_tbl <- tune_args(workflow)

for this recipe is empty, this is where the journey currently ends.

tune_args(rec)
#> # A tibble: 0 × 6
#> # ℹ 6 variables: name <chr>, tunable <lgl>, id <chr>, source <chr>,
#> #   component <chr>, component_id <chr>

fake_pset <- parameters(list(options = penalty()))
credit_res <- tune_grid(credit_wflow, credit_resamples,
                        grid = grid_options, param_info = fake_pset)
#> Error in `check_grid()`:
#> ! The provided `grid` has the following parameter columns that have not been marked for tuning by `tune()`: 'options'.
#> Backtrace:
#>     ▆
#>  1. ├─tune::tune_grid(...)
#>  2. └─tune:::tune_grid.workflow(...)
#>  3.   └─tune:::tune_grid_workflow(...)
#>  4.     └─tune:::check_grid(grid = grid, workflow = workflow, pset = pset)
#>  5.       └─rlang::abort(msg)

Is this a matter of reworking the checks or the error messages? Or is it a bigger question of where we accept tune() tags and how we deal with them?

@hfrick
Member Author

hfrick commented Apr 4, 2023

The example is motivated by tidymodels/dials#154

@simonpcouch simonpcouch added the feature a feature request or enhancement label Oct 31, 2023
@simonpcouch
Contributor

Wasn't sure whether to apply bug or feature, but either way, I agree that this ought to be fair game. :)

@simonpcouch
Contributor

A bit more context from poking at this for a moment...

tune_args() methods are intended to return arguments marked for tuning; tunable() methods are intended to return arguments marked for tuning that we can associate dials parameter information with. In some places, tune_args() methods more closely resemble tunable() methods, making it difficult for tune to handle custom grids in a principled way. In theory, if a user provides their own grid, then we should be able to rely only on tune_args() methods when running tune_grid(). In that case, tune_grid() takes care of collecting and then injecting each needed value, and recipes and/or parsnip never need to know they're handling tuning parameters.
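
To make "collecting and injecting" concrete, here is a minimal sketch of doing that step by hand for the reprex at the top of this issue. This is only a stopgap under assumptions, not how tune implements it: `make_wflow()` and `manual_res` are hypothetical names, and each candidate is refit with fit_resamples() instead of going through tune_grid().

```r
library(tidymodels)
data("credit_data", package = "modeldata")

set.seed(342)
credit_resamples <- vfold_cv(credit_data, v = 5)

# Hypothetical helper: build the workflow with one concrete `options` value
# injected in place of the tune() tag
make_wflow <- function(opts) {
  rec <- recipe(Price ~ ., data = credit_data) %>%
    step_impute_bag(
      Status, Home, Marital, Job, Income, Assets, Debt,
      options = opts
    )
  workflow(rec, linear_reg())
}

# Same grid as in the reprex, keeping nbagg as a readable label
grid_options <- tibble(nbagg = seq(15, 30, by = 5)) %>%
  mutate(options = map(nbagg, ~ list(nbagg = .x)))

# One fit_resamples() call per grid row, then stack the metrics
manual_res <- grid_options %>%
  mutate(
    res = map(options, ~ fit_resamples(make_wflow(.x), credit_resamples)),
    metrics = map(res, collect_metrics)
  ) %>%
  select(nbagg, metrics) %>%
  unnest(metrics)
```

Nothing here requires recipes to know that `options` is being tuned, which is the point of relying only on the tags plus a user-supplied grid.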

So, step 1 is to disambiguate tune_args() and tunable() in implementations. :)

@abichat

abichat commented Jul 15, 2024

Just to give another example with step_holiday(), whose holidays argument is not tunable.

library(tidyverse)
library(tidymodels)

examples <- data.frame(someday = ymd("2000-12-20") + days(0:40))

holiday_rec <- 
  recipe(~someday, examples) %>%
  step_holiday(all_predictors(), holidays = c("Easter", "ChristmasDay"))

There are 2^119 combinations of holidays (so making it tunable could be dangerous), but it would be nice if we could tune it over a set of predefined values, like this (list-columns in the grid are supported now, see #633):

tibble(holidays = list(c("LaborDay", "NewYearsDay", "ChristmasDay"),
                       c("LaborDay", "NewYearsDay", "ChristmasDay", "Easter", "Annunciation"),
                       c("FRAllSaints", "FRBastilleDay", "FRAscension")))
# # A tibble: 3 × 1
#   holidays 
#   <list>   
# 1 <chr [3]>
# 2 <chr [5]>
# 3 <chr [3]>

Thanks!
