Tuning a graph learner with missing values in target variable #566

Closed
py9mrg opened this issue Feb 19, 2021 · 1 comment

py9mrg commented Feb 19, 2021

Hello,

Not sure if this is best asked here or in mlr3tuning?

So, what I want to do is tune a graph learner where my dataset contains missing values in both the predictor and target variables. To do this I start by imputing the dataset and then piping it to a learner. My problem comes when there are missing values in the target variable. The imputer does not impute these (which is good), but tuning then throws an error when the imputed task is piped to the learner, even when the learner should cope with NAs.

I can get around this by first removing all samples where the target variable is missing, but I want to avoid that because I get much better results if I impute with these samples included and only drop them after imputation. I can do this manually (see the sketch after the reprex) but can't seem to get it to work within a graph learner. Is there a pipe operator for dropping NAs in the target variable that I could put between the imputation and the learner? I can't seem to find one.

Here's a reprex to highlight the point:

library(mlr3verse)

# data with no missing values
data <- tibble::tibble(variable1 = 1:100, variable2 = 1:100, target = variable1^2 + variable2^2)

# data with missing values
# the first two samples are missing predictor values, the 3rd sample is missing the target
# if I exclude the 3rd sample everything is fine, but I want to include it in the
# imputation stage because the information it provides leads to better imputation and
# hence better overall results, so I only want to drop it after imputation
data_w_missing <- data
data_w_missing[1, 1] <- NA_integer_
data_w_missing[2, 2] <- NA_integer_
data_w_missing[3, 3] <- NA_integer_

task <- TaskRegr$new(id = "test1", backend = data, target = "target")
task_w_missing <- TaskRegr$new(id = "test2", backend = data_w_missing, target = "target")

# even if I explicitly set the svm argument na.action = "na.omit",
# I still get an error during tuning
# it's the default option anyway, as is the type, but type needs to be
# set explicitly because it's the parent of the cost parameter being tuned
graph <- po("imputehist") %>>%
  po(lrn("regr.svm", type = "eps-regression")) 
graph$plot()

graph_learner <- GraphLearner$new(graph)

search_space = ps(
  regr.svm.cost = p_dbl(lower = 0.1, upper = 1)
)

tuner <- tnr("grid_search", resolution = 3)

at = AutoTuner$new(
  learner = graph_learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("regr.rmse"),
  search_space = search_space,
  terminator = trm("none"),
  tuner = tuner
)

# tuning on the complete set works fine 
at$train(task)

# tuning on the data with the missing target throws an error
# note: if the NAs are only in the predictors there is no error;
# the error only appears when the target contains missing values
at$train(task_w_missing)

# error message:

# Error in assert_regr(truth, response = response) : 
#   Assertion on 'truth' failed: Contains missing values (element 3).
# In addition: Warning messages:
# 1: In yorig - ret$fitted :
#   longer object length is not a multiple of shorter object length
# 2: In yorig - ret$fitted :
#   longer object length is not a multiple of shorter object length

# EDIT: just realised the issue is caused by svm returning fewer predictions than there are rows in the data
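
A quick way to see what the EDIT is describing is to call the underlying e1071::svm() directly (regr.svm in mlr3learners wraps it). This check is just an illustration and assumes the data_w_missing object from the reprex above; with the default na.action = na.omit, every incomplete row is silently dropped, so the fitted vector is shorter than the data:

library(e1071)

# na.action defaults to na.omit, so the 3 rows containing an NA are dropped
fit <- svm(target ~ ., data = as.data.frame(data_w_missing))
length(fitted(fit))   # 97 fitted values
nrow(data_w_missing)  # 100 rows

And, for completeness, a minimal sketch of the manual route mentioned above (impute first, then drop the rows with a missing target, then fit), reusing the task_w_missing object from the reprex. It works, but because it lives outside the GraphLearner it cannot be wrapped in the AutoTuner, which is the part I am stuck on:

# run the imputer on its own, outside the graph
po_imp <- po("imputehist")
task_imp <- po_imp$train(list(task_w_missing))[[1]]

# drop the rows whose target is still missing, using the Task API
target_col <- task_imp$data(cols = task_imp$target_names)[[1]]
task_imp$filter(task_imp$row_ids[!is.na(target_col)])

# the learner now trains fine on the filtered task
lrn("regr.svm", type = "eps-regression")$train(task_imp)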
py9mrg commented Feb 19, 2021

Just realised this issue is already covered by #410, I think, so closing here.
