Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] Fix xgb.cv() for AFT models #9863

Closed
wants to merge 2 commits into from
Closed

[R] Fix xgb.cv() for AFT models #9863

wants to merge 2 commits into from

Conversation

mayer79
Copy link
Contributor

@mayer79 mayer79 commented Dec 8, 2023

This PR fixes #7118

In order to keep the xgb.cv() API as it is:

  • data must be an xgb.DMatrix object
  • which contains the two infos 'label_lower_bound' and 'label_upper_bound'.

Automatic stratified splitting is deactivated with a warning. @david-cortes this is something we need to keep in mind for multioutput regressions as well.

@mayer79 mayer79 changed the title Fix xgb.cv() for AFT models [R] Fix xgb.cv() for AFT models Dec 8, 2023
if (!inherits(data, 'xgb.DMatrix')) {
stop("Objective 'survival:aft' requires the data to be an 'xgb.DMatrix'.")
}
if (is.null(getinfo(data, name = 'label_lower_bound')) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I am wondering: most of these getinfo used throughout the function are only checking whether the DMatrix contains that field or not, but each call involves creating a full data copy just to check that it exists. @trivialfis Is there perhaps some function that could be used to just check if the DMatrix has the field or not? Do functions like XGDMatrixGetFloatInfo actually create a data copy when calling them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it creates a data copy.

Hmm.. my personal preference is still to try not to use the DMatrix as a data manipulation class. Existing libraries and R builtins can do a much better job than we can ever try.

@david-cortes
Copy link
Contributor

Thanks for looking into this. Left a small comment.

Although I am also thinking: if it constructs a DMatrix directly from data and labels - wouldn't it also require just a few lines to allow such a transformation with bounds as function arguments?

@mayer79
Copy link
Contributor Author

mayer79 commented Dec 8, 2023

We could add more arguments to xgb.cv(). I was actually thinking in the other direction: let the function work only with xgb.DMatrix input and remove the "label" argument. Why? The current signature of xgb.cv() is very incomplete. For instance there is no "weight" argument.

@david-cortes
Copy link
Contributor

We could add more arguments to xgb.cv(). I was actually thinking in the other direction: let the function work only with xgb.DMatrix input and remove the "label" argument. Why? The current signature of xgb.cv() is very incomplete. For instance there is no "weight" argument.

Yes, that makes sense too - then we wouldn't need to update things in two places if new parameters come out.

@trivialfis
Copy link
Member

cc @hcho3 for aft.

@david-cortes
Copy link
Contributor

There's now a function xgb.DMatrix.hasinfo which could be used here to check whether the DMatrix has a given field without involving data copies:

xgb.DMatrix.hasinfo <- function(object, info) {

@mayer79
Copy link
Contributor Author

mayer79 commented Mar 31, 2024

Closed in favour of #10031

@mayer79 mayer79 closed this Mar 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Survival AFT analysis using xgb.cv
3 participants