
allow user to modify model prediction results #2

Merged (3 commits) Dec 8, 2022

Conversation

mikoontz (Owner) commented Dec 8, 2022

This PR addresses some of the need that arose in bips-hb#10. In some use cases (not sure how common!), a user may need to modify the truth, response, or prob output generated by the predict_learner() function. This patch introduces two new arguments to the cpi() function (both passed along to the compute_loss() function): modify_trp and ....

modify_trp ("trp" stands for truth/response/prob) is by default FALSE (leave the truth/response/prob output alone) but can also take a user-defined function that accepts as arguments the original truth, response, and prob values (after the modification that takes place in the beginning of the compute_loss() function) as well as the ... argument to allow for some flexibility. It must return a named list (names must be truth, response, and prob) with the updated values. If values don't need updating, a user can simply pass the original value to the corresponding list item in the returned object.

Some potential use cases (which I've added examples for in the docs):

  1. A user wants to use a classification threshold other than 0.5 when choosing which level to assign to a particular observation (see the sketch after this list)
  2. A user is fitting a random forest regression to a 0/1 binary outcome in order to use an unbiased split selection criterion (e.g., the maxstat rule, which requires a regression approach rather than classification; further discussion here). That user still wants to use classification-style loss measures and therefore needs to create a prob object from the response object, and to update the response object to represent the predicted class.
  3. A user wants to rescale the probability output (as in "consequences of logloss as loss measure for binary classification when optimal classification threshold != 0.5", bips-hb/cpi#10)
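For example, use case 1 might look like the sketch below. This is hypothetical: it assumes a binary task whose positive class is labeled "yes" and whose prob matrix has a "yes" column, and it supplies the threshold through the ... passthrough described above.

# Hypothetical modify_trp for use case 1: reclassify observations using a
# custom probability threshold instead of the default 0.5.
threshold_trp <- function(truth, response, prob, threshold = 0.7, ...) {
  response <- factor(ifelse(prob[, "yes"] >= threshold, "yes", "no"),
                     levels = levels(truth))
  list(truth = truth, response = response, prob = prob)
}

# If cpi()'s ... is forwarded to modify_trp as described, the threshold
# could then be supplied in the cpi() call, e.g.:
# cpi(task, learner, modify_trp = threshold_trp, threshold = 0.7)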

I can imagine the modify_trp and ... arguments being passed to the predict_learner() function instead of compute_loss(), but the machinery at the start of compute_loss() makes the truth, response, and prob objects a little easier to work with (they're made consistent regardless of whether inherits(pred, "Prediction") is TRUE or FALSE). That block could also move to the end of predict_learner() to make the pred output consistent at that stage:

# Harmonize the prediction output: a single Prediction object exposes
# truth, response, and prob directly, while a list of per-resample
# predictions needs to be concatenated first.
if (inherits(pred, "Prediction")) {
  truth <- pred$truth
  response <- pred$response
  prob <- pred$prob
} else {
  truth <- do.call(c, lapply(pred, function(x) x$truth))
  response <- do.call(c, lapply(pred, function(x) x$response))
  prob <- do.call(rbind, lapply(pred, function(x) x$prob))
}

Just a thought on organization (keeping all the prediction modification within predict_learner() rather than spreading it across predict_learner() and compute_loss()); discard if not useful!

I think this approach gives a user maximum flexibility, but I'm also happy to contribute to further discussion on other ways to implement something like this (or on whether it should be implemented at all).

All checks pass and I added some worked examples to the documentation.

Thanks again for the great package!

…utputs from predict_learner() output. This gives the user more flexibility over the information being used to ultimately derive CPI.
mikoontz merged commit 4a10122 into dev on Dec 8, 2022