Optimal intercept initialization for simple objectives #10298
Conversation
Thank you for working on this! I will look into the changes.
Force-pushed from eaa8453 to 70ac561.
Hi @david-cortes, I have added sample mean and weighted sample mean functions. However, I removed the new inv link zero function: since we are moving away from the binary model and will be dropping the capability of saving it in this release, I don't think it's necessary to add new behavior to the codebase to work around it.
We will work on this in a different PR. At the moment, the custom objective is quite primitive and there is a lot of work we need to do to make it closer to the built-in objectives in terms of functionality.
It's gradient boosting; we can always stack more models on top instead of making one model perfect.
Please let me know if you want me to take over from here.
@trivialfis Thanks for looking into this. Since the binary format will be dropped, it would be better to do this more correctly then, by making the changes deeper:
So yes, please take over. By the way, while you're at it, I understand that the removal of the old binary format should also pave the way for vector-valued intercepts - in that case, once those get added, it would also be nice to arrange the intercepts for the multinomial logistic objective in such a way that they sum to zero, like GLMNET does.
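For reference, a sketch of the sum-to-zero arrangement being suggested (my own notation, not code from this PR): softmax is invariant to adding the same constant to every class score, so per-class intercepts can be centred without changing the predicted probabilities:

$$
b_k \;=\; \log \bar{p}_k \;-\; \frac{1}{K}\sum_{j=1}^{K} \log \bar{p}_j,
\qquad
\sum_{k=1}^{K} b_k = 0,
$$

where $\bar{p}_k$ is the observed proportion of class $k$. The centring shift is exactly the kind of sum-to-zero convention mentioned above.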
These are excellent suggestions.
We will remove the capability of saving it in this release; the next release will remove the loading part. This can take some time.
To implement this, we will have to add a new parameter, probably called
This one is more difficult: currently, all parameters are passed through strings. Using JSON might help, but floating-point serialization requires matching the encoder and decoder, otherwise there will be precision loss. I will take over the PR; please let me know if there are other things I can help with in the R package. Hoping to include it in the next release (2.2). We still have plenty of time.
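To illustrate the encoder/decoder matching mentioned above (a generic C++ sketch, not XGBoost's actual parameter-handling code): writing a float with max_digits10 significant digits is enough for an exact round trip through a string.

```cpp
#include <cassert>
#include <cstdio>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

int main() {
  float original = 0.1f;  // not exactly representable in binary floating point
  // Encode with max_digits10 (9 for float) so the decimal string uniquely
  // identifies the underlying bit pattern.
  std::ostringstream os;
  os << std::setprecision(std::numeric_limits<float>::max_digits10) << original;
  std::string encoded = os.str();
  // Decode and check that the round trip is lossless.
  float decoded = std::stof(encoded);
  assert(decoded == original);
  std::printf("encoded as \"%s\", round trip exact\n", encoded.c_str());
  return 0;
}
```

With fewer digits (e.g. the default 6), the decoded value can differ in the last bits, which is the precision loss referred to above.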
Hmm, I ran into an issue with the weighted sample mean. If the weights do not strictly form a probability simplex summing to 1, we can generate an invalid mean value, like a mean greater than 1 for logistic regression.
I'm not sure what kind of impact it'd have, but there's always the option of doing a more precise sequential calculation, or of using compensated sums. Although, from a look at the code, could it perhaps be that some matrix is in the wrong memory layout? I see it uses a matrix view, but it doesn't validate that the matrix is column-major.
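A minimal sketch of the kind of sequential calculation meant above (illustrative only; not the snippet originally referenced, and the function name is mine):

```cpp
#include <cstddef>
#include <vector>

// Incremental weighted mean: m_k = m_{k-1} + (w_k / W_k) * (y_k - m_{k-1}),
// where W_k is the running sum of weights. Every intermediate value is a
// convex combination of the inputs seen so far, so it stays within
// [min(y), max(y)] regardless of how the weights are scaled.
double SequentialWeightedMean(std::vector<double> const &y, std::vector<double> const &w) {
  double mean = 0.0;
  double weight_sum = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    if (w[i] <= 0.0) continue;  // skip zero (or invalid negative) weights
    weight_sum += w[i];
    mean += (w[i] / weight_sum) * (y[i] - mean);
  }
  return mean;
}
```

A compensated (Kahan-style) accumulation of the numerator and denominator would have a similar effect while keeping the computation a plain reduction.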
It violates the range of logistic regression.
I will add a restriction to it once the weight issue is resolved. The matrix view supports different memory layout types; we can dispatch on them.
But that is not an issue with the computation: if the data contains only examples of one class, then the optimal solution for a regularized method like XGBoost's is to make the intercept infinite and not use any feature for generating scores. If a user wishes to use XGBoost as a one-class classifier, the correct behavior would be to manually set the intercept to zero. I guess the most logical course of action in such cases would be to throw an error explaining that the response is constant.
If the weights are non-negative and the floating-point computations were done with infinite precision, then it shouldn't be possible to arrive at a value greater than the max or smaller than the min of the inputs that go into that mean. But I can see it going wrong with fp32 when the number of rows is large, whether the weights sum to 1 or not. Some other ideas:
Are you able to provide an example where the result is outside the range?
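In symbols, the range claim above (a standard fact, stated here only for reference): with non-negative weights the weighted mean is a convex combination of the inputs, so in exact arithmetic

$$
\min_i y_i \;\le\; \frac{\sum_i w_i y_i}{\sum_i w_i} \;\le\; \max_i y_i,
\qquad w_i \ge 0,\ \sum_i w_i > 0 .
$$

Any result outside $[0, 1]$ for a 0/1 response therefore has to come from finite-precision accumulation rather than from the weights not summing to 1.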
Makes sense.
Thank you for sharing; I will look into them. I spent a bit of time today proving that the sample mean is the minimizer. Will continue to work on the PR next week.
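A sketch of that argument for the logistic case (my own derivation, not the notes added in the PR): with a constant raw score $b$ and labels $y_i \in \{0, 1\}$, the weighted objective and its derivative are

$$
L(b) = \sum_i w_i \bigl[\log\bigl(1 + e^{b}\bigr) - y_i b\bigr],
\qquad
L'(b) = \sum_i w_i \bigl[\sigma(b) - y_i\bigr],
$$

so $L'(b^{*}) = 0$ gives $\sigma(b^{*}) = \sum_i w_i y_i / \sum_i w_i$: the inverse link of the optimal intercept is the weighted sample mean of the response, and convexity of $L$ makes this the global minimizer. The same first-order argument goes through for the Poisson, gamma and Tweedie deviances with the log link.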
Added notes about why the reduction is written this way. TODO for myself:
Force-pushed from d03db14 to 7b03f3b.
@david-cortes @hcho3 The PR is completed now, please help review when you have a spare moment.
LGTM 👍
ref #9899
This PR modifies the intercept initialization for simple objectives (logistic, poisson, gamma, tweedie) to use their closed-form optimal solutions (as in: the number that minimizes the objective function) instead of a non-optimal one-step Newton estimate.
For these objectives, the optimal intercept corresponds simply to the link function applied to the mean of the response variable. Since base_score already undergoes this transformation, the PR here just changes the calculation to the mean of the response variable in those cases. For multi-target versions of these objectives, it sets the intercepts to zero instead, as otherwise applying a common intercept might not make much sense for the given problem.
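As a concrete illustration with made-up numbers (the objective names are the built-in ones, the figures are hypothetical): the stored base_score is the raw response mean, and the implied raw-score intercept is the link function applied to it.

$$
\texttt{binary:logistic:}\quad \bar{y} = 0.3 \;\Rightarrow\; \texttt{base\_score} = 0.3,\quad b^{*} = \operatorname{logit}(0.3) \approx -0.847;
$$

$$
\texttt{count:poisson:}\quad \bar{y} = 4.2 \;\Rightarrow\; \texttt{base\_score} = 4.2,\quad b^{*} = \log(4.2) \approx 1.435 .
$$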
Note that there's still room for improvement:
Note 1: I wasn't sure how to calculate a weighted sample mean here (not familiar with GPU computing and the 'devices' logic). It would be helpful to have a WeightedMean function under stats, if possible, to use when there are sample weights.
Note 2: The compiler checks here don't like turning a linalg::Tensor<T, 2> into a linalg::Tensor<T, 1> by reinterpret_cast. I'm also not sure what the right way to do it without a data copy would be.
Note 3: I wasn't sure where to add tests for the changes here. For example, it would be ideal to test that binary:logistic and binary:logitraw produce the same raw scores, but I'm not sure where the right place for such a test is.