Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invalid model parameters should either error or keep user arguments #162

Open
ExpandingMan opened this issue Sep 14, 2022 · 4 comments
Open

Comments

@ExpandingMan
Copy link

I noticed today that if a model constructor fails parameter validation it reverts to the default with a warning.

I find this behavior confusing. Normally invalid constructors lead to an error. Alternatively, I can imagine getting frustrated if an overzealous maintainer improperly restricted a parameter range, in which case the thing to do would be to allow the user-provided argument with an error. Either way, reverting to the default seems like a very strange behavior, thoughts?

@ablaom
Copy link
Member

ablaom commented Sep 16, 2022

This makes sense. As far as changing the behaviour of the @mlj_model macro, that will have to wait for the next breaking release, which won't be for a while.

@ExpandingMan
Copy link
Author

We hit another issue today which is related to this but not the same, see here and here.

I propose that always storing all parameters, while perhaps having noble intention, is likely to eternally be fraught with issues in practice. It just requires too much maintenance. Ideally, every model would ultimately provide access to its parameters, but sadly, this is not reality.

Note that, in addition to whatever issues that might have come up, there is probably a lot of extra effort spent in all the interfaces trying to keep things consistent that hasn't necessarily resulted in any issues being opened. Certainly I felt there were a lot of parameter acrobatics involved in MLJXGBoostInterface.jl as is evidenced in its current master.

It's not clear to me whether this issue has caused frustration for many others, but it's something I've been burned by a number of times as evidenced by this thread. I'm also not sure whether any decision should be made here without serious breakage, but it's probably worth thinking about.

@ablaom
Copy link
Member

ablaom commented Jan 15, 2023

Thanks @ExpandingMan for these further thoughts. Sorry to hear about the parameter woes. Gradient boosters have a lot of them, that's for sure.

In my experience maintaining the MLJ ecosystem, this maintenance concern is particular to XGBoost, because it has so many parameters and because it wraps an algorithm in another language. In EvoTrees.jl the core algorithm struct, and the MLJ struct are the same, so there's no problem there. Most non booster models rarely have more than half a dozen parameters - maybe 10 or 11 at tops. Generally, when we've had to change the parameters or documentation it's been because of a breaking change to the core package that we had to deal with anyway.

On balance, I think the advantages of storing all the parameters in the stuct outweigh the maintenance concerns, painful as they are in you own case.

@ExpandingMan
Copy link
Author

Fair enough. Seems like one of those things for which there is no good option, other than maybe raising awareness in packages that implement models to please be more transparent with model parameters as other interfaces probably want to know about them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: priority low / straightforward
Development

No branches or pull requests

2 participants