-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Botorch with cardinality constraint via sampling #301
base: main
Are you sure you want to change the base?
Botorch with cardinality constraint via sampling #301
Conversation
0be3285
to
1f7783d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Waschenbacher, thanks for the work. This is not yet a review but only a first batch of very high-level comments that I would like you to address before we can go into the actual review process. The reason being that the main functionality brought by this PR is currently rather convoluted and hard to parse, so I'd prefer to to work with a more readable version, tbh
1f7783d
to
9d28b49
Compare
f68ce4f
to
fc52875
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Waschenbacher, I've finally managed to spend some time on this important PR, thanks again for your preparation work. I've already refactored some parts that were clear to me. However, I'm not yet certain about the design at the remaining places. I have the feeling that it can potentially be simplified a lot, depending on whether it is possible to reuse the idea of reduced subspaces. Have marked the corresponding places with comments. I think we need to discuss this part first before we can continue.
8a7ef15
to
046a8e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @AVHopp @Scienfitz. I spent some time an refactored @Waschenbacher's proposal quite a bit, and think that the logic shipped by this PR is now very clear. So from my end, everything seems in good shape and we are ready for a fresh round of review. But I feel we are close to getting it merged. There a few open things to be discussed, I've marked the corresponding parts and also updated the PR description. Let me know what you think
@@ -150,5 +150,38 @@ def summary(self) -> dict: | |||
return param_dict | |||
|
|||
|
|||
@define(frozen=True, slots=False) | |||
class _FixedNumericalContinuousParameter(ContinuousParameter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AVHopp @Scienfitz: The class proved to be useful for this PR, even though we could have implemented the mechanism in a different way where instead of fixing parameter values we could have created reduced spaces where the parameter values are dropped. However, since botorch's optimizers accept a list of fixed values, this was the simpler implementation.
I've kept it private for now because I first want to see if/how things might need to be adjusted. For example, just yesterday, I had a conversation with @julianStreibel about another feature that would be very useful in the future, namely conditioning recommendations onto certain subspaces without having to redefine the searchspace object. I guess we'll use it for that as well, but until then I'll keep it hidden.
Any comments, thoughts? If not, just give thumbs up and I will resolve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iirc, we agreed that we can keep this class in the BaybAThon, right? So can this comment be resolved?
@@ -87,3 +89,51 @@ def get_parameters_from_dataframe( | |||
def sort_parameters(parameters: Collection[Parameter]) -> tuple[Parameter, ...]: | |||
"""Sort parameters alphabetically by their names.""" | |||
return tuple(sorted(parameters, key=lambda p: p.name)) | |||
|
|||
|
|||
def activate_parameter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Scienfitz @AVHopp: This is one of the important open points to discuss. As you probably know by now, the mechanism of this PR is to divide parameters in to active and inactive sets. However, this opens a general question, not only related to this PR:
- What do we even mean with an "active continuous parameter", i.e. what is the value range where such a parameter is considered "active"?
The current logic is to assume some small threshold around 0 to discriminate active/inactive ranges. And this function then takes care of "activating" a parameter according to this definition.
Problem to be discussed:
For parameters with bound falling into the threshold range, everything is simple. But how can we guarantee that a parameter with bounds (-a, b)
with a,b >> threshold
is active? We cannot simply "cut out" the threshold interval since we then end up with a non-continuous range (see comment in docstring). This causes downstream problems in the sense that we cannot strictly guarantee that the "min_cardinality" part of a cardinality constrained will be fulfilled, since any optimizer might decide that – even if we assign the parameter to the "active set" – its value should be driven to zero. And then the cardinality will drop by one. In theory, one could solve this by branching into the two separate subspaces of the non-continuous range of the parameter, but that will end up with yet another exponential blow-up.
TL;DR: I think min-cardinality constraints are really hard to strictly fulfill. On the other hand, I cannot think of any use case where people actually need a min-constraint. So I see two options:
- (not preferred) dropping the option of min values in cardinality constraints.
- (my suggestion) keeping everything as is and being aware (plus potentially mention it in the docstring) that the min-part of the constraint can be overridden by the optimizer. The min part still reduces the overall search burden in that it rules out certain subspaces to start with. But for the ones left, the min-part could be violated.
Input is appreciated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick follow-up: whatever we decide here, the corresponding logic of the random recommender should also be updated, which does not yet use any concept of activated parameters. It's not as important for the random recommender as it is for optimization based ones, but still ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First question: Wasn't this sort of a problem one of the main arguments against using Narrow Gaussian Processes? By my understanding, the problem that we are getting here is now basically identical, could you please give a quick heads-up why Narrow Gaussians are not an option/alternative? Don't need an extensive discussion, but rather a "Hey Alex, we already talked about that, let me quickly refresh your memory".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now to your actual points.
- First problem that I see with allowing
min_cardinality
and just mentioning in the docstring that this might be overwritten by the optimizer seems VERY dangerous to me. As we know by now, people tend to not read documentation at all. So what I see is people thinking "Oh, great, I can use amin_cardinality
", stop to read and think at exactly that point, assume that it works as intended and then be angry if it does not work. - This is a potential crucial silent error that people might have at very inconvenient times: Imagine that you really require this lower cardinality, and that it works the first X runs of your optimization, just to fail in run X+1, invalidating potentially your whole set up.
- As you say yourself: "I cannot think of any use case where people actually need a min-constraint". We said that things like this should be a strong indicator to not implement stuff.
So I'd propose to drop the support for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To first question: let's tackle this topic in a dedicated baybathon
Second: today in dev meeting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As agreed in the baybathon, there are two options how to support the cardinality constraint (especially the minimum cardinality) in Botorch recommender:
- Option I: this PR with hard threshold. If possible, the bounds of an active parameter gets updated by excluding the inactive range (-threshold, threshold). But if impossible, e.g. a parameter has bounds (-a, b) with a,b >> threshold (see discussion above), its value may be driven to zero by the optimizer and the
min_cardinality
might get violated. A warningMinimumCardinalityViolatedWarning
will be raised if the minimum cardinality is violated. - Option II: a future PR based on narrow Gaussian kernel = soft-constraint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we agreed on option I, can we resolve this comment here? Even if there are open points to discuss, I personally would prefer to just open a new thread in that case as the comments start to be a bit hard to navigate to be honest.
@@ -76,6 +85,13 @@ class BotorchRecommender(BayesianRecommender): | |||
optimization. **Does not affect purely discrete optimization**. | |||
""" | |||
|
|||
max_n_subspaces: int = field(default=10, validator=[instance_of(int), ge(1)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AVHopp @Scienfitz: To you think this is an appropriate attribute name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it should be linked to cardinality
somehow: For people that are not interested in cardinality constraints, it is only clear after reading the docstring that this is not interesting for them. This would however make the name quite long :/ So although I am not perfectly happy with the name, we could keep it as I do not see a better alternative while keeping it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get your point but I see this emerging as part of a bigger problem, actually, not limited to this particular attribute. Namely that all attributes of the class are linked to particular settings only:
sequential_continuous
is relevant for continuous spaces onlyhybrid_samples
is relevant for hybrid spaces onlysampling_percentage
is relevant for hybrid spaces and specific sampling strategies only
In that sense, max_n_subspaces
is even a little less problematic since it is strictly speaking not only relevant for cardinality constraints but could also be used for hybrid spaces. And now I also see another clash, since it's functionality basically overlaps with sampling_percentage
🤔
So two TODOs from I see:
- Cleverly resolve the duplication somehow
- Potentially, in a next PR: find a better structure. We could consider, for example, to capture all the settings in a separate object like a TypedDict or so, that then uses its own validation path
@Scienfitz: Thoughts from your end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid the duplicates, we could rename sampling_percentage
to hybrid_sampling_percentage
or something similar, that might already be sufficient. Agree that it would be beneficial to somehow "merge" these two attributes soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then I think for now we should keep the duplication to not blow up this PR and then replace sampling_percentage
completely and just use this attribute instead.
), | ||
] | ||
return to_string(self.__class__.__name__, *fields) | ||
def _optimize_continuous_subspaces( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AVHopp @Scienfitz: Just wanted to raise awareness. I've refactored @Waschenbacher's original version quite significantly, but the nice outcome of this is that we now have this useful and general helper that can be used to optimize over arbitrary non-connected subspaces. Thumbs up once you saw it and I'll resolve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we potentially leverage this for Discrete Searchspaces or some sort of hybrid recommendation as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the use is by no means limited to cardinality constraints. It's basically the baybe equivalent to botorchs optimize_acqf_mixed
, if you will.
@@ -301,6 +334,45 @@ def _drop_parameters(self, parameter_names: Collection[str]) -> SubspaceContinuo | |||
], | |||
) | |||
|
|||
def _enforce_cardinality_constraints_via_assignment( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not yet perfectly happy with the name here. Any better suggestions? One alternative I could imagine would be _enforce_cardinality_constraints_via_bounds
. But input is welcome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is only one way of enforcing cardinality consraints, why not simply enforce_cardinality_constraints
? If people are interested in the details of the how, they can read the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do, but would keep it private 👍🏼. @Scienfitz: thumbs up is sufficient for approval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First high-level review regarding points that we should discuss. Not a full review yet as I think that the code might change depending on what is decided regarding the min_cardinality
.
@@ -87,3 +89,51 @@ def get_parameters_from_dataframe( | |||
def sort_parameters(parameters: Collection[Parameter]) -> tuple[Parameter, ...]: | |||
"""Sort parameters alphabetically by their names.""" | |||
return tuple(sorted(parameters, key=lambda p: p.name)) | |||
|
|||
|
|||
def activate_parameter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First question: Wasn't this sort of a problem one of the main arguments against using Narrow Gaussian Processes? By my understanding, the problem that we are getting here is now basically identical, could you please give a quick heads-up why Narrow Gaussians are not an option/alternative? Don't need an extensive discussion, but rather a "Hey Alex, we already talked about that, let me quickly refresh your memory".
@@ -87,3 +89,51 @@ def get_parameters_from_dataframe( | |||
def sort_parameters(parameters: Collection[Parameter]) -> tuple[Parameter, ...]: | |||
"""Sort parameters alphabetically by their names.""" | |||
return tuple(sorted(parameters, key=lambda p: p.name)) | |||
|
|||
|
|||
def activate_parameter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now to your actual points.
- First problem that I see with allowing
min_cardinality
and just mentioning in the docstring that this might be overwritten by the optimizer seems VERY dangerous to me. As we know by now, people tend to not read documentation at all. So what I see is people thinking "Oh, great, I can use amin_cardinality
", stop to read and think at exactly that point, assume that it works as intended and then be angry if it does not work. - This is a potential crucial silent error that people might have at very inconvenient times: Imagine that you really require this lower cardinality, and that it works the first X runs of your optimization, just to fail in run X+1, invalidating potentially your whole set up.
- As you say yourself: "I cannot think of any use case where people actually need a min-constraint". We said that things like this should be a strong indicator to not implement stuff.
So I'd propose to drop the support for this.
@@ -76,6 +85,13 @@ class BotorchRecommender(BayesianRecommender): | |||
optimization. **Does not affect purely discrete optimization**. | |||
""" | |||
|
|||
max_n_subspaces: int = field(default=10, validator=[instance_of(int), ge(1)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it should be linked to cardinality
somehow: For people that are not interested in cardinality constraints, it is only clear after reading the docstring that this is not interesting for them. This would however make the name quite long :/ So although I am not perfectly happy with the name, we could keep it as I do not see a better alternative while keeping it here.
subspace_continuous: SubspaceContinuous, | ||
batch_size: int, | ||
) -> tuple[Tensor, Tensor]: | ||
"""Recommend from a continuous search space with cardinality constraints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, only some information about how max_n_subspaces
comes into play here is still missing for me :)
), | ||
] | ||
return to_string(self.__class__.__name__, *fields) | ||
def _optimize_continuous_subspaces( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we potentially leverage this for Discrete Searchspaces or some sort of hybrid recommendation as well?
@@ -301,6 +334,45 @@ def _drop_parameters(self, parameter_names: Collection[str]) -> SubspaceContinuo | |||
], | |||
) | |||
|
|||
def _enforce_cardinality_constraints_via_assignment( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is only one way of enforcing cardinality consraints, why not simply enforce_cardinality_constraints
? If people are interested in the details of the how, they can read the docstring.
- Replace redandunt function - Ensure near-zero range being an open interval
- Add to-do related to customized error in botorch - Add to-do related to active parameters guarantee in random sampler
@AdrianSosic This is the updated PR according to our discussion in the baybathon. The main changes are below:
|
@@ -87,3 +89,51 @@ def get_parameters_from_dataframe( | |||
def sort_parameters(parameters: Collection[Parameter]) -> tuple[Parameter, ...]: | |||
"""Sort parameters alphabetically by their names.""" | |||
return tuple(sorted(parameters, key=lambda p: p.name)) | |||
|
|||
|
|||
def activate_parameter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldnt htis be removed given the meeting outcome that these cases of non-guaranteed min_cardinality are acceptable with a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we still need activate_parameters
.
- When the original bounds of an active parameter is
[0, a]
or[-a, 0]
. witha>threshold
, we still need this method to shift the bounds so that it is guaranteed active. I would say especially in this case, the acquisition function tends to have the optimum on the boundary, which is "0". Therefore,activate_parameter
helps at least for this case. - When the original bounds of an active parameter is
[a, b]
with valuea<0<b
and the range is larger than the threshold, we cannot really activate the parameter. Only in this case, it may occur that the optimizer drives this parameter to0
. If it occurs, the minimum cardinality may not hold. This is the reason why we have the check on minimum cardinality and raise a warning if it gets violated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second that
@@ -41,6 +49,9 @@ def validate_constraints( # noqa: DOC101, DOC103 | |||
param_names_discrete = [p.name for p in parameters if p.is_discrete] | |||
param_names_continuous = [p.name for p in parameters if p.is_continuous] | |||
param_names_non_numerical = [p.name for p in parameters if not p.is_numerical] | |||
params_continuous: list[NumericalContinuousParameter] = [ | |||
p for p in parameters if isinstance(p, NumericalContinuousParameter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sry for my confusion, the method exists in this PR https://github.com/emdgroup/baybe/pull/291/files#diff-9b02c8d8e9e86b086ea306806ebc5e47435ac5045557f8eb138a7be22c3cb0e8R461 which is however on hold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an incomplete review as I was not aware that there is still stuff being worked on. Feel free to either already include my comments or just resolve them.
@@ -149,6 +152,54 @@ def summary(self) -> dict: | |||
) | |||
return param_dict | |||
|
|||
def is_near_zero(self, item: float) -> bool: | |||
"""Return whether an item is near-zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would add an "according to the threshold set" or similar here, just to make sure that there is no confusion.
|
||
Important: | ||
Value in the open interval (-near_zero_threshold, near_zero_threshold) | ||
will be treated as near_zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Insonsistencies regarding the use of near_zero
and near-zero
in this docstring.
@@ -150,5 +150,38 @@ def summary(self) -> dict: | |||
return param_dict | |||
|
|||
|
|||
@define(frozen=True, slots=False) | |||
class _FixedNumericalContinuousParameter(ContinuousParameter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iirc, we agreed that we can keep this class in the BaybAThon, right? So can this comment be resolved?
@@ -87,3 +89,51 @@ def get_parameters_from_dataframe( | |||
def sort_parameters(parameters: Collection[Parameter]) -> tuple[Parameter, ...]: | |||
"""Sort parameters alphabetically by their names.""" | |||
return tuple(sorted(parameters, key=lambda p: p.name)) | |||
|
|||
|
|||
def activate_parameter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we agreed on option I, can we resolve this comment here? Even if there are open points to discuss, I personally would prefer to just open a new thread in that case as the comments start to be a bit hard to navigate to be honest.
) -> NumericalContinuousParameter: | ||
"""Activates a given parameter by moving its bounds away from zero. | ||
|
||
Important: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DId you have a look whether these Important:
blocks are properly rendered in the documentation? Just want to avoid that sphinx
thinks that this is something special and tries to interpret this as some sort of keyword.
cardinality is violated in `BotorchRecommender` | ||
- Attribute `max_n_subspaces` to `BotorchRecommender`, allowing to control | ||
optimization behavior in the presence of multiple subspaces | ||
- Utilities `inactive_parameter_combinations` and`n_inactive_parameter_combinations` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these as well as the utilities noted below are not user-facing, are they? In that case, I do not think that it is necessary to include them in the CHANGELOG
|
||
return pd.DataFrame(points, columns=subspace_continuous.parameter_names) | ||
|
||
def _recommend_continuous_torch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need an additional function for that? In my opinion, this check could be done in the original function, and this function just adds another layer of complexity. Also, I think the name is weird, why the explicit mention of torch
?
subspace_continuous: SubspaceContinuous, | ||
batch_size: int, | ||
) -> tuple[Tensor, Tensor]: | ||
"""Recommend from a continuous search space with cardinality constraints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still the case.
subspace_continuous, | ||
pd.DataFrame(points, columns=subspace_continuous.parameter_names), | ||
): | ||
warnings.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to give more information here? E.g. for how many points the constraints are violated or maybe even the exact points for which it is violated?
acq_function=self._botorch_acqf, | ||
bounds=torch.from_numpy(subspace_continuous.comp_rep_bounds.values), | ||
q=batch_size, | ||
num_restarts=self.n_restarts, | ||
raw_samples=self.n_raw_samples, | ||
# TODO: https://github.com/pytorch/botorch/issues/2042 | ||
fixed_features=fixed_parameters or None, | ||
# TODO: https://github.com/pytorch/botorch/issues/2042 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only need this line once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is this actually still an issue? I had a brief look at the code and it seems like the fix that Adrian originally suggested has been implemented, and the issue is also officially closed.
(edited by @AdrianSosic)
This PR adds support for cardinality constraints to
BotorchRecommender
. The core idea is to tackle the problem in an exhaustive search like manner, i.e. byThe PR implements two mechanisms for determining the configuration of inactive parameters:
The current aggregation step is to simply optimize all subspaces independently of each other and then return the batch from the subspace where the highest acquisition value is achieved. This has the side-effect that the set of inactive parameters is the same across the entire recommendation batch. This can be a desirable property in many use cases but potentially higher acquisition values can be obtained by altering the in-/activity sets across the batch. A simple way to achieve this (though out of scope for this PR) is by generalizing the sequential greedy principle to multiple subspaces.
Out of scope