
Dealing with duplicates and failed evaluations #1199

Open
clementruhm opened this issue Nov 19, 2024 · 4 comments

Comments

@clementruhm

Hi!

For the default designers (GPUCBPEBandit and GPBandit), what is the recommended way to deal with duplicates, i.e. when the same set of parameters is suggested again? Should I:
a) store metrics from previous runs and complete the duplicate trial with the cached metric, or
b) sample another trial to save time?

Is there a way to break out of the cycle of the same set of parameters being suggested repeatedly? For example, by dynamically increasing exploration?
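
Roughly what I mean by option (a), as a sketch (the `evaluate` function, the metric name, and the id bookkeeping are placeholders for my actual setup):

```python
from vizier import pyvizier as vz
from vizier import algorithms as vza

cache = {}  # parameter assignment -> cached objective value

for i in range(num_iterations):  # num_iterations is a placeholder
  suggestion = designer.suggest(count=1)[0]
  trial = suggestion.to_trial(i + 1)
  key = tuple(sorted((name, p.value) for name, p in trial.parameters.items()))
  if key in cache:
    objective = cache[key]                  # (a) reuse the cached metric
  else:
    objective = evaluate(trial.parameters)  # placeholder for my real evaluation
    cache[key] = objective
  trial.complete(vz.Measurement(metrics={'objective': objective}))
  designer.update(vza.CompletedTrials([trial]), vza.ActiveTrials())
```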

Another question: for trials that fail, how do I report back that the set of parameters is invalid? I do:

```python
trial.complete(vz.Measurement(), infeasibility_reason="invalid_config")
designer.update(vza.CompletedTrials([trial]), vza.ActiveTrials())
```

Is this a valid approach? Is there a better one?

Kind regards

@xingyousong
Collaborator

xingyousong commented Nov 19, 2024

On duplicates: Can I ask what search space you have? Duplicates might make sense if the search space is a small categorical space (with finitely many possibilities), but it would be shocking if this occurred with a continuous / DOUBLE search space, i.e. floating-point parameter values being exactly the same.

On failed trials: Yes, you would mark the trial as infeasible with infeasibility_reason. But from your code snippet, you seem to be using the designer in a custom loop (rather than using our client API) - can I ask the reason for this?

@clementruhm
Author

clementruhm commented Nov 19, 2024

  • Can I ask what search space you have?

Yes, it's multiple categorical or discrete parameters. The number of combinations is rather large (up to 500k). Nevertheless, it seems it converges pretty fast and then tends to start giving duplicates. Is there a way to dynamically increase exploration?
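
To give a concrete (if hypothetical) picture: the parameter names and values below are made up, but my real space has the same shape, just with enough parameters and values to reach a few hundred thousand combinations:

```python
from vizier import pyvizier as vz

problem = vz.ProblemStatement()
root = problem.search_space.root
# Hypothetical parameters, roughly the kind of space I am searching over.
root.add_categorical_param('encoder', ['lstm', 'gru', 'transformer'])
root.add_categorical_param('activation', ['relu', 'gelu', 'tanh', 'swish'])
root.add_discrete_param('num_layers', [2, 4, 6, 8, 12])
root.add_discrete_param('hidden_size', [128, 256, 512, 1024])
root.add_discrete_param('batch_size', [16, 32, 64, 128])
problem.metric_information.append(
    vz.MetricInformation(name='objective', goal=vz.ObjectiveMetricGoal.MAXIMIZE))
```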

  • you seem to be using the designer in a custom loop (rather than using our client API) - can I ask the reason for this?

I started with the client API, then switched to the designer for no particular reason; it seems to be a bit faster. I do need a custom loop though, because measurement will be asynchronous: I create a trial, and at some point in the future the result arrives. For now I am testing it in a synchronous setup, so it should not really matter for this question.
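
For reference, the loop looks roughly like this (a sketch; `batch_size`, `next_trial_id`, `submit_for_evaluation`, and `on_result` are placeholders for my own plumbing, and in reality completion happens later, when the async result comes back):

```python
from vizier import pyvizier as vz
from vizier import algorithms as vza

pending = {}  # trial id -> active (not yet completed) trial

# Phase 1: ask the designer for suggestions and dispatch them for evaluation.
for suggestion in designer.suggest(count=batch_size):
  trial = suggestion.to_trial(next_trial_id())  # placeholder id bookkeeping
  submit_for_evaluation(trial)                  # placeholder async dispatch
  pending[trial.id] = trial

# Phase 2: later, when the result for a trial arrives, complete it and update.
def on_result(trial_id, objective):
  trial = pending.pop(trial_id)
  trial.complete(vz.Measurement(metrics={'objective': objective}))
  designer.update(
      vza.CompletedTrials([trial]),
      vza.ActiveTrials(list(pending.values())))
```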

@xingyousong
Collaborator

On exploration over categorical search spaces:

Thanks for raising this, but it's still surprising, given that our paper found that we outperform all other industry baselines. If anything, it suggests that this duplication issue would happen even more often with other algorithms.

The easiest things for you as the user to try would be:

  1. Increase the UCB coefficient to improve exploration
  2. Disable the trust region, which normally enforces that suggestions should be close to previous trials.

Give these a shot, and please do let us know what happens, as this helps us improve the algorithm.
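
A rough sketch of what this could look like in code. The import paths, keyword names, and config fields below are assumptions and may differ across Vizier versions, so please check the designer signatures in your install:

```python
# Assumed import paths; adjust to wherever the designers live in your version.
from vizier._src.algorithms.designers import gp_bandit
from vizier._src.algorithms.designers import gp_ucb_pe

# (2) Disable the trust region on the GP bandit designer
#     (keyword name assumed; check the constructor).
designer = gp_bandit.VizierGPBandit(problem, use_trust_region=False)

# (1) Raise the UCB coefficient on the UCB-PE designer via its config
#     (config class and field name assumed; check the module).
designer = gp_ucb_pe.VizierGPUCBPEBandit(
    problem, config=gp_ucb_pe.UCBPEConfig(ucb_coefficient=3.0))
```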

@xingyousong
Collaborator

@clementruhm By the way, do you mind showing the exact search space code you're creating? We believe duplications generally shouldn't happen even for categoricals.

We're curious to get our hands dirty with what's happening (e.g., re-creating the search space with random objective values and running the loops ourselves).
