[BUG] Hyperband/SuccessiveHalving does not work with ask/tell-interface #1148

Closed
becktepe opened this issue Oct 8, 2024 · 10 comments

becktepe commented Oct 8, 2024

Description

I want to use the ask/tell interface for multi-fidelity HPO. In particular, I am using the hypersweeper package to run distributed HPO. However, I noticed that the number of configurations per budget does not match the Hyperband/SuccessiveHalving schedule.

These are the Hyperband brackets:

--------------------------------------------------------------------------------
Bracket 0
  #Configs: [  64,   16,    4,    1]
  Budgets:  [  16,   62,  250, 1000]
--------------------------------------------------------------------------------
Bracket 1
  #Configs: [  22,    5,    1]
  Budgets:  [  62,  250, 1000]
--------------------------------------------------------------------------------
Bracket 2
  #Configs: [   8,    2]
  Budgets:  [ 250, 1000]
--------------------------------------------------------------------------------
Bracket 3
  #Configs: [   4]
  Budgets:  [1000]
--------------------------------------------------------------------------------

However, the sweeper executed 64 configurations on budget 15.625, followed by 22 configurations on budget 62.5, instead of 16 configurations on budget 62.5.
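
For reference, the bracket table above is consistent with the textbook Hyperband schedule. The following is a minimal sketch of that schedule, not SMAC's actual implementation; the function name `hyperband_brackets` is made up for illustration, and an integer eta is assumed.

```python
def hyperband_brackets(min_budget, max_budget, eta):
    """Return one (n_configs, budgets) pair per bracket.

    Sketch of the textbook Hyperband schedule; assumes an integer eta >= 2.
    """
    # Largest s such that min_budget * eta**s <= max_budget.
    s_max = 0
    while min_budget * eta ** (s_max + 1) <= max_budget:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # Initial number of configurations: ceil((s_max + 1) / (s + 1) * eta**s),
        # computed with integer arithmetic to avoid floating-point rounding.
        n_initial = -(-(s_max + 1) * eta**s // (s + 1))
        # Each stage keeps floor(n_initial / eta**i) configurations.
        n_configs = [n_initial // eta**i for i in range(s + 1)]
        # Budgets grow by a factor of eta per stage and end at max_budget.
        budgets = [max_budget / eta ** (s - i) for i in range(s + 1)]
        brackets.append((n_configs, budgets))
    return brackets


for idx, (n_configs, budgets) in enumerate(hyperband_brackets(15, 1000, 4)):
    print(f"Bracket {idx}: #configs={n_configs}, budgets={budgets}")
```

With min_budget=15, max_budget=1000 and eta=4 this gives 127 trials in total, which is why n_trials is set to 127 in the reproduction steps.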

After some debugging, I assume the bug is caused by this snippet:

for config in configs:
    isb_keys = rh.get_instance_seed_budget_keys(config)
    if not all(isb_key in isb_keys for isb_key in from_keys):
        raise NotEvaluatedError

Apparently, the seed field of the isb_keys is always None, whereas the seed fields of from_keys are filled with random seeds. The check therefore fails, the bracket is skipped, and the configurations from the next bracket are used instead.
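
To illustrate why the membership check fails, here is a small self-contained sketch. `Key` is a hypothetical stand-in for SMAC's InstanceSeedBudgetKey, assumed to compare field by field; the seed value 42 is arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for SMAC's InstanceSeedBudgetKey."""

    instance: Optional[str] = None
    seed: Optional[int] = None
    budget: Optional[float] = None


# Keys the intensifier expects to find in the runhistory (seed filled in).
from_keys = [Key(seed=42, budget=15.625)]
# Keys actually recorded for the config when the result was told with seed=None.
isb_keys = [Key(seed=None, budget=15.625)]

# Same budget, different seed: the keys are unequal, so the check fails.
evaluated = all(key in isb_keys for key in from_keys)
print(evaluated)  # False
```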

With this workaround, I achieve the expected behaviour:

for config in configs:
    isb_keys = rh.get_instance_seed_budget_keys(config)

    # Check whether all recorded seeds for this config are None
    is_none = all(isb_key.seed is None for isb_key in isb_keys)

    if is_none:
        # Now we have to set the seeds for from_keys to None as well
        from_keys = [
            InstanceSeedBudgetKey(instance=isb_key.instance, seed=None, budget=isb_key.budget)
            for isb_key in from_keys
        ]

    if not all(isb_key in isb_keys for isb_key in from_keys):
        raise NotEvaluatedError

Steps/Code to Reproduce

  1. Clone the hypersweeper repository
  2. In examples/configs/mlp_smac.yaml set
       n_trials: 127
       min_budget: 15
       max_budget: 1000
       eta: 4
  3. Run python examples/mlp.py --config-name=mlp_smac -m
  4. Check tmp/mlp_smac/runhistory.csv

Expected Results

64 configs on budget 15.625, 16 configs on budget 62.5, 4 configurations on budget 250, 1 configuration on budget 1000

Actual Results

64 configs on budget 15.625, 22 configs on budget 62.5, 8 configurations on budget 250, 4 configurations on budget 1000

This indicates that only the first stage of each bracket is evaluated.
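
The counts above can be tallied from the run history with a few lines of Python. This is only a sketch: the column name `budget` is an assumption about the CSV layout, and the rows below are toy data, not taken from an actual run.

```python
import csv
import io
from collections import Counter


def configs_per_budget(csv_file, budget_column="budget"):
    """Count rows per budget value; `budget_column` is an assumed column name."""
    reader = csv.DictReader(csv_file)
    return Counter(float(row[budget_column]) for row in reader)


# Toy data for illustration only.
toy_csv = io.StringIO("config_id,budget\n0,15.625\n1,15.625\n0,62.5\n")
counts = configs_per_budget(toy_csv)
print(sorted(counts.items()))
```

For a real run, the file object would come from something like open("tmp/mlp_smac/runhistory.csv", newline="") instead of the in-memory string.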

Versions

Hypersweeper from https://github.com/becktepe/hypersweeper
SMAC==2.2.0

TheEimer commented Oct 8, 2024

I think this is probably related to #1131 - currently I'm testing what happens without the hypersweeper too. Performance seems pretty whack, so this could explain why.

becktepe commented Oct 8, 2024

Yes, it might be. For my experiments, I set job_array_size_limit=1 in the hypersweeper to avoid any interactions related to parallelization.

TheEimer commented Oct 8, 2024

Unfortunately this probably doesn't help. I just posted the updated results and even though anytime performance has a logging bug, it looks like SMAC is just doing random search here :/

becktepe commented Oct 9, 2024

Maybe this could also have been caused by passing seed=None to the tell() method where SMAC was expecting something else. If this is the case, it might not actually be a SMAC problem.

TheEimer commented Oct 9, 2024

I don't think so. I ran the "vanilla" SMAC version with ask-tell without seed=None and the curves were exactly the same. This is the code; it produced the same curve as hypersweeper with max_parallel 0. Idk if there's something else that could be wrong here?

import hydra
from carps.utils.running import make_problem
from smac.facade.multi_fidelity_facade import MultiFidelityFacade
from smac.runhistory.dataclasses import TrialValue
from carps.utils.trials import TrialInfo
from smac import Scenario
from pathlib import Path
import json

@hydra.main(config_path=".", config_name="config_vanilla_smac.yaml")
def run_carps(cfg):
    problem = make_problem(cfg=cfg.problem)

    # Scenario object
    scenario = Scenario(problem.configspace, deterministic=False, n_trials=126, min_budget=1, max_budget=52, seed=cfg.seed)

    intensifier = MultiFidelityFacade.get_intensifier(
        scenario,
        eta=3
    )

    def dummy(config, seed, budget, **kwargs):
        return 0.0
    
    # Now we use SMAC to find the best hyperparameters
    smac = MultiFidelityFacade(
        scenario,
        dummy,
        intensifier=intensifier,
        overwrite=True,
    )

    incumbent_config = {}
    incumbent_score = 100000
    budget_used = 0

    # We can ask SMAC which trials should be evaluated next
    for _ in range(126):
        info = smac.ask()
        trial_info = TrialInfo(info.config, seed=cfg.seed, budget=info.budget)

        cost = problem.evaluate(trial_info)
        value = TrialValue(cost=cost.cost, time=0.5)
        budget_used += info.budget

        smac.tell(info, value)

        if -cost.cost > -incumbent_score:
            incumbent_score = cost.cost
            incumbent_config = info.config.get_dictionary()

        log_dict = {}
        log_dict["config"] = incumbent_config
        log_dict["score"] = incumbent_score
        log_dict["budget_used"] = budget_used
        with Path("incumbent.jsonl").open("a") as f:
            json.dump(log_dict, f)
            f.write("\n")

if __name__ == "__main__":
    run_carps()

TheEimer commented Oct 9, 2024

Actually my results probably have nothing to do with ask-tell after all - the same happens for TF execution mode. So either this issue is unrelated to #1131 or this bug also shows up for the standard TF execution.

becktepe commented Oct 9, 2024

Regarding #1131:
When using the hypersweeper v.0.2.0 and SMAC v.2.1.0 with the default max_parallelization=0.1, n_trials=301, min_budget=100_000, max_budget=10_000_000, and eta=2, we would expect these brackets:

--------------------------------------------------------------------------------
Stage 0
  #Configs:     [     64,     32,     16,       8,       4,       2,        1]
  Budgets:      [ 156250, 312500, 625000, 1250000, 2500000, 5000000, 10000000]
--------------------------------------------------------------------------------
Stage 1
  #Configs:     [     38,     19,       9,       4,       2,        1]
  Budgets:      [ 312500, 625000, 1250000, 2500000, 5000000, 10000000]
--------------------------------------------------------------------------------

But we got this, which is really strange:
(I put this into brackets and stages but this doesn't mean it's the brackets and stages SMAC used internally)

--------------------------------------------------------------------------------
Stage 0
  #Configs:     [     42,      42,      30,      21,        8]
  Budgets:      [ 625000, 1250000, 2500000, 5000000, 10000000]
--------------------------------------------------------------------------------
Stage 1
  #Configs:     [     64,     38,        8]
  Budgets:      [ 156250, 312500, 10000000]
--------------------------------------------------------------------------------
Stage 2
  #Configs:     [     32,     16]
  Budgets:      [ 312500, 625000]
--------------------------------------------------------------------------------

Not sure if this is related, though.

TheEimer commented Oct 9, 2024

That looks like a hypersweeper problem, seems like the scenario(?) isn't instantiated correctly? I took a quick look into my hypersweeper vs non-hypersweeper ask-tell brackets locally and those look to be the same with the SMAC default settings (as does the version with TF execution). So I think that's a matter of correctly creating the SMAC components. Could you make an issue for that over there?

I'll run a few more things soon anyway and try to look into checking what happens with changing eta, but I'd be surprised if this is a SMAC bug.

becktepe commented Oct 9, 2024

Okay, apparently the issue was caused by passing seed=None to the tell() method, which prevented SMAC from matching the configurations with the ones provided through ask(). So it was a bug on my side :D
This is still somewhat confusing to me, since I set deterministic=True and was not expecting SMAC to use seeds at all. It would be great to have some sort of check for this: the problem is very hard to trace back, because it silently leads to different Hyperband behaviour without letting the user know.
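
One possible shape for such a check, as a rough sketch only: remember every trial handed out by ask() and refuse a tell() whose (config, seed, budget) key was never asked. `AskTellGuard` and its methods are hypothetical names, not part of SMAC, and the config is reduced to a plain id for brevity.

```python
class AskTellGuard:
    """Hypothetical helper: remembers what was asked and validates what is told."""

    def __init__(self):
        self._pending = set()

    def on_ask(self, config_id, seed, budget):
        # Record the exact key that was handed out.
        self._pending.add((config_id, seed, budget))

    def on_tell(self, config_id, seed, budget):
        key = (config_id, seed, budget)
        if key not in self._pending:
            # Fail loudly instead of silently recording an unmatched trial.
            raise ValueError(f"tell() got {key}, which was never returned by ask()")
        self._pending.remove(key)


guard = AskTellGuard()
guard.on_ask(config_id=0, seed=42, budget=15.625)
try:
    guard.on_tell(config_id=0, seed=None, budget=15.625)  # seed was dropped
except ValueError as err:
    print(err)
guard.on_tell(config_id=0, seed=42, budget=15.625)  # matches, accepted
```

Raising an error is just one option; a warning would be less strict but would still make the mismatch visible to the user.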

Contributor

benjamc commented Oct 9, 2024

Glad that it works! I have created a new issue to improve documentation and add a check. Will close this. Feel free to open again if issues arise.

benjamc closed this as completed Oct 9, 2024