Duplicated trials #207
How duplicate trials are handled depends on the search algorithm itself, and it looks like HEBO doesn't account for them.
So, we can fix that by sending existing results to HEBO without additional training.
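The "replay existing results" idea can be sketched generically: before asking for a new suggestion, feed every previously evaluated (config, score) pair back into the optimizer's observe step. The `observe`/`suggest` method names mirror HEBO's interface, but `SimpleOptimizer` below is a stand-in, not HEBO itself:

```python
# Sketch of replaying known results into a searcher so it does not
# re-suggest configurations it has already "seen". SimpleOptimizer is
# a placeholder; HEBO exposes a similar observe/suggest pair.
class SimpleOptimizer:
    def __init__(self, candidates):
        self._candidates = list(candidates)
        self._observed = {}

    def observe(self, config, score):
        # Record an already-evaluated configuration and its score.
        self._observed[tuple(sorted(config.items()))] = score

    def suggest(self):
        # Return the first candidate not yet observed.
        for config in self._candidates:
            if tuple(sorted(config.items())) not in self._observed:
                return config
        return None  # search space exhausted

opt = SimpleOptimizer([{"lr": 0.1}, {"lr": 0.01}, {"lr": 0.001}])
# Replay a previously finished trial instead of re-running it:
opt.observe({"lr": 0.1}, 0.92)
```

With the old result replayed, the optimizer's next suggestion skips the already-evaluated configuration.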
Isn't it just due to Pandas trying to fit the dataframe on screen?
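If column truncation is the culprit, widening pandas' display settings should reveal all the split scores. A minimal sketch, with a hypothetical frame whose column names follow scikit-learn's `cv_results_` convention:

```python
import pandas as pd

# Hypothetical results frame mimicking scikit-learn's cv_results_ layout
# (values are made up for illustration).
df = pd.DataFrame({f"split{i}_test_score": [0.8 + i * 0.01] for i in range(5)})

# With default options, pandas may elide middle columns when printing;
# raising the limits makes every splitN_test_score column visible.
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
print(df)
```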
Thank you, it works. Don't you think it would be good to skip all duplicates for all searchers by default? I think it's a big problem when you believe you're tuning hyperparameters but in fact you're training the same configuration over and over again.
It's not a straightforward thing, as those should be ideally handled by the search algorithm itself. For example, if we were to reject duplicates from a search algorithm that doesn't check for them, it is possible for the situation to become an infinite loop where the tuner rejects the duplicate suggestion only for the algorithm to suggest it again, as this is what it considers to be the best configuration. In any case, that should be done in Ray Tune itself, and not here. @krfricke what do you think?
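One way to sidestep the infinite-loop risk described above is to cap how many times the tuner re-asks the underlying algorithm before giving up. The sketch below is illustrative only; `make_dedup_suggester` and `inner_suggest` are assumed names, not part of Ray Tune's API:

```python
# Illustrative sketch: wrap any suggestion function with duplicate
# rejection plus a retry cap, so a repeat-happy algorithm cannot
# loop forever. `inner_suggest` stands in for the real searcher.
import itertools

def make_dedup_suggester(inner_suggest, max_retries=10):
    seen = set()

    def suggest():
        for _ in range(max_retries):
            config = inner_suggest()
            key = tuple(sorted(config.items()))
            if key not in seen:
                seen.add(key)
                return config
        # Give up rather than loop forever: signal exhaustion.
        return None

    return suggest

# Toy inner searcher that keeps re-proposing its "best" config.
cycle = itertools.cycle([{"lr": 0.1}, {"lr": 0.1}, {"lr": 0.01}])
suggest = make_dedup_suggester(lambda: next(cycle))
```

Here the first two calls return the two unique configs, and once the toy searcher has nothing new left, the wrapper returns `None` instead of spinning forever.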
Hmm, so yeah, this seems to be a common request. We've actually implemented something similar in Bayesopt (see ray/python/ray/tune/suggest/bayesopt.py).
Besides, HEBO counts duplicates as newly tested parameters and throws an error once it reaches the total count of possible combinations. As a result, not all parameter configurations get tested.
@richardliaw |
I ran your HEBO custom example and saw that it runs the same trials multiple times. Can I skip them and finish when there are no unique hyperparameter configurations left?
Setting `cv=5`, I expected to see 5 test scores for each trial, but it shows only 3 of them: split0_test_score, split1_test_score, split2_test_score. Can you clarify how it works?