After attending GECCO 2024, watching the multi-criteria optimization benchmark tutorial from Juergen Branke, and discussing some ideas with Hengzhe and Fabricio, I wanted to share my thoughts and what I learned about benchmarking.
I wrote down a list of good principles, and after looking at it, I felt it could become a sort of manifesto for SRbench that we would stay faithful to.
Anyway, here are the ideas:
Benchmarks should serve not only to compare methods but also:

- to tune the algorithms;
- to measure progress in the field.
The theoretical development of new SR algorithms should focus on more than just improving performance on SRbench. Instead, we should change existing algorithms when the change improves results on a well-defined problem (and it will be up to the decision maker to say which algorithm is better anyway). Upcoming editions of SRbench should discuss new results with that in mind.
We should characterize subclasses of problems and identify which algorithms perform better on each, mapping problems to algorithms.
Remove easy problems.
If we are tuning parameters, we should do that on a subset of problems and test on the others. We should avoid overfitting the benchmark set.
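To make this concrete, here is a minimal Python sketch of that protocol. The problem names, the hyperparameter grid, and `run_method` are placeholders for illustration, not SRbench's actual API:

```python
import random

# Hypothetical problem names; in practice this would be the benchmark's dataset list.
problems = [f"problem_{i:02d}" for i in range(20)]

rng = random.Random(42)
rng.shuffle(problems)
tune_set, eval_set = problems[:5], problems[5:]

# Hypothetical hyperparameter grid for an SR method.
grid = [{"pop_size": p, "max_depth": d} for p in (100, 500) for d in (6, 10)]

def run_method(problem, params):
    """Placeholder: run the SR method on `problem` and return a test score."""
    return rng.random()  # stand-in for a real benchmark run

def median(xs):
    xs = sorted(xs)
    return xs[len(xs) // 2]

# Select hyperparameters using the tuning problems only...
best = max(grid, key=lambda params: median(run_method(p, params) for p in tune_set))

# ...and report results exclusively on the held-out problems.
held_out_scores = {p: run_method(p, best) for p in eval_set}
print("chosen config:", best)
print("held-out scores:", held_out_scores)
```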
The best algorithm also depends on the runtime budget. With newer methods that rely on GPUs and large models, we should account for the extra steps they need before the SR algorithm itself starts running.
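One way to keep that comparison honest is to time the setup and the search separately and include both in the total budget. A rough sketch, where `load_model` and `run_search` are stand-ins rather than any method's real interface:

```python
import time

def timed(fn, *args, **kwargs):
    """Return the function's result together with its wall-clock time."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical phases of a GPU/large-model SR method.
def load_model():
    time.sleep(0.1)  # stand-in for downloading a model / moving it to the GPU
    return "model"

def run_search(model, dataset):
    time.sleep(0.2)  # stand-in for the SR search itself
    return "expression"

model, setup_time = timed(load_model)
expr, search_time = timed(run_search, model, "dataset_01")
print(f"setup: {setup_time:.2f}s  search: {search_time:.2f}s  "
      f"total: {setup_time + search_time:.2f}s")
```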
Each method tries to optimize a different error function (RMSE, NMSE, R²; some are multi-objective, some not). When comparing the models, we need to take that into account. Is it a fair comparison to look at metrics that some algorithms use directly as their objective? Alternatively, we could report the correlation between indicators as part of the result.
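As an illustration of that last point, here is a rough sketch (using numpy/scipy, with synthetic predictions standing in for real SR outputs) of computing several indicators per method and reporting how strongly they agree:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
y_true = rng.normal(size=200)

# Hypothetical predictions from a few SR methods, with different noise levels.
preds = {name: y_true + rng.normal(scale=s, size=y_true.size)
         for name, s in [("method_a", 0.1), ("method_b", 0.3), ("method_c", 0.6)]}

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def nmse(y, yhat):
    return float(np.mean((y - yhat) ** 2) / np.var(y))

def r2(y, yhat):
    return float(1.0 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2))

scores = {m: (rmse(y_true, p), nmse(y_true, p), r2(y_true, p)) for m, p in preds.items()}

# Rank correlation between indicators across methods (R^2 is negated so that
# "lower is better" holds for all three columns before correlating).
rmse_col, nmse_col, r2_col = zip(*scores.values())
print("RMSE vs NMSE :", spearmanr(rmse_col, nmse_col)[0])
print("RMSE vs -R^2 :", spearmanr(rmse_col, [-v for v in r2_col])[0])
```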
As Bill said, there is no perfect way of creating the benchmark; we should try to do our best, but we will always be wrong.
Problem sets should be related to the real world and fast to compute. Data should be open, and experiments reproducible.