Unify exact and approximate search for refinements #770

Closed
michael-rapp opened this issue Aug 17, 2023 · 1 comment · Fixed by #809

Comments

@michael-rapp

Once #742 has been implemented, we should try to use the same mechanisms and data structures for the approximate search for refinements as for the exact search. This requires implementations of IFeatureBinning to return a subclass of OrdinalFeatureVector in which the feature values are computed as the mean of two adjacent bins. Among other things, such a change should make the following classes obsolete:

  • ApproximateThresholds
  • ApproximateRuleRefinement
  • NominalFeatureBinning
  • CoverageSet
  • IHistogram
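
The idea of deriving feature values from adjacent bins can be illustrated with a small, self-contained sketch. It is not taken from the project's code base: the `Bin` class and both functions are hypothetical, equal-width binning is used only as an example, and reading "mean of two adjacent bins" as the midpoint between the largest value of one bin and the smallest value of the next is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Bin:
    # Hypothetical per-bin summary: smallest and largest value that fell into the bin.
    min_value: float
    max_value: float


def equal_width_bins(values: Sequence[float], num_bins: int) -> List[Bin]:
    """Group values into equal-width bins; return the non-empty bins in ascending order."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        # All values are identical, so a single bin covers them.
        return [Bin(lo, hi)]
    bins: Dict[int, Bin] = {}
    for value in values:
        # The largest value would land in index num_bins, so clamp it into the last bin.
        index = min(int((value - lo) / width), num_bins - 1)
        if index in bins:
            bins[index].min_value = min(bins[index].min_value, value)
            bins[index].max_value = max(bins[index].max_value, value)
        else:
            bins[index] = Bin(value, value)
    return [bins[index] for index in sorted(bins)]


def thresholds_between_bins(bins: Sequence[Bin]) -> List[float]:
    """Place one candidate threshold halfway between each pair of adjacent bins."""
    return [(left.max_value + right.min_value) / 2 for left, right in zip(bins, bins[1:])]


bins = equal_width_bins([0.0, 1.0, 2.0, 7.0, 8.0], num_bins=4)
print(thresholds_between_bins(bins))
```

Under this reading, a binned feature yields an ordered sequence of candidate values just like an unbinned one, only shorter, which is what would allow the exact search to be reused without changes.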

michael-rapp added the labels boosting (affects the subproject "boosting"), seco (affects the subproject "seco"), refactoring (reorganization or cosmetic changes of code), and requires profiling (potentially improves runtime efficiency, but profiling is required) on Aug 17, 2023
michael-rapp added this to the 0.10.0 milestone on Aug 17, 2023

michael-rapp commented Mar 1, 2024

Benchmarks

Comparison of the runtime of the implementation in this branch (after) with that of the main branch (before) on the dataset "mediamill". Feature binning was used in both cases; otherwise, the default parameters and a 10-fold cross validation were used.

| Feature binning method | Runtime before | Runtime after |
| --- | --- | --- |
| Equal-frequency (8 bins) | 16 minutes, 48 seconds and 992 milliseconds | 13 minutes, 47 seconds and 56 milliseconds |
| Equal-width (8 bins) | 18 minutes, 10 seconds and 332 milliseconds | 14 minutes, 11 seconds and 733 milliseconds |

According to these results, the new implementation shortens training by roughly 18% with equal-frequency binning and 22% with equal-width binning, a modest improvement regardless of the binning method used.
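
To put the reported runtimes on a common scale, the following sketch converts them to seconds and computes the relative reduction per binning method. The helper names are made up for illustration; the numbers are copied from the benchmark results above.

```python
def to_seconds(minutes: int, seconds: int, milliseconds: int) -> float:
    """Convert a duration given as minutes, seconds and milliseconds to seconds."""
    return minutes * 60 + seconds + milliseconds / 1000


def relative_reduction(before: float, after: float) -> float:
    """Percentage by which the runtime dropped relative to the 'before' measurement."""
    return (before - after) / before * 100


# (runtime before, runtime after) in seconds, as reported for the "mediamill" dataset.
runtimes = {
    "Equal-frequency (8 bins)": (to_seconds(16, 48, 992), to_seconds(13, 47, 56)),
    "Equal-width (8 bins)": (to_seconds(18, 10, 332), to_seconds(14, 11, 733)),
}

reductions = {
    method: relative_reduction(before, after)
    for method, (before, after) in runtimes.items()
}

for method, reduction in reductions.items():
    print(f"{method}: {reduction:.1f}% less training time")
```

Since each configuration appears to have been measured once, these percentages are best read as rough indications rather than precise estimates.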
