Version 0.10.0

@michael-rapp michael-rapp released this 05 May 19:16
7b9fba9

A major update to the BOOMER algorithm that introduces the following changes.

This release comes with several API changes. For an updated overview of the available parameters and command line arguments, please refer to the documentation.

Algorithmic Enhancements

  • The project now provides a Separate-and-Conquer (SeCo) algorithm based on traditional rule learning techniques that are particularly well-suited for learning interpretable models.
  • Space-efficient data structures are now used for storing feature values, depending on whether the feature is numerical, ordinal, nominal, or binary. This also enables the use of optimized code paths for dealing with these different types of features.
  • The implementation of feature binning has been reworked to avoid redundant code and to reduce training times by using the data structures mentioned above.
  • The value to be used for sparse elements of a feature matrix can now be specified via the C++ or Python API.
  • Nominal and ordinal feature values are now represented as integers to avoid issues due to limited floating point precision.
  • Safe comparisons of floating point values are now used to avoid issues due to limited floating point precision.
  • Fundamental data structures for vectors and matrices have been reworked to make it easier to reuse existing functionality and avoid redundant code.
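
The two precision-related items above can be illustrated with a minimal Python sketch. It is not the project's implementation: the helper names `approx_equal` and `encode_nominal` and the chosen tolerances are hypothetical and only show the general idea of tolerance-based comparisons and integer-coded nominal values.

```python
import math


def approx_equal(a: float, b: float, rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Tolerance-based equality check (hypothetical helper, tolerances are assumptions)."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


def encode_nominal(values):
    """Map nominal values to consecutive integer codes in order of first appearance.

    Returns the list of codes and the value-to-code mapping (hypothetical helper).
    """
    codes = {}
    encoded = [codes.setdefault(value, len(codes)) for value in values]
    return encoded, codes


# Exact comparison of floats can fail because of rounding errors ...
print(0.1 + 0.2 == 0.3)
# ... whereas a tolerance-based comparison treats the values as equal.
print(approx_equal(0.1 + 0.2, 0.3))
# Integer codes can be compared exactly, without any tolerance.
print(encode_nominal(["red", "green", "red"]))
```

Representing nominal and ordinal values as integers sidesteps the problem entirely for those feature types, so tolerance-based comparisons are only needed for numerical values.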

Additions to the Command Line API

  • Information about the program can now be printed via the argument -v or --version.
  • Data characteristics now include the number of ordinal attributes when printed on the console or written to a file via the command line argument --print-data-characteristics or --store-data-characteristics.

Bugfixes

  • An issue has been fixed that caused the number of numerical and nominal features to be swapped when using the command line arguments --print-data-characteristics or --store-data-characteristics.
  • The correct directory is now used for loading and saving parameter settings when using the command line arguments --parameter-dir and --store-parameters.

API Changes

  • The option num_threads of the parameters --parallel-rule-refinement, --parallel-statistic-update and --parallel-prediction has been renamed to num_preferred_threads.
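
Existing scripts that still pass the old option name need to be updated. The following sketch shows one way to rewrite stored argument values. The function `migrate_argument` is hypothetical, and the value syntax `true{num_threads=4}` used in the example is an assumption about how options are passed to these parameters; please check the documentation for the exact format.

```python
import re

# Matches the old option name as a whole word, so that values which already
# use "num_preferred_threads" are left untouched.
_OLD_OPTION = re.compile(r"\bnum_threads\b")


def migrate_argument(value: str) -> str:
    """Replace the old option name with the new one (hypothetical helper)."""
    return _OLD_OPTION.sub("num_preferred_threads", value)


# The value format below is an assumption, used for illustration only.
print(migrate_argument("true{num_threads=4}"))
```

Applying the function to a value that has already been migrated returns it unchanged, so it can safely be run more than once.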

Quality-of-Life Improvements

  • The documentation now uses a more modern theme that supports light and dark variants.
  • A build option that allows disabling multi-threading support via OpenMP at compile-time has been added.
  • The groundwork for GPU support has been laid. GPU support can be disabled at compile-time via a build option.
  • Added support for unit testing the project's C++ code. Compilation of the tests can be disabled via a build option.
  • The Python code is now checked for common issues by applying pylint via continuous integration.
  • The Makefile has been replaced with wrapper scripts triggering a SCons build.
  • Development versions of wheel packages are now regularly built via continuous integration, uploaded as artifacts, and published on Test-PyPI.
  • Continuous integration is now used to maintain separate branches for major, feature, and bugfix releases and keep them up-to-date.
  • The runtime of continuous integration jobs has been optimized by running individual steps only if necessary, caching files across subsequent runs, and making use of parallelization.
  • When tests are run via continuous integration, a summary of the test results is now added to merge requests and GitHub workflows.
  • Markdown files are now used for writing the documentation.
  • A consistent style is now enforced for Markdown files by applying the tool mdformat via continuous integration.
  • C++17 or newer is now required for compiling the project.