diff --git a/CHANGELOG.md b/CHANGELOG.md index 36f0211595..2e3e9fe5b9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,28 +12,28 @@ A major update to the BOOMER algorithm that introduces the following changes. ### Additions to the Command Line API -* **Information about the program can now be printed** via the argument `-v` or `--version`. -* **Data characteristics do now include the number of ordinal attributes** when printed on the console or written to a file via the command line argument `--print-data-characteristics` or `--store-data-characteristics`. +- **Information about the program can now be printed** via the argument `-v` or `--version`. +- **Data characteristics now include the number of ordinal attributes** when printed on the console or written to a file via the command line argument `--print-data-characteristics` or `--store-data-characteristics`. ### Bugfixes -* An issue has been fixed that caused the number of numerical and ordinal features to be swapped when using the command line arguments `--print-data-characteristics` or `--store-data-characteristics`. -* The correct directory is now used for loading and saving parameter settings when using the command line arguments `--parameter-dir` and `--store-parameters`. +- An issue has been fixed that caused the number of numerical and ordinal features to be swapped when using the command line arguments `--print-data-characteristics` or `--store-data-characteristics`. +- The correct directory is now used for loading and saving parameter settings when using the command line arguments `--parameter-dir` and `--store-parameters`. ### API Changes -* The option `num_threads` of the parameters `--parallel-rule-refinement`, `--parallel-statistic-update` and `--parallel-prediction` has been renamed to `num_preferred_threads`. +- The option `num_threads` of the parameters `--parallel-rule-refinement`, `--parallel-statistic-update` and `--parallel-prediction` has been renamed to `num_preferred_threads`. ### Quality-of-Life Improvements -* The documentation has been updated to a more modern theme supporting light and dark theme variants. -* A build option that allows to disable multi-threading support via OpenMP at compile-time has been added. -* The groundwork for GPU support was layed. It can be disabled at compile-time via a build option. -* Added support for unit testing the project's C++ code. Compilation of the tests can be disabled via a build option. -* The Python code is now checked for common issues by applying `pylint` via continuous integration. -* The Makefile has been replaced with wrapper scripts triggering a [SCons](https://scons.org/) build. -* The runtime of continuous integration jobs has been optimized by running individual steps only if necessary and caching files across subsequent runs. -* The fundemental data structures used to implement vectors and matrices have been reworked to ease reusing existing functionality and avoiding redundant code. +- The documentation has been updated to a more modern theme supporting light and dark variants. +- A build option that allows disabling multi-threading support via OpenMP at compile-time has been added. +- The groundwork for GPU support was laid. It can be disabled at compile-time via a build option. +- Added support for unit testing the project's C++ code. Compilation of the tests can be disabled via a build option. +- The Python code is now checked for common issues by applying `pylint` via continuous integration.
+- The Makefile has been replaced with wrapper scripts triggering a [SCons](https://scons.org/) build. +- The runtime of continuous integration jobs has been optimized by running individual steps only if necessary and caching files across subsequent runs. +- The fundamental data structures used to implement vectors and matrices have been reworked to ease reusing existing functionality and avoid redundant code. ## Version 0.9.0 (Jul. 2nd, 2023) @@ -43,65 +43,65 @@ A major update to the BOOMER algorithm that introduces the following changes. ### Algorithmic Enhancements -* **Sparse matrices can now be used to store gradients and Hessians** if supported by the loss function. The desired behavior can be specified via a new parameter `--statistic-format`. -* **Rules with partial heads can now be learned** by setting the parameter `--head-type` to the value `partial-fixed`, if the number of predicted labels should be predefined, or `partial-dynamic`, if the subset of predicted labels should be determined dynamically. -* **A beam search can now be used** for the induction of individual rules by setting the parameter `--rule-induction` to the value `top-down-beam-search`. -* **Variants of the squared error loss and squared hinge loss**, which take all labels of an example into account at the same time, can now be used by setting the parameter `--loss` to the value `squared-error-example-wise` or `squared-hinge-example-wise`. -* **Probability estimates can be obtained for each label independently or via marginalization** over the label vectors encountered in the training data by setting the new parameter `--probability-predictor` to the value `label-wise` or `marginalized`. -* **Predictions that maximize the example-wise F1-measure can now be obtained** by setting the parameter `--classification-predictor` to the value `gfm`. -* **Binary predictions can now be derived from probability estimates** by specifying the new option `based_on_probabilities`. -* **Isotonic regression models can now be used** to calibrate marginal and joint probabilities predicted by a model via the new parameters `--marginal-probability-calibration` and `--joint-probability-calibration`. -* **The rules in a previously learned model can now be post-optimized** by reconstructing each one of them in the context of the other rules via the new parameter `--sequential-post-optimization`. -* **Early stopping or post-pruning can now be used** by setting the new parameter `--global-pruning` to the value `pre-pruning` or `post-pruning`. -* **Single labels can now be sampled in a round-robin fashion** by setting the parameter `--feature-sampling` to the new value `round-robin`. -* **A fixed number of trailing features can now be retained** when the parameter `--feature-sampling` is set to the value `without-replacement` by specifying the option `num_retained`. +- **Sparse matrices can now be used to store gradients and Hessians** if supported by the loss function. The desired behavior can be specified via a new parameter `--statistic-format`. +- **Rules with partial heads can now be learned** by setting the parameter `--head-type` to the value `partial-fixed`, if the number of predicted labels should be predefined, or `partial-dynamic`, if the subset of predicted labels should be determined dynamically. +- **A beam search can now be used** for the induction of individual rules by setting the parameter `--rule-induction` to the value `top-down-beam-search`.
+- **Variants of the squared error loss and squared hinge loss**, which take all labels of an example into account at the same time, can now be used by setting the parameter `--loss` to the value `squared-error-example-wise` or `squared-hinge-example-wise`. +- **Probability estimates can be obtained for each label independently or via marginalization** over the label vectors encountered in the training data by setting the new parameter `--probability-predictor` to the value `label-wise` or `marginalized`. +- **Predictions that maximize the example-wise F1-measure can now be obtained** by setting the parameter `--classification-predictor` to the value `gfm`. +- **Binary predictions can now be derived from probability estimates** by specifying the new option `based_on_probabilities`. +- **Isotonic regression models can now be used** to calibrate marginal and joint probabilities predicted by a model via the new parameters `--marginal-probability-calibration` and `--joint-probability-calibration`. +- **The rules in a previously learned model can now be post-optimized** by reconstructing each one of them in the context of the other rules via the new parameter `--sequential-post-optimization`. +- **Early stopping or post-pruning can now be used** by setting the new parameter `--global-pruning` to the value `pre-pruning` or `post-pruning`. +- **Single labels can now be sampled in a round-robin fashion** by setting the parameter `--feature-sampling` to the new value `round-robin`. +- **A fixed number of trailing features can now be retained** when the parameter `--feature-sampling` is set to the value `without-replacement` by specifying the option `num_retained`. ### Additions to the Command Line API -* **Data sets in the MEKA format are now supported.** -* **Certain characteristics of binary predictions can be printed or written to output files** via the new arguments `--print-prediction-characteristics` and `--store-prediction-characteristics`. -* **Unique label vectors contained in the training data can be printed or written to output files** via the new arguments `--print-label-vectors` and `--store-label-vectors`. -* **Models for the calibration of marginal or joint probabilities can be printed or written to output files** via the new arguments `--print-marginal-probability-calibration-model`, `--store-marginal-probability-calibration-model`, `--print-joint-probability-calibration-model` and `--store-joint-probability-calibration-model`. -* **Models can now be evaluated repeatedly, using a subset of their rules with increasing size,** by specifying the argument `--incremental-prediction`. -* **More control of how data is split into training and test sets** is now provided by the argument `--data-split` that replaces the arguments `--folds` and `--current-fold`. -* **Binary labels, regression scores, or probabilities can now be predicted,** depending on the value of the new argument `--prediction-type`, which can be set to the values `binary`, `scores`, or `probabilities`. -* **Individual evaluation measures can now be enabled or disabled** via additional options that have been added to the arguments `--print-evaluation` and `--store-evaluation`. -* **The presentation of values printed on the console has vastly been improved.** In addition, options for controlling the presentation of values to be printed or written to output files have been added to various command line arguments. 
+- **Data sets in the MEKA format are now supported.** +- **Certain characteristics of binary predictions can be printed or written to output files** via the new arguments `--print-prediction-characteristics` and `--store-prediction-characteristics`. +- **Unique label vectors contained in the training data can be printed or written to output files** via the new arguments `--print-label-vectors` and `--store-label-vectors`. +- **Models for the calibration of marginal or joint probabilities can be printed or written to output files** via the new arguments `--print-marginal-probability-calibration-model`, `--store-marginal-probability-calibration-model`, `--print-joint-probability-calibration-model` and `--store-joint-probability-calibration-model`. +- **Models can now be evaluated repeatedly, using a subset of their rules with increasing size,** by specifying the argument `--incremental-prediction`. +- **More control over how data is split into training and test sets** is now provided by the argument `--data-split`, which replaces the arguments `--folds` and `--current-fold`. +- **Binary labels, regression scores, or probabilities can now be predicted,** depending on the value of the new argument `--prediction-type`, which can be set to the values `binary`, `scores`, or `probabilities`. +- **Individual evaluation measures can now be enabled or disabled** via additional options that have been added to the arguments `--print-evaluation` and `--store-evaluation`. +- **The presentation of values printed on the console has been vastly improved.** In addition, options for controlling the presentation of values to be printed or written to output files have been added to various command line arguments. ### Bugfixes -* The behavior of the parameter `--label-format` has been fixed when set to the value `auto`. -* The behavior of the parameters `--holdout` and `--instance-sampling` has been fixed when set to the value `stratified-label-wise`. -* The behavior of the parameter `--binary-predictor` has been fixed when set to the value `example-wise` and using a model that has been loaded from disk. -* Rules are now guaranteed to not cover more examples than specified via the option `min_coverage`. The option is now also taken into account when using feature binning. Alternatively, the minimum coverage of rules can now also be specified as a fraction via the option `min_support`. +- The behavior of the parameter `--label-format` has been fixed when set to the value `auto`. +- The behavior of the parameters `--holdout` and `--instance-sampling` has been fixed when set to the value `stratified-label-wise`. +- The behavior of the parameter `--binary-predictor` has been fixed when set to the value `example-wise` and using a model that has been loaded from disk. +- Rules are now guaranteed to not cover more examples than specified via the option `min_coverage`. The option is now also taken into account when using feature binning. Alternatively, the minimum coverage of rules can now also be specified as a fraction via the option `min_support`. ### API Changes -* The parameter `--early-stopping` has been replaced with a new parameter `--global-pruning`. -* The parameter `--pruning` has been renamed to `--rule-pruning`. -* The parameter `--classification-predictor` has been renamed to `--binary-predictor`. -* The command line argument `--predict-probabilities` has been replaced with a new argument `--prediction-type`. -* The command line argument `--predicted-label-format` has been renamed to `--prediction-format`.
+- The parameter `--early-stopping` has been replaced with a new parameter `--global-pruning`. +- The parameter `--pruning` has been renamed to `--rule-pruning`. +- The parameter `--classification-predictor` has been renamed to `--binary-predictor`. +- The command line argument `--predict-probabilities` has been replaced with a new argument `--prediction-type`. +- The command line argument `--predicted-label-format` has been renamed to `--prediction-format`. ### Quality-of-Life Improvements -* Continuous integration is now used to test the most common functionalites of the BOOMER algorithm and the corresponding command line API. -* Successful generation of the documentation is now tested via continuous integration. -* Style definitions for Python and C++ code are now enforced by applying the tools `clang-format`, `yapf`, and `isort` via continuous integration. +- Continuous integration is now used to test the most common functionalities of the BOOMER algorithm and the corresponding command line API. +- Successful generation of the documentation is now tested via continuous integration. +- Style definitions for Python and C++ code are now enforced by applying the tools `clang-format`, `yapf`, and `isort` via continuous integration. ## Version 0.8.2 (Apr. 11th, 2022) A bugfix release that solves the following issues: -* Fixed prebuilt packages available at [PyPI](https://pypi.org/project/mlrl-boomer/). -* Fixed output of nominal values when using the option `--print-rules true`. +- Fixed prebuilt packages available at [PyPI](https://pypi.org/project/mlrl-boomer/). +- Fixed output of nominal values when using the option `--print-rules true`. ## Version 0.8.1 (Mar. 4th, 2022) A bugfix release that solves the following issues: -* Missing feature values are now dealt with correctly when using feature binning. -* A rare issue that may cause segmentation faults when using instance sampling has been fixed. +- Missing feature values are now dealt with correctly when using feature binning. +- A rare issue that may cause segmentation faults when using instance sampling has been fixed. ## Version 0.8.0 (Jan. 31, 2022) @@ -109,41 +109,41 @@ A major update to the BOOMER algorithm that introduces the following changes. ***This release comes with changes to the command line API. For an updated overview of the available parameters, please refer to the [documentation](https://mlrl-boomer.readthedocs.io/en/0.8.0/).*** -* The programmatic C++ API was redesigned for a more convenient configuration of algorithms. This does also drastically reduce the amount of wrapper code that is necessary to access the API from other programming languages and therefore facilitates the support of additional languages in the future. -* An issue that may cause segmentation faults when using stratified sampling methods for the creation of holdout sets has been fixed. -* Pre-built packages for Windows systems are now available at [PyPI](https://pypi.org/project/mlrl-boomer/). -* Pre-built packages for Linux aarch64 systems are now provided. +- The programmatic C++ API was redesigned for a more convenient configuration of algorithms. This also drastically reduces the amount of wrapper code that is necessary to access the API from other programming languages and therefore facilitates the support of additional languages in the future. +- An issue that may cause segmentation faults when using stratified sampling methods for the creation of holdout sets has been fixed.
+- Pre-built packages for Windows systems are now available at [PyPI](https://pypi.org/project/mlrl-boomer/). +- Pre-built packages for Linux aarch64 systems are now provided. ## Version 0.7.1 (Dec. 15, 2021) A bugfix release that solves the following issues: -* Fixes an issue preventing the use of dense representations of ground truth label matrices that was introduced in version 0.7.0. -* Pre-built packages for MacOS systems are now available at [PyPI](https://pypi.org/project/mlrl-boomer/). -* Linux and MacOS packages for Python 3.10 are now provided. +- Fixes an issue, introduced in version 0.7.0, that prevented the use of dense representations of ground truth label matrices. +- Pre-built packages for MacOS systems are now available at [PyPI](https://pypi.org/project/mlrl-boomer/). +- Linux and MacOS packages for Python 3.10 are now provided. ## Version 0.7.0 (Dec. 5, 2021) A major update to the BOOMER algorithm that introduces the following changes: -* L1 regularization can now be used. -* A more space-efficient data structure is now used for the sparse representation of binary predictions. -* The Python API does now allow to access the rules in a model in a programmatic way. -* It is now possible to output certain characteristics of training datasets and rule models. -* Pre-built packages for the Linux platform are now available at [PyPI](https://pypi.org/project/mlrl-boomer/). -* The [documentation](https://mlrl-boomer.readthedocs.io) has vastly been improved. +- L1 regularization can now be used. +- A more space-efficient data structure is now used for the sparse representation of binary predictions. +- The Python API now allows accessing the rules in a model programmatically. +- It is now possible to output certain characteristics of training datasets and rule models. +- Pre-built packages for the Linux platform are now available at [PyPI](https://pypi.org/project/mlrl-boomer/). +- The [documentation](https://mlrl-boomer.readthedocs.io) has been vastly improved. ## Version 0.6.2 (Oct. 4, 2021) A bugfix release that solves the following issues: -* Fixes a segmentation fault when a sparse feature matrix should be used for prediction that was introduced in version 0.6.0. +- Fixes a segmentation fault, introduced in version 0.6.0, that occurred when using a sparse feature matrix for prediction. ## Version 0.6.1 (Sep. 30, 2021) A bugfix release that solves the following issues: -* Fixes a mathematical problem when calculating the quality of potential single-label rules that was introduced in version 0.6.0. +- Fixes a mathematical problem, introduced in version 0.6.0, in the calculation of the quality of potential single-label rules. ## Version 0.6.0 (Sep. 6, 2021) @@ -151,54 +151,54 @@ A major update to the BOOMER algorithm that introduces the following changes. ***This release comes with changes to the command line API. For brevity and consistency, some parameters and/or their values have been renamed. Moreover, some parameters have been updated to use more reasonable default values. For an updated overview of the available parameters, please refer to the [documentation](https://mlrl-boomer.readthedocs.io/en/0.6.0/).*** -* The parameter `--instance-sampling` does now allow to use stratified sampling (`stratified-label-wise` and `stratified-example-wise`). -* The parameter `--holdout` does now allow to use stratified sampling (`stratified-label-wise` and `stratified-example-wise`).
-* The parameter `--recalculate-predictions` does now allow to specify whether the predictions of rules should be recalculated on the entire training data, if instance sampling is used. -* An additional parameter (`--prediction-format`) that allows to specify whether predictions should be stored using dense or sparse matrices has been added. -* The code for the construction of rule heads has been reworked, resulting in minor performance improvements. -* The unnecessary calculation of Hessians is now avoided when used single-label rules for the minimization of a non-decomposable loss function, resulting in a significant performance improvement. -* A programmatic C++ API for configuring algorithms, including the validation of parameters, is now provided. -* A documentation is now available [online](https://mlrl-boomer.readthedocs.io). +- The parameter `--instance-sampling` now allows using stratified sampling (`stratified-label-wise` and `stratified-example-wise`). +- The parameter `--holdout` now allows using stratified sampling (`stratified-label-wise` and `stratified-example-wise`). +- The parameter `--recalculate-predictions` now allows specifying whether the predictions of rules should be recalculated on the entire training data when instance sampling is used. +- An additional parameter (`--prediction-format`) that allows specifying whether predictions should be stored using dense or sparse matrices has been added. +- The code for the construction of rule heads has been reworked, resulting in minor performance improvements. +- The unnecessary calculation of Hessians is now avoided when using single-label rules for the minimization of a non-decomposable loss function, resulting in a significant performance improvement. +- A programmatic C++ API for configuring algorithms, including the validation of parameters, is now provided. +- Documentation is now available [online](https://mlrl-boomer.readthedocs.io). ## Version 0.5.0 (Jun. 27, 2021) A major update to the BOOMER algorithm that introduces the following changes: -* Gradient-based label binning (GBLB) can be used to assign labels to a predefined number of bins. +- Gradient-based label binning (GBLB) can be used to assign labels to a predefined number of bins. ## Version 0.4.0 (Mar. 31, 2021) A major update to the BOOMER algorithm that introduces the following changes: -* Large parts of the code have been refactored, and the core algorithm has been migrated to C++ entirely. It is now built and compiled using Meson and Ninja, which results in drastically reduced compile times. -* The (label- and example-wise) logistic loss functions have been rewritten to better prevent numerical problems. -* Approximate methods for evaluating potential conditions of rules, based on unsupervised binning methods (currently equal-width- and equal-frequency-binning), have been added. -* The parameter `--predictor` does now allow using different algorithms for prediction (`label-wise` or `example-wise`). -* An early stopping mechanism has been added, which allows to stop the induction of rules as soon as the quality of the model does not improve on a holdout set. -* Multi-threading can be used to parallelize the prediction for different examples across multiple CPU cores. -* Multi-threading can be used to parallelize the calculation of gradients and Hessians for different examples across multiple CPU cores. -* Probability estimates can be predicted when using the loss function `label-wise-logistic-loss`.
-* The algorithm does now support data sets with missing feature values. -* The loss function `label-wise-squared-hinge-loss` has been added. -* Experiments using single-label data sets are now supported out of the box. +- Large parts of the code have been refactored, and the core algorithm has been migrated to C++ entirely. It is now built and compiled using Meson and Ninja, which results in drastically reduced compile times. +- The (label- and example-wise) logistic loss functions have been rewritten to better prevent numerical problems. +- Approximate methods for evaluating potential conditions of rules, based on unsupervised binning methods (currently equal-width- and equal-frequency-binning), have been added. +- The parameter `--predictor` now allows using different algorithms for prediction (`label-wise` or `example-wise`). +- An early stopping mechanism has been added, which allows stopping the induction of rules as soon as the quality of the model does not improve on a holdout set. +- Multi-threading can be used to parallelize the prediction for different examples across multiple CPU cores. +- Multi-threading can be used to parallelize the calculation of gradients and Hessians for different examples across multiple CPU cores. +- Probability estimates can be predicted when using the loss function `label-wise-logistic-loss`. +- The algorithm now supports data sets with missing feature values. +- The loss function `label-wise-squared-hinge-loss` has been added. +- Experiments using single-label data sets are now supported out of the box. ## Version 0.3.0 (Sep. 14, 2020) A major update to the BOOMER algorithm that features the following changes: -* Large parts of the code (loss functions, calculation of gradients/Hessians, calculation of predictions/quality scores) have been refactored and rewritten in C++. This comes with a constant speed-up of training times. -* Multi-threading can be used to parallelize the evaluation of a rule's potential refinements across multiple CPU cores. -* Sparse ground truth label matrices can now be used for training, which may reduce the memory footprint in case of large data sets. -* Additional parameters (`feature-format` and `label-format`) that allow to specify the preferred format of the feature and label matrix have been added. +- Large parts of the code (loss functions, calculation of gradients/Hessians, calculation of predictions/quality scores) have been refactored and rewritten in C++. This comes with a constant speed-up of training times. +- Multi-threading can be used to parallelize the evaluation of a rule's potential refinements across multiple CPU cores. +- Sparse ground truth label matrices can now be used for training, which may reduce the memory footprint in the case of large data sets. +- Additional parameters (`feature-format` and `label-format`) that allow specifying the preferred format of the feature and label matrix have been added. ## Version 0.2.0 (Jun. 28, 2020) A major update to the BOOMER algorithm that features the following changes: -* Includes many refactorings and quality of live improvements. Code that is not directly related with the algorithm, such as the implementation of baselines, has been removed. -* The algorithm is now able to natively handle nominal features without the need for pre-processing techniques such as one-hot encoding. -* Sparse feature matrices can now be used for training and prediction, which reduces the memory footprint and results in a significant speed-up of training times on some data sets.
-* Additional hyper-parameters (`min_coverage`, `max_conditions` and `max_head_refinements`) that provide fine-grained control over the specificity/generality of rules have been added. +- Includes many refactorings and quality-of-life improvements. Code that is not directly related to the algorithm, such as the implementation of baselines, has been removed. +- The algorithm is now able to natively handle nominal features without the need for pre-processing techniques such as one-hot encoding. +- Sparse feature matrices can now be used for training and prediction, which reduces the memory footprint and results in a significant speed-up of training times on some data sets. +- Additional hyper-parameters (`min_coverage`, `max_conditions` and `max_head_refinements`) that provide fine-grained control over the specificity/generality of rules have been added. ## Version 0.1.0 (Jun. 22, 2020) @@ -208,9 +208,9 @@ The first version of the BOOMER algorithm used in the following publication: This version supports the following features to learn an ensemble of boosted classification rules: -* Different label-wise or example-wise loss functions can be minimized during training (optionally using L2 regularization). -* The rules may predict for a single label, or for all labels (which enables to model local label dependencies). -* When learning a new rule, random samples of the training examples, features or labels may be used, including different techniques such as sampling with or without replacement. -* The impact of individual rules on the ensemble can be controlled using shrinkage. -* The conditions of a recently induced rule can be pruned based on a hold-out set. -* The algorithm currently only supports numerical or ordinal features. Nominal features can be handled by using one-hot encoding. +- Different label-wise or example-wise loss functions can be minimized during training (optionally using L2 regularization). +- The rules may predict for a single label, or for all labels (which enables modeling local label dependencies). +- When learning a new rule, random samples of the training examples, features or labels may be used, including different techniques such as sampling with or without replacement. +- The impact of individual rules on the ensemble can be controlled using shrinkage. +- The conditions of a recently induced rule can be pruned based on a hold-out set. +- The algorithm currently only supports numerical or ordinal features. Nominal features can be handled by using one-hot encoding. diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 569c63689b..d7afeb3d35 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -2,127 +2,78 @@ ## Our Pledge -We as members, contributors, and leaders pledge to make participation in our -community a harassment-free experience for everyone, regardless of age, body -size, visible or invisible disability, ethnicity, sex characteristics, gender -identity and expression, level of experience, education, socio-economic status, -nationality, personal appearance, race, religion, or sexual identity -and orientation. +We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
-We pledge to act and interact in ways that contribute to an open, welcoming, -diverse, inclusive, and healthy community. +We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community. ## Our Standards -Examples of behavior that contributes to a positive environment for our -community include: +Examples of behavior that contributes to a positive environment for our community include: -* Demonstrating empathy and kindness toward other people -* Being respectful of differing opinions, viewpoints, and experiences -* Giving and gracefully accepting constructive feedback -* Accepting responsibility and apologizing to those affected by our mistakes, - and learning from the experience -* Focusing on what is best not just for us as individuals, but for the - overall community +- Demonstrating empathy and kindness toward other people +- Being respectful of differing opinions, viewpoints, and experiences +- Giving and gracefully accepting constructive feedback +- Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience +- Focusing on what is best not just for us as individuals, but for the overall community Examples of unacceptable behavior include: -* The use of sexualized language or imagery, and sexual attention or - advances of any kind -* Trolling, insulting or derogatory comments, and personal or political attacks -* Public or private harassment -* Publishing others' private information, such as a physical or email - address, without their explicit permission -* Other conduct which could reasonably be considered inappropriate in a - professional setting +- The use of sexualized language or imagery, and sexual attention or advances of any kind +- Trolling, insulting or derogatory comments, and personal or political attacks +- Public or private harassment +- Publishing others' private information, such as a physical or email address, without their explicit permission +- Other conduct which could reasonably be considered inappropriate in a professional setting ## Enforcement Responsibilities -Community leaders are responsible for clarifying and enforcing our standards of -acceptable behavior and will take appropriate and fair corrective action in -response to any behavior that they deem inappropriate, threatening, offensive, -or harmful. +Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful. -Community leaders have the right and responsibility to remove, edit, or reject -comments, commits, code, wiki edits, issues, and other contributions that are -not aligned to this Code of Conduct, and will communicate reasons for moderation -decisions when appropriate. +Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate. ## Scope -This Code of Conduct applies within all community spaces, and also applies when -an individual is officially representing the community in public spaces. -Examples of representing our community include using an official e-mail address, -posting via an official social media account, or acting as an appointed -representative at an online or offline event. 
+This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. ## Enforcement -Instances of abusive, harassing, or otherwise unacceptable behavior may be -reported to the community leaders responsible for enforcement at -michael.rapp.ml@gmail.com. -All complaints will be reviewed and investigated promptly and fairly. +Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at michael.rapp.ml@gmail.com. All complaints will be reviewed and investigated promptly and fairly. -All community leaders are obligated to respect the privacy and security of the -reporter of any incident. +All community leaders are obligated to respect the privacy and security of the reporter of any incident. ## Enforcement Guidelines -Community leaders will follow these Community Impact Guidelines in determining -the consequences for any action they deem in violation of this Code of Conduct: +Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct: ### 1. Correction -**Community Impact**: Use of inappropriate language or other behavior deemed -unprofessional or unwelcome in the community. +**Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community. -**Consequence**: A private, written warning from community leaders, providing -clarity around the nature of the violation and an explanation of why the -behavior was inappropriate. A public apology may be requested. +**Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested. ### 2. Warning -**Community Impact**: A violation through a single incident or series -of actions. +**Community Impact**: A violation through a single incident or series of actions. -**Consequence**: A warning with consequences for continued behavior. No -interaction with the people involved, including unsolicited interaction with -those enforcing the Code of Conduct, for a specified period of time. This -includes avoiding interactions in community spaces as well as external channels -like social media. Violating these terms may lead to a temporary or -permanent ban. +**Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban. ### 3. Temporary Ban -**Community Impact**: A serious violation of community standards, including -sustained inappropriate behavior. +**Community Impact**: A serious violation of community standards, including sustained inappropriate behavior. -**Consequence**: A temporary ban from any sort of interaction or public -communication with the community for a specified period of time. 
No public or -private interaction with the people involved, including unsolicited interaction -with those enforcing the Code of Conduct, is allowed during this period. -Violating these terms may lead to a permanent ban. +**Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban. ### 4. Permanent Ban -**Community Impact**: Demonstrating a pattern of violation of community -standards, including sustained inappropriate behavior, harassment of an -individual, or aggression toward or disparagement of classes of individuals. +**Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals. -**Consequence**: A permanent ban from any sort of public interaction within -the community. +**Consequence**: A permanent ban from any sort of public interaction within the community. ## Attribution -This Code of Conduct is adapted from the [Contributor Covenant][homepage], -version 2.0, available at -https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. -Community Impact Guidelines were inspired by [Mozilla's code of conduct -enforcement ladder](https://github.com/mozilla/diversity). +Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity). -[homepage]: https://www.contributor-covenant.org +For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations. -For answers to common questions about this code of conduct, see the FAQ at -https://www.contributor-covenant.org/faq. Translations are available at -https://www.contributor-covenant.org/translations. +[homepage]: https://www.contributor-covenant.org diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md index 396347f73f..39f9315e2a 100644 --- a/CONTRIBUTORS.md +++ b/CONTRIBUTORS.md @@ -6,17 +6,17 @@ BOOMER is open source software. 
Everyone is welcomed to contribute to the projec We highly appreciate the efforts of the following persons (listed in alphabetical order), who have actively contributed code to the project: -* [Andreas Seidl Fernandez](https://github.com/AndreasSeidl) -* [Anna Kulischkin](https://github.com/Anna-inf) -* [Carsten Ostlender](https://github.com/CarstenOstlender) -* [Dennis Drössler](https://github.com/ddroessler) -* [Eneldo Loza Mencía](https://github.com/keelm) -* [Jakob Steeg](https://github.com/JayJayJay1) -* [Kevin Kampa](https://github.com/bapfelbaum) -* [Lukas Johannes Eberle](https://github.com/LukasEberle) -* [Michael Rapp](https://github.com/michael-rapp) -* [Paul Trojahn](https://github.com/ptrojahn) +- [Andreas Seidl Fernandez](https://github.com/AndreasSeidl) +- [Anna Kulischkin](https://github.com/Anna-inf) +- [Carsten Ostlender](https://github.com/CarstenOstlender) +- [Dennis Drössler](https://github.com/ddroessler) +- [Eneldo Loza Mencía](https://github.com/keelm) +- [Jakob Steeg](https://github.com/JayJayJay1) +- [Kevin Kampa](https://github.com/bapfelbaum) +- [Lukas Johannes Eberle](https://github.com/LukasEberle) +- [Michael Rapp](https://github.com/michael-rapp) +- [Paul Trojahn](https://github.com/ptrojahn) ## Special Thanks -We would also like to thank Johannes Fürnkranz for making the project possible in the first place and sharing his extensive knowledge. +We would also like to thank Johannes Fürnkranz for making the project possible in the first place and sharing his extensive knowledge. diff --git a/README.md b/README.md index 34eb16df33..da4b5fd3a3 100644 --- a/README.md +++ b/README.md @@ -6,18 +6,13 @@

-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) -[![PyPI version](https://badge.fury.io/py/mlrl-boomer.svg)](https://badge.fury.io/py/mlrl-boomer) -[![Documentation Status](https://readthedocs.org/projects/mlrl-boomer/badge/?version=latest)](https://mlrl-boomer.readthedocs.io/en/latest/?badge=latest) -[![Build](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_build.yml/badge.svg)](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_build.yml) -[![Code style](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_format.yml/badge.svg)](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_format.yml) -[![X URL](https://img.shields.io/twitter/url?label=Follow&style=social&url=https%3A%2F%2Ftwitter.com%2FBOOMER_ML)](https://twitter.com/BOOMER_ML) +[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![PyPI version](https://badge.fury.io/py/mlrl-boomer.svg)](https://badge.fury.io/py/mlrl-boomer) [![Documentation Status](https://readthedocs.org/projects/mlrl-boomer/badge/?version=latest)](https://mlrl-boomer.readthedocs.io/en/latest/?badge=latest) [![Build](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_build.yml/badge.svg)](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_build.yml) [![Code style](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_format.yml/badge.svg)](https://github.com/mrapp-ke/MLRL-Boomer/actions/workflows/test_format.yml) [![X URL](https://img.shields.io/twitter/url?label=Follow&style=social&url=https%3A%2F%2Ftwitter.com%2FBOOMER_ML)](https://twitter.com/BOOMER_ML) **Important links:** [Documentation](https://mlrl-boomer.readthedocs.io) | [Issue Tracker](https://github.com/mrapp-ke/MLRL-Boomer/issues) | [Changelog](https://github.com/mrapp-ke/MLRL-Boomer/blob/92ea9ac5e4b8f6c9b7557d0bee250ce9f75a32f4/CHANGELOG.md) | [Contributors](https://github.com/mrapp-ke/MLRL-Boomer/blob/92ea9ac5e4b8f6c9b7557d0bee250ce9f75a32f4/CONTRIBUTORS.md) | [Code of Conduct](https://github.com/mrapp-ke/MLRL-Boomer/blob/92ea9ac5e4b8f6c9b7557d0bee250ce9f75a32f4/CODE_OF_CONDUCT.md) | [License](https://github.com/mrapp-ke/MLRL-Boomer/blob/92ea9ac5e4b8f6c9b7557d0bee250ce9f75a32f4/LICENSE.md) This software package provides the official implementation of **BOOMER - an algorithm for learning gradient boosted multi-label classification rules** that integrates with the popular [scikit-learn](https://scikit-learn.org) machine learning framework. -The goal of [multi-label classification](https://en.wikipedia.org/wiki/Multi-label_classification) is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics. The BOOMER algorithm uses [gradient boosting](https://en.wikipedia.org/wiki/Gradient_boosting) to learn an ensemble of rules that is built with respect to a given multivariate loss function. To provide a versatile tool for different use cases, great emphasis is put on the *efficiency* of the implementation. To ensure its *flexibility*, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. +The goal of [multi-label classification](https://en.wikipedia.org/wiki/Multi-label_classification) is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics. 
The BOOMER algorithm uses [gradient boosting](https://en.wikipedia.org/wiki/Gradient_boosting) to learn an ensemble of rules that is built with respect to a given multivariate loss function. To provide a versatile tool for different use cases, great emphasis is put on the *efficiency* of the implementation. To ensure its *flexibility*, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. ## References @@ -25,46 +20,46 @@ The algorithm was first published in the following [paper](https://doi.org/10.10 *Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz Vu-Linh Nguyen and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.* -If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper. An overview of publications that are concerned with the BOOMER algorithm, together with information on how to cite them, can be found in the section ["References"](https://mlrl-boomer.readthedocs.io/en/latest/references/index.html) of the documentation. +If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper. An overview of publications that are concerned with the BOOMER algorithm, together with information on how to cite them, can be found in the section ["References"](https://mlrl-boomer.readthedocs.io/en/latest/references/index.html) of the documentation. ## Functionalities The algorithm that is provided by this project currently supports the following core functionalities for learning ensembles of boosted classification rules: -* **Label-wise decomposable or non-decomposable loss functions** can be minimized in expectation. -* **L1 and L2 regularization** can be used. -* **Single-label, partial, or complete heads** can be used by rules, i.e., they can predict for an individual label, a subset of the available labels, or all labels. Predicting for multiple labels simultaneously enables rules to model local dependencies between labels. -* **Various strategies for predicting regression scores, labels or probabilities** are available. -* **Isotonic regression models can be used to calibrate marginal and joint probabilities** predicted by a model. -* **Rules can be constructed via a greedy search or a beam search.** The latter may help to improve the quality of individual rules. -* **Sampling techniques and stratification methods** can be used to learn new rules on a subset of the available training examples, features, or labels. -* **Shrinkage (a.k.a. the learning rate) can be adjusted** to control the impact of individual rules on the overall ensemble. -* **Fine-grained control over the specificity/generality of rules** is provided via hyper-parameters. -* **Incremental reduced error pruning** can be used to remove overly specific conditions from rules and prevent overfitting. -* **Post- and pre-pruning (a.k.a. early stopping)** allows to determine the optimal number of rules to be included in an ensemble. -* **Sequential post-optimization** may help to improve the predictive performance of a model by reconstructing each rule in the context of the other rules. -* **Native support for numerical, ordinal, and nominal features** eliminates the need for pre-processing techniques such as one-hot encoding. 
-* **Handling of missing feature values**, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm. +- **Label-wise decomposable or non-decomposable loss functions** can be minimized in expectation. +- **L1 and L2 regularization** can be used. +- **Single-label, partial, or complete heads** can be used by rules, i.e., they can predict for an individual label, a subset of the available labels, or all labels. Predicting for multiple labels simultaneously enables rules to model local dependencies between labels. +- **Various strategies for predicting regression scores, labels or probabilities** are available. +- **Isotonic regression models can be used to calibrate marginal and joint probabilities** predicted by a model. +- **Rules can be constructed via a greedy search or a beam search.** The latter may help to improve the quality of individual rules. +- **Sampling techniques and stratification methods** can be used to learn new rules on a subset of the available training examples, features, or labels. +- **Shrinkage (a.k.a. the learning rate) can be adjusted** to control the impact of individual rules on the overall ensemble. +- **Fine-grained control over the specificity/generality of rules** is provided via hyper-parameters. +- **Incremental reduced error pruning** can be used to remove overly specific conditions from rules and prevent overfitting. +- **Post- and pre-pruning (a.k.a. early stopping)** allows determining the optimal number of rules to be included in an ensemble. +- **Sequential post-optimization** may help to improve the predictive performance of a model by reconstructing each rule in the context of the other rules. +- **Native support for numerical, ordinal, and nominal features** eliminates the need for pre-processing techniques such as one-hot encoding. +- **Handling of missing feature values**, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm. ## Runtime and Memory Optimizations In addition, the following features that may speed up training or reduce the memory footprint are currently implemented: -* **Unsupervised feature binning** can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features. -* **[Gradient-based label binning (GBLB)](https://arxiv.org/pdf/2106.11690.pdf)** can be used to assign the available labels to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads. -* **Sparse feature matrices** can be used for training and prediction. This may speed up training significantly on some data sets. -* **Sparse label matrices** can be used for training. This may reduce the memory footprint in case of large data sets. -* **Sparse prediction matrices** can be used to store predicted labels. This may reduce the memory footprint in case of large data sets. -* **Sparse matrices for storing gradients and Hessians** can be used if supported by the loss function. This may speed up training significantly on data sets with many labels. -* **Multi-threading** can be used to parallelize the evaluation of a rule's potential refinements across several features, to update the gradients and Hessians of individual examples in parallel, or to obtain predictions for several examples in parallel. +- **Unsupervised feature binning** can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
+- **[Gradient-based label binning (GBLB)](https://arxiv.org/pdf/2106.11690.pdf)** can be used to assign the available labels to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads. +- **Sparse feature matrices** can be used for training and prediction. This may speed up training significantly on some data sets. +- **Sparse label matrices** can be used for training. This may reduce the memory footprint in the case of large data sets. +- **Sparse prediction matrices** can be used to store predicted labels. This may reduce the memory footprint in the case of large data sets. +- **Sparse matrices for storing gradients and Hessians** can be used if supported by the loss function. This may speed up training significantly on data sets with many labels. +- **Multi-threading** can be used to parallelize the evaluation of a rule's potential refinements across several features, to update the gradients and Hessians of individual examples in parallel, or to obtain predictions for several examples in parallel. ## Documentation An extensive user guide, as well as an API documentation for developers, is available at [https://mlrl-boomer.readthedocs.io](https://mlrl-boomer.readthedocs.io). If you are new to the project, you probably want to read about the following topics: -* Instructions for [installing the software package](https://mlrl-boomer.readthedocs.io/en/latest/quickstart/installation.html) or [building the project from source](https://mlrl-boomer.readthedocs.io/en/latest/developer_guide/compilation.html). -* Examples of how to [use the algorithm](https://mlrl-boomer.readthedocs.io/en/latest/quickstart/usage.html) in your own Python code or how to use the [command line API](https://mlrl-boomer.readthedocs.io/en/latest/quickstart/testbed.html). -* An overview of available [parameters](https://mlrl-boomer.readthedocs.io/en/latest/user_guide/boosting/parameters.html). +- Instructions for [installing the software package](https://mlrl-boomer.readthedocs.io/en/latest/quickstart/installation.html) or [building the project from source](https://mlrl-boomer.readthedocs.io/en/latest/developer_guide/compilation.html). +- Examples of how to [use the algorithm](https://mlrl-boomer.readthedocs.io/en/latest/quickstart/usage.html) in your own Python code or how to use the [command line API](https://mlrl-boomer.readthedocs.io/en/latest/quickstart/testbed.html). +- An overview of available [parameters](https://mlrl-boomer.readthedocs.io/en/latest/user_guide/boosting/parameters.html). A collection of benchmark datasets that are compatible with the algorithm are provided in a separate [repository](https://github.com/mrapp-ke/Boomer-Datasets).
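For a first impression of the scikit-learn integration that the README's Documentation section points to, the following sketch shows how the estimator might be used on a synthetic multi-label data set. It is a minimal sketch under stated assumptions: the `Boomer` estimator name, the `mlrl.boosting` import path, and the `max_rules` parameter are taken from the project's documentation rather than from this diff, so the linked documentation remains the authoritative reference for the API.

```python
# Minimal sketch of training the BOOMER algorithm via its scikit-learn
# integration. The import path, estimator name, and the max_rules parameter
# are assumptions based on the project's documentation
# (https://mlrl-boomer.readthedocs.io), not on this diff; verify them against
# the installed version of the mlrl-boomer package.
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import hamming_loss
from sklearn.model_selection import train_test_split

from mlrl.boosting import Boomer  # assumed export of the mlrl-boomer package

# Create a small synthetic multi-label problem; each row of y is a binary
# label vector, matching the multi-label setting described in the README.
X, y = make_multilabel_classification(n_samples=200, n_features=20, n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_rules (assumed parameter, mirroring the command line API) bounds the
# number of rules in the ensemble.
classifier = Boomer(max_rules=100)
classifier.fit(X_train, y_train)

# predict() returns a binary label matrix of the same shape as y_test.
y_pred = classifier.predict(X_test)
print('Hamming loss:', hamming_loss(y_test, y_pred))
```

Because the estimator follows scikit-learn conventions, it should compose with the usual tooling, e.g., `sklearn.pipeline.Pipeline` or `cross_val_score`, in the same way as any other classifier.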