Merge pull request #867 from mrapp-ke/merge-bugfix
Merge bugfix into feature branch
issue-api-tokens[bot] authored May 4, 2024
2 parents 9c214ce + dc153c2 commit e14caf1
Showing 6 changed files with 407 additions and 99 deletions.
28 changes: 22 additions & 6 deletions doc/user_guide/testbed/arguments.md
@@ -425,12 +425,28 @@ In accordance with the syntax that is typically used by command line programs, t

For example, the value of the parameter `feature_binning` may be set as follows:

```text
boomer --data-dir /path/to/datasets/ --dataset name --feature-binning equal-width
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset name --feature-binning equal-width
```
````

````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset name --feature-binning equal-width
```
````

Some algorithmic parameters, including the parameter `feature_binning`, allow additional options to be specified as key-value pairs by using a {ref}`bracket-notation`. This is also supported by the command line API. However, the options must not contain any spaces, and special characters like `{` or `}` must be escaped by using single quotes (`'`):

```text
boomer --data-dir /path/to/datasets/ --dataset name --feature-binning equal-width'{bin_ratio=0.33,min_bins=2,max_bins=64}'
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset name --feature-binning equal-width'{bin_ratio=0.33,min_bins=2,max_bins=64}'
```
````

````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset name --feature-binning equal-width'{bin_ratio=0.33,min_bins=2,max_bins=64}'
```
````
172 changes: 136 additions & 36 deletions doc/user_guide/testbed/evaluation.md
@@ -14,17 +14,33 @@ Several strategies for splitting the available data into distinct training and t

The simplest and computationally least demanding strategy for obtaining training and test sets is to randomly split the available data into two mutually exclusive parts. This strategy, which is used by default if not specified otherwise, can be selected by providing the argument `--data-split train-test` to the command line API:

```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split train-test
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split train-test
```
````

````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset dataset-name --data-split train-test
```
````

According to the argument `--dataset`, the program loads the training data from a file named `dataset-name_training.arff`. Similarly, it expects the test data to be stored in a file named `dataset-name_test.arff`. If these files are not available, the program searches for a file named `dataset-name.arff` and splits it into training and test data automatically.
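
For illustration, the directory passed via the argument `--data-dir` might be laid out as follows. This is a hypothetical sketch; only one of the two variants needs to be present, and depending on the dataset format, additional metadata files may be required:

```text
/path/to/datasets/
├── dataset-name_training.arff   # predefined training set
├── dataset-name_test.arff       # predefined test set
└── dataset-name.arff            # alternatively, a single file that is split automatically
```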

When it is the responsibility of the command line API to split a given dataset into training and test sets, 66% of the data are included in the training set, whereas the remaining 33% are part of the test set. Although this ratio is frequently used in machine learning, you can easily adjust it by providing the option `test_size`:

```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split 'train-test{test_size=0.25}'
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split 'train-test{test_size=0.25}'
```
````

````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset dataset-name --data-split 'train-test{test_size=0.25}'
```
````

This command instructs the command line API to include 75% of the available data in the training set and use the remaining 25% for the test set.

@@ -34,21 +50,47 @@ This command instructs the command line API to include 75% of the available data

A more elaborate strategy for splitting data into training and test sets, which results in more realistic performance estimates, but also entails greater computational costs, is referred to as [cross validation](<https://en.wikipedia.org/wiki/Cross-validation_(statistics)>) (CV). The basic idea is to split the available data into several equally sized parts. Afterwards, several machine learning models are trained and evaluated on different portions of the data using the same learning method. Each of these parts is used for testing exactly once, whereas the remaining ones make up the training set. The performance estimates that are obtained for each of these subsequent runs, referred to as *folds*, are finally averaged to obtain a single score and corresponding [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation). The command line API can be instructed to perform a cross validation using the argument `--data-split cv`:

```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split cv
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split cv
```
````

````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset dataset-name --data-split cv
```
````

By default, a 10-fold cross validation, where ten models are trained and evaluated, is performed. The number of folds can easily be adjusted via the option `num_folds`. For example, the following command results in a 5-fold CV being used:

```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split 'cv{num_folds=5}'
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split 'cv{num_folds=5}'
```
````

```{tip}
````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset dataset-name --data-split 'cv{num_folds=5}'
```
````
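
Purely for illustration (this is not part of the command line API), the following Python sketch uses scikit-learn's `KFold` to show how ten hypothetical examples would be partitioned by such a 5-fold CV, each example being used for testing exactly once:

```python
from sklearn.model_selection import KFold

examples = list(range(10))  # indices of ten hypothetical examples

# Partition the examples into 5 folds; each fold serves as the test set once,
# while the remaining folds make up the training set.
for fold, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(examples), start=1):
    print(f'Fold {fold}: train on {train_idx.tolist()}, test on {test_idx.tolist()}')
```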

`````{tip}
When providing the option `current_fold`, only a single fold, instead of the entire procedure, is performed. This is particularly useful if one intends to train and evaluate the models for each individual fold in parallel on different machines. For example, the following command only executes the second fold of a 5-fold CV:
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split 'cv{num_folds=5,current_fold=2}'
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split 'cv{num_folds=5,current_fold=2}'
```
````
````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset dataset-name --data-split 'cv{num_folds=5,current_fold=2}'
```
````
`````
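
As an illustrative sketch (assuming a POSIX-compatible shell; replace `boomer` with `seco` when using the SeCo algorithm), all folds of such a 5-fold CV could also be triggered one after another by a simple loop. Double quotes are used so that the braces are passed literally while `$fold` is expanded:

```text
for fold in 1 2 3 4 5; do
  boomer --data-dir /path/to/datasets/ --dataset dataset-name \
    --data-split "cv{num_folds=5,current_fold=$fold}"
done
```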

(evaluating-training-data)=

@@ -60,15 +102,33 @@ The configuraton described in this section should only be used for testing purpo

Sometimes, evaluating the performance of a model on the data it has been trained on can be helpful for analyzing the behavior of a machine learning algorithm, e.g., if one needs to check if the approach is able to fit the data accurately. For this purpose, the command line API allows using the argument `--data-split none`, which results in the given data not being split at all. Instead, the learning algorithm is applied to the entire dataset and predictions are obtained from the resulting model for the exact same data points. The argument can be specified as follows:

```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split none
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split none
```
````

````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset dataset-name --data-split none
```
````

```{tip}
`````{tip}
If you are interested in obtaining evaluation results for the training data in addition to the test data when using a train-test split or a cross validation, as discussed above, the argument `--evaluate-training-data true` may be used:
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split cv --evaluate-training-data true
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --data-split cv --evaluate-training-data true
```
````
````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset dataset-name --data-split cv --evaluate-training-data true
```
````
`````

(prediction-types)=

@@ -82,9 +142,17 @@ The metrics for evaluating the quality of predictions that have been obtained fo

We refer to real-valued predictions, which may be positive or negative, as *regression scores*. In the context of multi-label classification, positive scores indicate a preference towards predicting a label as relevant, whereas negative scores are predicted for labels that are more likely to be irrelevant. The absolute size of the scores corresponds to the confidence of the predictions, i.e., if a large value is predicted for a label, the model is more certain about the correctness of the predicted outcome. Unlike {ref}`probability-estimates`, regression scores are not bound to a certain interval and can be arbitrary positive or negative values. The BOOMER algorithm uses regression scores as a basis for predicting probabilities or binary labels. If you want to evaluate the quality of the regression scores directly, instead of transforming them into probabilities or binary predictions, the argument `--prediction-type scores` may be passed to the command line API:

```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --prediction-type scores
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --prediction-type scores
```
````

````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset dataset-name --prediction-type scores
```
````

For evaluating the quality of regression scores, [multi-label ranking measures](https://scikit-learn.org/stable/modules/model_evaluation.html#multilabel-ranking-metrics) provided by the [scikit-learn](https://scikit-learn.org) framework are used.
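
As a minimal sketch of such a measure (the exact metrics reported by the command line API may differ), the following example applies scikit-learn's `label_ranking_average_precision_score` to hypothetical regression scores:

```python
import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

# Ground truth relevance of three labels for two examples...
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
# ...and regression scores predicted for them (larger values indicate relevance).
y_score = np.array([[2.3, -1.2, 0.4],
                    [-0.8, 1.7, -2.5]])

# Prints 1.0, because the relevant labels are ranked above the irrelevant ones.
print(label_ranking_average_precision_score(y_true, y_score))
```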

@@ -94,29 +162,53 @@ For evaluating the quality of regression scores, [multi-label ranking measures](

Probability estimates are given as real values between zero and one. In the context of multi-label classification, they express the probability of a label being relevant. If the predicted probability is close to zero, the corresponding label is more likely to be irrelevant, whereas a probability close to one is predicted for labels that are likely to be relevant. If you intend to evaluate the quality of probabilistic predictions, the argument `--prediction-type probabilities` should be used:

```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --prediction-type probabilities
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --prediction-type probabilities
```
````

````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset dataset-name --prediction-type probabilities
```
````

Similar to {ref}`regression-scores`, the command line API relies on [multi-label ranking measures](https://scikit-learn.org/stable/modules/model_evaluation.html#multilabel-ranking-metrics), as implemented by the [scikit-learn](https://scikit-learn.org) framework, for evaluating probability estimates.

### Binary Labels

Binary predictions, which directly indicate whether a label is considered irrelevant or relevant, are the most common type of prediction used for multi-label classification. Irrelevant labels are represented by the value `0`, whereas the value `1` is predicted for relevant labels. By default, the command line API instructs the learning method to provide binary predictions. If you want to explicitly instruct it to use this particular type of predictions, you can use the argument `--prediction-type binary`:

```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --prediction-type binary
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --prediction-type binary
```
````

````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset dataset-name --prediction-type binary
```
````

In a multi-label setting, the quality of binary predictions is assessed in terms of commonly used [multi-label classification metrics](https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics) implemented by the [scikit-learn](https://scikit-learn.org) framework. If a dataset contains only a single label, the evaluation is restricted to classification metrics that are suited for single-label classification problems.
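
As an illustrative sketch (again, the exact metrics reported by the command line API may differ), the following example evaluates binary predictions for three labels using two such metrics from scikit-learn:

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss

# Ground truth and binary predictions for two examples and three labels.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0]])

print(hamming_loss(y_true, y_pred))               # 1 of 6 labels is wrong: ~0.167
print(f1_score(y_true, y_pred, average='micro'))  # micro-averaged F1 score: 0.8
```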

## Incremental Evaluation

When evaluating the predictive performance of an [ensemble method](https://en.wikipedia.org/wiki/Ensemble_learning), i.e., models that consist of several weak predictors, also referred to as *ensemble members*, the command line API supports evaluating these models incrementally. In particular, rule-based machine learning algorithms like the ones implemented by this project are often considered ensemble methods, where each rule in a model can be viewed as a weak predictor. Adding more rules to a model typically results in better predictive performance. However, adding too many rules may result in overfitting the training data and therefore achieving subpar performance on the test data. For analyzing such behavior, the argument `--incremental-evaluation true` may be passed to the command line API:

```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --incremental-evaluation true
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --incremental-evaluation true
```
````

````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset dataset-name --incremental-evaluation true
```
````

When using the above command, the rule-based model that is learned by the BOOMER algorithm is evaluated repeatedly as more rules are added to it. Evaluation results are obtained for a model consisting of a single rule, two rules, three rules, and so on. Of course, because the evaluation is performed multiple times, this evaluation strategy comes with a large computational overhead. Therefore, depending on the size of the final model, it might be necessary to limit the number of evaluations via the following options:

@@ -126,6 +218,14 @@ When using the above command, the rule-based model that is learned by the BOOMER

For example, the following command may be used for the incremental evaluation of a BOOMER model that consists of up to 1000 rules. The model is evaluated for the first time after 200 rules have been added. Subsequent evaluations are performed when the model comprises 400, 600, 800, and 1000 rules.

```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --incremental-evaluation 'true{min_size=200,max_size=1000,step_size=200}'
```
````{tab} BOOMER
```text
boomer --data-dir /path/to/datasets/ --dataset dataset-name --incremental-evaluation 'true{min_size=200,max_size=1000,step_size=200}'
```
````

````{tab} SeCo
```text
seco --data-dir /path/to/datasets/ --dataset dataset-name --incremental-evaluation 'true{min_size=200,max_size=1000,step_size=200}'
```
````
