
Commit

Merge pull request #743 from dragoangel/patch-3
Update statistic.md
vstakhov authored May 20, 2024
2 parents ccdb08d + adccf24 commit dece1df
Showing 1 changed file with 24 additions and 8 deletions.
32 changes: 24 additions & 8 deletions doc/configuration/statistic.md
@@ -27,7 +27,9 @@ Statistical tokens are stored in statfiles, which are then mapped to specific ba

## Statistics Configuration

Starting from Rspamd 2.0, we recommend using `redis` as the backend and `osb` as the tokenizer, which are set as the default settings. The default configuration settings can be found in the `$CONFDIR/statistic.conf` file.
Starting from Rspamd 2.0, we recommend using `redis` as the backend and `osb` as the tokenizer, which are set as the default settings.

The default configuration settings can be found in the `$CONFDIR/statistic.conf` file.

~~~hcl
classifier "bayes" {
@@ -70,7 +72,18 @@ classifier "bayes" {
.include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/statistic.conf"
~~~

To enable per-user statistics, you can add the `users_enabled = true` property to the configuration of the classifier. However, it is important to ensure that Rspamd is called at the final delivery stage (e.g., LDA mode) to avoid issues with multi-recipient messages. When dealing with multi-recipient messages, Rspamd will use the first recipient for user-based statistics.
We also recommend using the [`bayes_expiry` module]({{ site.baseurl }}/doc/modules/bayes_expiry.html) to maintain your statistics database.

Please note that `classifier-bayes.conf` is included from `statistic.conf`; it exists to keep the user's configuration simple.

For most setups, where only one classifier is used, `classifier-bayes.conf` is sufficient and `statistic.conf` should be left unmodified.

If you need to describe multiple different classifiers, you need to create `local.d/statistic.conf`, which should describe the classifier sections with all details from the default config, as there will be no fallback. A common use case is when the first classifier is `per_user` and the second is not.
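
As a rough sketch of the multi-classifier case, a `local.d/statistic.conf` could look like the following. The `name` values and the `*_USER` symbol names are hypothetical placeholders, not defaults. The other values are assumed to mirror the stock `statistic.conf` and should be checked against the file shipped with your installation.

~~~hcl
# local.d/statistic.conf (illustrative sketch, not a drop-in file)
# Each classifier is described in full, since nothing falls back to the default config.

classifier "bayes" {
  name = "bayes_user";          # hypothetical name
  per_user = true;              # first classifier keeps per-user statistics
  tokenizer {
    name = "osb";
  }
  backend = "redis";
  new_schema = true;
  min_tokens = 11;              # assumed to match the stock config
  min_learns = 200;             # assumed to match the stock config
  learn_condition = 'return require("lua_bayes_learn").can_learn';

  statfile {
    symbol = "BAYES_HAM_USER";  # hypothetical symbol
    spam = false;
  }
  statfile {
    symbol = "BAYES_SPAM_USER"; # hypothetical symbol
    spam = true;
  }
}

classifier "bayes" {
  name = "bayes_common";        # hypothetical name
  tokenizer {
    name = "osb";
  }
  backend = "redis";
  new_schema = true;
  min_tokens = 11;
  min_learns = 200;
  learn_condition = 'return require("lua_bayes_learn").can_learn';

  statfile {
    symbol = "BAYES_HAM";
    spam = false;
  }
  statfile {
    symbol = "BAYES_SPAM";
    spam = true;
  }
}
~~~

Both classifiers set `learn_condition` explicitly because, with no fallback to the default config, leaving it out would remove the overlearning protection.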

### Per-user statistics

To enable per-user statistics, you can add the `per_user = true` property to the configuration of the classifier. However, it is *important* to ensure that Rspamd is called at the final delivery stage (e.g., LDA mode) to avoid issues with multi-recipient messages. When dealing with multi-recipient messages, Rspamd will use the first recipient for user-based statistics.

It's worth noting that Rspamd prioritizes SMTP recipients over MIME ones and gives preference to the special LDA header called `Delivered-To`, which can be appended using the `-d` option for `rspamc`. This allows for more accurate per-user statistics in your configuration.
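
A minimal sketch of enabling this for the single default classifier, assuming the stock `statistic.conf` is left unmodified and pulls in `local.d/classifier-bayes.conf`:

~~~hcl
# local.d/classifier-bayes.conf
# Merged into the default classifier "bayes" section.
per_user = true;
~~~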

### Classifier and headers
@@ -82,19 +95,22 @@ The classifier in Rspamd learns headers that are specifically defined in the `cl
Supported parameters for the Redis backend are:

- `tokenizer`: leave it as shown for now. Currently, only OSB is supported
- `new_schema`: must be set to `true`
- `backend`: set it to Redis
- `servers`: IP or hostname with a port for the Redis server. If you have defined localhost in /etc/hosts for both IPv4 and IPv6, use an IP address for the loopback interface, or your Redis server will not be found!
- `write_servers` (optional): If needed, define dedicated servers for learning
- `password` (optional): Password for the Redis server
- `db` (optional): Database to use (though it is recommended to use dedicated Redis instances and not databases in Redis)
- `min_tokens`: minimum number of words required for statistics processing
- `min_learns` (optional): minimum learn count for **both** spam and ham classes to perform classification
- `autolearn` (optional): see below for details
- `min_learns` (optional): minimum learn count for **both** spam and ham classes to perform classification
- `learn_condition`: Lua function that verifies that learning is needed. The default function **must** be set if you have not written your own; omitting `learn_condition` from `statistic.conf` will lead to losing protection from overlearning
- `autolearn` (optional): for more details, see the Autolearning section
- `per_user` (optional): enable per-user statistics. See above
- `statfile`: Define keys for spam and ham mails.
- `learn_condition` (optional): Lua function for autolearning as described below.

You are also recommended to use [`bayes_expiry` module](https://rspamd.com/doc/modules/bayes_expiry.html) to maintain your statistics database.
- `statfile`: Define keys for spam and ham mails
- `cache_prefix` (optional): prefix for the keys that store hashes of already learned IDs, defaults to `"learned_ids"`
- `cache_max_elt` (optional): number of elements to store in one `learned_ids` key
- `cache_max_keys` (optional): number of `learned_ids` keys to store
- `cache_elt_len` (optional): length of the hash to store in one element of `learned_ids`
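
To illustrate how a few of the parameters above fit together, here is a hedged sketch of a `local.d/classifier-bayes.conf`. All addresses, the password, and the numeric values are placeholders chosen for the example, not recommendations.

~~~hcl
# local.d/classifier-bayes.conf (illustrative values only)
backend = "redis";
servers = "127.0.0.1:6379";           # loopback IP rather than "localhost"
# write_servers = "192.0.2.10:6379";  # hypothetical dedicated server for learning
# password = "example-password";      # placeholder
# db = "1";                           # a dedicated Redis instance is preferred over a database
min_tokens = 11;                      # placeholder value
min_learns = 200;                     # placeholder value
~~~

The optional settings are left commented out so that only the ones you actually need are enabled.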

## Autolearning

