From 98140a9f5b6139e6e3beb3793794f097a478f11e Mon Sep 17 00:00:00 2001
From: Dmitriy Alekseev <1865999+dragoangel@users.noreply.github.com>
Date: Tue, 7 May 2024 19:14:54 +0300
Subject: [PATCH 1/3] Update statistic.md

---
 doc/configuration/statistic.md | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/doc/configuration/statistic.md b/doc/configuration/statistic.md
index fdee6aefd..1f9be4b6c 100644
--- a/doc/configuration/statistic.md
+++ b/doc/configuration/statistic.md
@@ -27,7 +27,9 @@ Statistical tokens are stored in statfiles, which are then mapped to specific ba
 
 ## Statistics Configuration
 
-Starting from Rspamd 2.0, we recommend using `redis` as the backend and `osb` as the tokenizer, which are set as the default settings. The default configuration settings can be found in the `$CONFDIR/statistic.conf` file.
+Starting from Rspamd 2.0, we recommend using `redis` as the backend and `osb` as the tokenizer, which are set as the default settings.
+
+The default configuration settings can be found in the `$CONFDIR/statistic.conf` file.
 
 ~~~hcl
 classifier "bayes" {
@@ -70,7 +72,16 @@ classifier "bayes" {
 .include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/statistic.conf"
 ~~~
 
-To enable per-user statistics, you can add the `users_enabled = true` property to the configuration of the classifier. However, it is important to ensure that Rspamd is called at the final delivery stage (e.g., LDA mode) to avoid issues with multi-recipient messages. When dealing with multi-recipient messages, Rspamd will use the first recipient for user-based statistics.
+You are also recommended to use [`bayes_expiry` module](https://rspamd.com/doc/modules/bayes_expiry.html) to maintain your statistics database.
+
+Please note that `classifier-bayes.conf` is child config of `statistics.conf` which created for simplicity, you should not use them both at once.
+
+For most of setups where there is only one ham-spam statistic is tracked `classifier-bayes.conf` is suffient.
+
+If you need describe multiply different classifiers you need use `statistics.conf`, common usecase when first classifier is `per_user` and second is not.
+
+To enable per-user statistics, you can add the `per_user = true` property to the configuration of the classifier. However, it is *important* to ensure that Rspamd is called at the final delivery stage (e.g., LDA mode) to avoid issues with multi-recipient messages. When dealing with multi-recipient messages, Rspamd will use the first recipient for user-based statistics.
+
 It's worth noting that Rspamd prioritizes SMTP recipients over MIME ones and gives preference to the special LDA header called `Delivered-To`, which can be appended using the `-d` option for `rspamc`. This allows for more accurate per-user statistics in your configuration.
 
 ### Classifier and headers
@@ -88,13 +99,11 @@ Supported parameters for the Redis backend are:
 - `password` (optional): Password for the Redis server
 - `db` (optional): Database to use (though it is recommended to use dedicated Redis instances and not databases in Redis)
 - `min_tokens`: minimum number of words required for statistics processing
-- `min_learns` (optional): minimum learn to count for **both** spam and ham classes to perform classification
-- `autolearn` (optional): see below for details
+- `min_learns` (optional): minimum number of learns required for **both** spam and ham classes to perform classification
+- `learn_condition`: Lua function that checks whether learning is needed. The default function **must** be set unless you have written your own; omitting `learn_condition` from `statistic.conf` removes the protection against overlearning
+- `autolearn` (optional): for more details see Autolearning section
 - `per_user` (optional): enable perusers statistics. See above
-- `statfile`: Define keys for spam and ham mails.
-- `learn_condition` (optional): Lua function for autolearning as described below.
-
-You are also recommended to use [`bayes_expiry` module](https://rspamd.com/doc/modules/bayes_expiry.html) to maintain your statistics database.
+- `statfile`: Define keys for spam and ham mails
 
 ## Autolearning
 
From f3e7ba1516b7471b5cb1f46f4371ab2dcb03a5de Mon Sep 17 00:00:00 2001
From: Dmitriy Alekseev <1865999+dragoangel@users.noreply.github.com>
Date: Tue, 7 May 2024 19:30:18 +0300
Subject: [PATCH 2/3] Update statistic.md

---
 doc/configuration/statistic.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/configuration/statistic.md b/doc/configuration/statistic.md
index 1f9be4b6c..e30a1579f 100644
--- a/doc/configuration/statistic.md
+++ b/doc/configuration/statistic.md
@@ -80,6 +80,8 @@ For most of setups where there is only one ham-spam statistic is tracked `classi
 
 If you need describe multiply different classifiers you need use `statistics.conf`, common usecase when first classifier is `per_user` and second is not.
 
+### Per-user statistics
+
 To enable per-user statistics, you can add the `per_user = true` property to the configuration of the classifier. However, it is *important* to ensure that Rspamd is called at the final delivery stage (e.g., LDA mode) to avoid issues with multi-recipient messages. When dealing with multi-recipient messages, Rspamd will use the first recipient for user-based statistics.
 
 It's worth noting that Rspamd prioritizes SMTP recipients over MIME ones and gives preference to the special LDA header called `Delivered-To`, which can be appended using the `-d` option for `rspamc`. This allows for more accurate per-user statistics in your configuration.
@@ -93,6 +95,7 @@ The classifier in Rspamd learns headers that are specifically defined in the `cl
 Supported parameters for the Redis backend are:
 
 - `tokenizer`: leave it as shown for now. Currently, only OSB is supported
Currently, only OSB is supported +- `new_schema`: must be set to `true` - `backend`: set it to Redis - `servers`: IP or hostname with a port for the Redis server. Use an IP for the loopback interface, if you have defined localhost in /etc/hosts for IPv4 and IPv6, or your Redis server will not be found! - `write_servers` (optional): If needed, define dedicated servers for learning @@ -104,6 +107,10 @@ Supported parameters for the Redis backend are: - `autolearn` (optional): for more details see Autolearning section - `per_user` (optional): enable perusers statistics. See above - `statfile`: Define keys for spam and ham mails +- `cache_prefix` (optional): prefix used to create keys where to store hashes of already learned ids, defaults to `"learned_ids"` +- `cache_max_elt` (optional): amount of elements to store in one `learned_ids` key +- `cache_max_keys` (optional): amount of `learned_ids` keys to store +- `cache_elt_len` (optional): lenth of hash to store in one element of `learned_ids` ## Autolearning From adccf24355ab20028ff3b22345df9b79b3ae8667 Mon Sep 17 00:00:00 2001 From: Dmitriy Alekseev <1865999+dragoangel@users.noreply.github.com> Date: Tue, 7 May 2024 19:59:36 +0300 Subject: [PATCH 3/3] Update statistic.md --- doc/configuration/statistic.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/configuration/statistic.md b/doc/configuration/statistic.md index e30a1579f..3198a40bb 100644 --- a/doc/configuration/statistic.md +++ b/doc/configuration/statistic.md @@ -72,13 +72,13 @@ classifier "bayes" { .include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/statistic.conf" ~~~ -You are also recommended to use [`bayes_expiry` module](https://rspamd.com/doc/modules/bayes_expiry.html) to maintain your statistics database. +You are also recommended to use [`bayes_expiry` module]({{ site.baseurl }}/doc/modules/bayes_expiry.html) to maintain your statistics database. 
-Please note that `classifier-bayes.conf` is child config of `statistics.conf` which created for simplicity, you should not use them both at once.
+Please note that `classifier-bayes.conf` is included from `statistic.conf` to make configuration simpler for users.
 
-For most of setups where there is only one ham-spam statistic is tracked `classifier-bayes.conf` is suffient.
+For most setups, where only one classifier is used, `classifier-bayes.conf` is sufficient and `statistic.conf` should be left unmodified.
 
-If you need describe multiply different classifiers you need use `statistics.conf`, common usecase when first classifier is `per_user` and second is not.
+If you need to describe multiple classifiers, you have to create `local.d/statistic.conf` and describe every classifier section in full, with all details from the default config, as there is no fallback to the defaults. A common use case is a setup where the first classifier is `per_user` and the second is not.
 
 ### Per-user statistics
 
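For reviewers, a minimal sketch of the `local.d/statistic.conf` described in the last hunk: one per-user classifier and one shared classifier, each spelled out in full because there is no fallback to the defaults. It is adapted from the default `$CONFDIR/statistic.conf` and has not been tested; the `name` values and the `BAYES_*_USER` symbols are hypothetical, and the Redis servers are assumed to be configured elsewhere (otherwise add `servers` to each section).

~~~hcl
# Hypothetical local.d/statistic.conf: two classifiers, no fallback to defaults,
# so every option each classifier needs is written out explicitly.
classifier "bayes" {
  name = "bayes_user"; # hypothetical name, one distinct name per classifier
  tokenizer {
    name = "osb";
  }
  backend = "redis";
  new_schema = true;
  per_user = true; # per-user statistics: call Rspamd at final delivery
  min_tokens = 11;
  min_learns = 200;
  learn_condition = 'return require("lua_bayes_learn").can_learn';
  statfile {
    symbol = "BAYES_HAM_USER"; # hypothetical symbol
    spam = false;
  }
  statfile {
    symbol = "BAYES_SPAM_USER"; # hypothetical symbol
    spam = true;
  }
}

classifier "bayes" {
  name = "bayes"; # shared, server-wide statistics
  tokenizer {
    name = "osb";
  }
  backend = "redis";
  new_schema = true;
  min_tokens = 11;
  min_learns = 200;
  learn_condition = 'return require("lua_bayes_learn").can_learn';
  statfile {
    symbol = "BAYES_HAM";
    spam = false;
  }
  statfile {
    symbol = "BAYES_SPAM";
    spam = true;
  }
}
~~~

Giving each section its own `name` is assumed here to be what keeps the two classifiers apart when both use the `bayes` type; `learn_condition` is repeated in both sections so neither loses the overlearning protection.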