Merge pull request #743 from dragoangel/patch-3

Update statistic.md
rspamd · May 20, 2024 · dece1df
2 parents ccdb08d + adccf24
commit dece1df
Showing 1 changed file with 24 additions and 8 deletions.
diff --git a/doc/configuration/statistic.md b/doc/configuration/statistic.md
@@ -27,7 +27,9 @@ Statistical tokens are stored in statfiles, which are then mapped to specific ba
 
 ## Statistics Configuration
 
-Starting from Rspamd 2.0, we recommend using `redis` as the backend and `osb` as the tokenizer, which are set as the default settings. The default configuration settings can be found in the `$CONFDIR/statistic.conf` file.
+Starting from Rspamd 2.0, we recommend using `redis` as the backend and `osb` as the tokenizer, which are set as the default settings.
+
+The default configuration settings can be found in the `$CONFDIR/statistic.conf` file.
 
 ~~~hcl
 classifier "bayes" {
@@ -70,7 +72,18 @@ classifier "bayes" {
 .include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/statistic.conf"
 ~~~
 
-To enable per-user statistics, you can add the `users_enabled = true` property to the configuration of the classifier. However, it is important to ensure that Rspamd is called at the final delivery stage (e.g., LDA mode) to avoid issues with multi-recipient messages. When dealing with multi-recipient messages, Rspamd will use the first recipient for user-based statistics. 
+We also recommend using the [`bayes_expiry` module]({{ site.baseurl }}/doc/modules/bayes_expiry.html) to maintain your statistics database.
+
+Please note that `classifier-bayes.conf` is included by `statistic.conf` and exists to simplify configuration for users.
+
+For most setups, where only one classifier is used, `classifier-bayes.conf` is sufficient and `statistic.conf` should be left unmodified.
+
+If you need to define multiple classifiers, create `local.d/statistic.conf` and describe each classifier section in full, including all details from the default config, as there will be no fallback. A common use case is when the first classifier is `per_user` and the second is not.
+
+### Per-user statistics
+
+To enable per-user statistics, you can add the `per_user = true` property to the configuration of the classifier. However, it is *important* to ensure that Rspamd is called at the final delivery stage (e.g., LDA mode) to avoid issues with multi-recipient messages. When dealing with multi-recipient messages, Rspamd will use the first recipient for user-based statistics. 
+
 It's worth noting that Rspamd prioritizes SMTP recipients over MIME ones and gives preference to the special LDA header called `Delivered-To`, which can be appended using the `-d` option for `rspamc`. This allows for more accurate per-user statistics in your configuration.
 
 ### Classifier and headers
@@ -82,19 +95,22 @@ The classifier in Rspamd learns headers that are specifically defined in the `cl
 Supported parameters for the Redis backend are:
 
 - `tokenizer`: leave it as shown for now. Currently, only OSB is supported
+- `new_schema`: must be set to `true`
 - `backend`: set it to Redis
 - `servers`: IP or hostname with a port for the Redis server. Use an IP for the loopback interface, if you have defined localhost in /etc/hosts for IPv4 and IPv6, or your Redis server will not be found!
 - `write_servers` (optional): If needed, define dedicated servers for learning
 - `password` (optional): Password for the Redis server
 - `db` (optional): Database to use (though it is recommended to use dedicated Redis instances and not databases in Redis)
 - `min_tokens`: minimum number of words required for statistics processing
-- `min_learns` (optional): minimum learn to count for **both** spam and ham classes to perform  classification
-- `autolearn` (optional): see below for details
+- `min_learns` (optional): minimum number of learns required for **both** spam and ham classes before classification is performed
+- `learn_condition`: Lua function that verifies that learning is needed. The default function **must** be set if you have not written your own; omitting `learn_condition` from `statistic.conf` will lead to losing protection from overlearning
+- `autolearn` (optional): see the Autolearning section for details
 - `per_user` (optional): enable perusers statistics. See above
-- `statfile`: Define keys for spam and ham mails.
-- `learn_condition` (optional): Lua function for autolearning as described below.
-
-You are also recommended to use [`bayes_expiry` module](https://rspamd.com/doc/modules/bayes_expiry.html) to maintain your statistics database.
+- `statfile`: Define keys for spam and ham mails
+- `cache_prefix` (optional): prefix for the keys that store hashes of already learned IDs, defaults to `"learned_ids"`
+- `cache_max_elt` (optional): number of elements to store in one `learned_ids` key
+- `cache_max_keys` (optional): number of `learned_ids` keys to store
+- `cache_elt_len` (optional): length of the hash to store in one element of `learned_ids`
 
 ## Autolearning