From e287bbc768979e115ea96cd1197e029a199a0a56 Mon Sep 17 00:00:00 2001
From: Dmitriy Alekseev <1865999+dragoangel@users.noreply.github.com>
Date: Thu, 23 May 2024 09:44:19 +0200
Subject: [PATCH] Add information about Bayes per-user sharding (#749)

* Update statistic.md

---------

Co-authored-by: Alexander Moisseev
---
 doc/configuration/statistic.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/doc/configuration/statistic.md b/doc/configuration/statistic.md
index 3198a40bb..8cc1d6014 100644
--- a/doc/configuration/statistic.md
+++ b/doc/configuration/statistic.md
@@ -86,6 +86,28 @@ To enable per-user statistics, you can add the `per_user = true` property to the
 
 It's worth noting that Rspamd prioritizes SMTP recipients over MIME ones and gives preference to the special LDA header called `Delivered-To`, which can be appended using the `-d` option for `rspamc`. This allows for more accurate per-user statistics in your configuration.
 
+#### Sharding
+
+Starting from version 3.9, per-user statistics can be sharded across different Redis servers using the [hash algorithm]({{ site.baseurl }}/doc/configuration/upstream.html#hash-algorithm).
+
+Example of using two stand-alone master shards without read replicas:
+~~~hcl
+servers = "hash:bayes-peruser-0-master,bayes-peruser-1-master";
+~~~
+
+Example of using a setup with three master-replica shards:
+~~~hcl
+write_servers = "hash:bayes-peruser-0-master,bayes-peruser-1-master,bayes-peruser-2-master";
+read_servers = "hash:bayes-peruser-0-replica,bayes-peruser-1-replica,bayes-peruser-2-replica";
+~~~
+
+Important notes:
+1. Changing the shard count requires dropping all Bayes statistics, so please make decisions wisely.
+2. Each replica should have the same position in `read_servers` as its master in `write_servers`; otherwise, this will result in misaligned read-write hash slot assignments.
+3. You can't use more than one replica per master in a sharded setup; this will result in misaligned read-write hash slot assignments.
+4. Redis Sentinel cannot be used for a sharded setup.
+5. In the controller, you will see incorrect `Bayesian statistics` for the count of learns and users.
+
 ### Classifier and headers
 
 The classifier in Rspamd learns headers that are specifically defined in the `classify_headers` section of the `options.inc `file. Therefore, there is no need to remove any additional headers (e.g., X-Spam) before the learning process, as these headers will not be utilized for classification purposes. Rspamd also takes into account the `Subject` header, which is tokenized according to the aforementioned rules. Additionally, Rspamd considers various meta-tokens, such as message size or the number of attachments, which are extracted from the messages for further analysis.