Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RLN rate-limiting over multiple shards #109

Open
jm-clius opened this issue Oct 21, 2024 · 8 comments
Open

RLN rate-limiting over multiple shards #109

jm-clius opened this issue Oct 21, 2024 · 8 comments

Comments

@jm-clius
Copy link
Contributor

jm-clius commented Oct 21, 2024

Background

Waku currently uses RLN rate-limiting to prevent publishers overwhelming the network with high-rate publishing. The current approach is summarised in the Waku Network specification. Of particular interest is how RLN proofs are validated: instead of slashing rate-limit violators' memberships, the validating RLN Relay nodes simply penalise peers forwarding messages exceeding the rate limit by lowering their gossipsub score (which will eventually lead to disconnection). This incentivises all RLN Relay nodes to validate and drop violating messages or else risk their gossipsub score falling to the point where they lose all connectivty.

Problem

An RLN membership allows a publisher to publish messages to multiple shards. Validating nodes record nullifiers included in the RLN proofs over an epoch and detect rate violators by detecting a repeat of a nullifier (see this description). However, if a validating node is subscribed only to a subset of shards it may not have recorded a previous use of a nullifier on an unsubscribed shard, thereby incorrectly forwarding a violating message as valid. This could lead to its gossipsub score being lowered and its disconnection from the network.

Possible solutions

Option 1: Membership per shard (discarded)

Discarded for not being scalable or economic.

Option 2: Include shard in RLN proof and rate-limit per shard

  1. Include the shard in the RLN signal similar to what was proposed for timestamp in chore(rln): move from epoch based gap to timestamp based. nwaku#2972
  2. Update validation rules to cache nullifiers per shard and only consider a message invalid if double-signalling detected in a nullifier on the same shard. In other words, the rate limit remains fixed for a single membership valid on all shards, but it becomes a per-shard rate-limit.

Immediate comments on the above:

  1. On second thought, I'm not exactly sure why (1) is necessary? In the case of the timestamp it makes sense to include it in the proof as the timestamp is a spoofable field in the WakuMessage. Publishers cannot "fake" the shard on which they publish a message, so if the validators want to validate per-shard they can already do so by simply considering the shard on which each message arrive?
  2. Afaiu adding the shard to the RLN signal will not alter the resultant nullifier. In other words, for messages generated by the same membership, in the same epoch and that violate the rate-limit (regardless of destination shard) a double nullifier will exist (on some shard). This means the RLN verification rule as described here will be violated. We can of course decide that these messages are not invalid as they "only" double a nullifier on another shard, a condition we now consider valid. However, is it wise to remove RLN's verification as primary rate-limiting check by adding a conditional validation rule on top? Presumably the publisher will simultaneously generate enough key shares for its secret key to be reconstructed (which would lead to slashing, if we had it implemented).

Option 3: Implement slashing

Remove gossipsub scoring as primary disincentivisation mechanism and implement slashing. If even one validating node is subscribed to all shards used by a rate-violating publisher, the publisher will be slashed and its membership removed.

cc @alrevuelta

@SionoiS
Copy link

SionoiS commented Oct 21, 2024

if a validating node is subscribed only to a subset of shards it may not have recorded a previous use of a nullifier on an unsubscribed shard, thereby incorrectly forwarding a violating message as valid.

That's bad but can be solved by adding the shard to the hash computation of each messages and only allowing shard-specific knowledge for message validation.

This would mean that RLN is no longer a GLOBAL rate-limit is that ok?

Also, option 3 only solve the problem IF it is economically viable to run a node to check ALL messages and slash offenders. It also becomes less and less viable the more shards and traffic increases.

@jm-clius
Copy link
Contributor Author

Comment from @alrevuelta on Discord:

Not sure "to cache nullifiers per shard" is needed. I mentioned the timestamp as an example, but we already have two examples, the content topic and the message hash.
https://github.com/waku-org/nwaku/blob/c5a825e206c1cb3e6e0cf8c01410527804cb76c4/waku/waku_rln_relay/rln_relay.nim#L291-L292

My idea is to bind the message to a given shard. This would also fix the meta problem of replaying attacks across shards, which I think we have. A receiver will check the proof constructing a signal with its local shards (due to consistent hashing, it can be taken from the content topic). If the message was meant to another shard, it wont verify.

I think this fixes both replaying attacks and gives determinism to message validation, since it will no longer depend on the shards a node is subscribed to. As a side effect, it makes the rate limit a shard-rate-limit rather than a global-rate-limit.

Re slashing. Agree it can fix the issue but I never liked the fact that going +1 message over the rate limit can kick you out, especially if we have other punishment options. but just an opinion.

@jm-clius
Copy link
Contributor Author

Thanks for your comments, @alrevuelta and @SionoiS. I realise there might be something fundamental that I'm missing. 😅

I can see how the message and proof can be tied to a shard by including it in the RLN signal. However, I don't see how this will affect the fact that a publisher is guaranteed to generate a nullifier that has been generated before if it exceeds the shard-agnostic rate-limit.
In other words, you are guaranteed to generate a double nullifier if you generate a proof while

  1. using the same sk
  2. over the same epoch
  3. more than rate-limit times

This is true, no matter how you modify the RLN signal. My understanding of the RLN signal is simply to further prove that this nullifier was indeed computed for the unique message and field values hashed into the signal (so that proofs cannot be reused but are tied to specific messages).
In other words, a validating node may receive a message that doubles a nullifier on some shard, which in RLN rules should count as a violation. I can see how we can choose to ignore it by determining that these double nullifiers were computed for two messages on two different shards, but this would be the wrong way to go IMO.

@SionoiS
Copy link

SionoiS commented Oct 23, 2024

In my mind the first step for RLN message validation is to check that the hash indeed match the message then you check the zk proof.

in the context of local-only verifiers, they cannot know that a message was sent on other shards BUT they can know that a message was sent multiple time on their shard if we include this information as part of the message hash. This prevents verifiers from spreading invalid messages on their shard.

As for the global rate limit it is impossible for a local verifier to know it was violated. If we want to punish that, slashing should be implemented which may incentivize global verifiers.

As a side note, this here problem is why one should be careful when using a logically centralized system (blockchains) as part of their design, it clash with the other decentralized parts.

@jm-clius
Copy link
Contributor Author

In my mind the first step for RLN message validation is to check that the hash indeed match the message then you check the zk proof.

Indeed. This is my understanding too.

they can know that a message was sent multiple time

The issue of duplication on a single shard is generally covered by:

  1. short term deduplication in gossipsub (~2 mins)
  2. timestamp validation - messages cannot be more than 20 seconds old compared to local time.

As for the global rate limit it is impossible for a local verifier to know it was violated. If we want to punish that, slashing should be implemented which may incentivize global verifiers.

I think I agree with this statement. Only other alternative I can see is to have membership for each shard, which is the smallest unit that can be monitored by validators.

@alrevuelta
Copy link
Contributor

As for the global rate limit it is impossible for a local verifier to know it was violated. If we want to punish that, slashing should be implemented which may incentivize global verifiers.

Also agree with the first part. That's why I'm suggesting to apply per-shard rate limits so that "everyone can be in the same page".

Agree also that with slashing this will be fixed, but unsure if the other things slashing introduce are okay.

short term deduplication in gossipsub (~2 mins)

But this protects within the shard. A cross-shard replay attack wont be protected by the gossipsubs cache?

@SionoiS
Copy link

SionoiS commented Oct 23, 2024

they can know that a message was sent multiple time

The issue of duplication on a single shard is generally covered by:

  1. short term deduplication in gossipsub (~2 mins)
  2. timestamp validation - messages cannot be more than 20 seconds old compared to local time.

Sorry, poor wording, I meant multiple in the context of RLN AKA spam.

We also should be careful because "the same message" is dictated by what's included in the hash. If we add the shard to the hash "the same message" cannot be sent on multiple shards anymore, which is not true ATM.

Case 1

no shard in hash

  • content == "hello world"
  • hash == 0xAB
  • RLN proof == 0x12
  • Timestamp == now

Msg is valid on all shard

Case 2

shard included in hash

  • content == "hello world"
  • hash == 0xCD
  • RLN proof == 0x34
  • timestamp == now
  • shard == 4

Msg is valid only on shard 4

@jm-clius
Copy link
Contributor Author

Right! Thanks for all the clarifications. I think we have identified two issues here:
(1) is related to duplicating messages across shards and
(2) about validators with "partial shards view" when the same publisher (sk) uses more than one shard in an epoch

Afaics the only solution for (2) would be slashing. We can continue tracking this problem in this issue.
For (1), encoding the shard into the signal/hash (and validating) would solve the issue. I've created a separate issue to track this: #110

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants