
LIB-1 The Transaction Sorting Algorithm In RocksdbMempool Can Lead To A Series Of Issues #321

Closed
SA124 opened this issue Aug 8, 2024 · 10 comments · Fixed by #749
Labels
Source: Audit Issues discovered by audit. suzuka:security

Comments

@SA124

SA124 commented Aug 8, 2024

Severity: Major
Status: Pending
Auditor: Movebit
Topic: Suzuka-only custom components
Discovery Methods: Manual Review

LIB-1 The Transaction Sorting Algorithm In RocksdbMempool Can Lead To A Series Of Issues

Code Location:
protocol-units/mempool/move-rocks/src/lib.rs#37

Description:
The sorting algorithm for transactions in RocksdbMempool is as follows:

  1. First, transactions are sorted by timestamp (timestamps are grouped into 2-second slots).
  2. Then, they are sorted by sequence_number.
  3. Finally, they are sorted by transaction hash.

[Screenshot: the sorting logic in move-rocks/src/lib.rs, captured 2024-08-08]

The light_node transaction-packing algorithm works as follows: it packs 10 transactions into one block, and if fewer than 10 transactions are pending, it packs a block every second.
This could result in the following situation:
Transactions A and B belong to the same account.
Transaction A has a timestamp of 12 and a sequence_number of 11.
Transaction B has a timestamp of 10 and a sequence_number of 15.
In this case, transaction B would be executed before transaction A. However, Aptos
rules require that transaction A must be executed before transaction B.
This sorting algorithm could lead to a series of unknown issues, which would require a deep
analysis of the Aptos-core module to understand fully.
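The described inversion can be reproduced with a minimal sketch of the composite sort key (the struct and field names here are illustrative, not the actual move-rocks types):

```rust
// Hypothetical sketch of the composite mempool key: transactions are
// ordered by (slot timestamp, sequence_number, transaction hash).
// Derived Ord compares fields in declaration order.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct MempoolKey {
    slot_timestamp: u64,   // ingress time rounded into 2-second slots
    sequence_number: u64,  // account sequence number, set by the client
    tx_hash: [u8; 32],     // final tie-breaker
}

/// Round an ingress timestamp (in seconds) down to its 2-second slot.
fn slot_for(timestamp_secs: u64) -> u64 {
    timestamp_secs - (timestamp_secs % 2)
}

fn main() {
    // Transaction A: later slot, lower sequence number.
    let a = MempoolKey { slot_timestamp: slot_for(12), sequence_number: 11, tx_hash: [0; 32] };
    // Transaction B: earlier slot, higher sequence number.
    let b = MempoolKey { slot_timestamp: slot_for(10), sequence_number: 15, tx_hash: [1; 32] };
    // B orders before A, even though Aptos requires A (seq 11) to execute first.
    assert!(b < a);
}
```

Because the slot timestamp is the first field of the key, it dominates the comparison, and the sequence number only breaks ties within a slot.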

Suggestion:
It is recommended to refer to the sorting method used by Aptos-core.

@l-monninger
Collaborator

l-monninger commented Aug 22, 2024

This is actually a known issue. We were comfortable, for the moment, assuming that a client should only send transactions in rapid succession to one node, and thus that the per-account sequence numbers within a given slot would be non-decreasing.

There is also a bit of fundamental undecidability here, which we had yet to determine whether Aptos addresses in a much better manner. That is, you can't actually know that a user won't submit a lower sequence number at a later stage. The best you can do is compensate for typical network conditions so that a well-behaved client won't encounter these issues.

This is where the single-node interaction and slots come in. If you are interacting with a single node appropriately via the client, you should almost never encounter the described inversion, because you will either have waited for confirmation of the transaction with the lower sequence number or sent a batch whose sequence numbers necessarily land in non-decreasing slots, since they are piped to the DA in order.

However, around the edges of slots, there could be some problems if you are not quite using the API correctly. That is, if you send transactions to the same node without waiting for confirmation, you may end up with the above inversion.

Of course, you can't order by sequence number then slot, because the sequence number is generated by the client. Everybody would then be incentivized to generate new accounts to obtain low sequence numbers and prioritize their transactions.

We could potentially smooth out the slots by assigning a second slot mapping offset by half a slot width, $s'(t) = s(t + \frac{1}{2}\delta)$. Then, we assign slots by $\max(s(t), s'(t))$. We held off on adding this because we were comfortable with the assumptions above, but perhaps this is a good time to reconsider.
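The proposed smoothing can be sketched as follows (this is a hypothetical illustration of the formula above, not merged code; $\delta$ is the 2-second slot width):

```rust
// Sketch of the proposed slot smoothing: s(t) rounds t down to a
// 2-second slot; s'(t) applies the same map to t shifted forward by
// half the slot width delta; the assigned slot is max(s(t), s'(t)).
const DELTA: u64 = 2; // slot width in seconds

fn s(t: u64) -> u64 {
    t - (t % DELTA)
}

fn smoothed_slot(t: u64) -> u64 {
    let s_prime = s(t + DELTA / 2);
    s(t).max(s_prime)
}

fn main() {
    // A transaction arriving in the first half of a slot (t = 10) keeps
    // its slot, while one arriving in the second half (t = 11) is
    // promoted to the next slot boundary.
    assert_eq!(smoothed_slot(10), 10);
    assert_eq!(smoothed_slot(11), 12);
}
```

The effect is that transactions landing near the trailing edge of a slot are pushed into the next one, shrinking the window in which a late-arriving lower sequence number can invert the order.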

@mzabaluev mzabaluev added the Source: Audit Issues discovered by audit. label Aug 26, 2024
@poetyellow

poetyellow commented Aug 28, 2024

  1. Our previously assumed attack method was incorrect.
  2. However, prioritizing transactions by timestamp for block inclusion is still unreasonable. Other blockchains use the gas fee as the first sort key. Suppose an attacker sends 1,000 low-gas-fee transactions within a short period; these transactions will always be prioritized for inclusion in blocks, forcing subsequent transactions to wait a long time before being processed. This would block normal transactions across the entire network, letting the attacker delay their processing at minimal cost.

@poetyellow

Our audit was based on commit 030fe01. I'm not sure whether you are referring to the latest source code.

@l-monninger
Copy link
Collaborator

l-monninger commented Sep 10, 2024

> However, prioritizing transactions based on timestamps for block inclusion is unreasonable. Other blockchains prioritize gas fees as the first priority. Suppose an attacker, A, sends 1,000 low-gas-fee transactions within a short period; these transactions will always be prioritized for inclusion in blocks, causing subsequent transactions to wait a long time before being processed. This would block the normal transactions in the entire network. The attacker can delay the processing time of normal transactions with minimal cost.

The timestamps being sorted here are based on when a transaction enters a given node's mempool; they are not set by the client. A given node is simply ordering by when it first saw a transaction, and it uses slots to account for common network conditions, e.g., latency for geographically more distant clients.
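As a small illustration of that node-side stamping (illustrative only, not the move-rocks code): two submissions whose node-measured arrival times fall within the same 2-second window share a slot, regardless of client-side clocks.

```rust
// Slots are derived from node-side ingress time, so the client cannot
// influence this ordering field; slots absorb modest network latency.
fn slot(ingress_secs: u64) -> u64 {
    ingress_secs - ingress_secs % 2
}

fn main() {
    // A nearby client arriving at t = 100s and a distant one whose
    // submission lands at t = 101s fall into the same slot.
    assert_eq!(slot(100), slot(101));
    // Beyond the slot width, ordering again reflects arrival time.
    assert_ne!(slot(100), slot(102));
}
```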

@l-monninger
Collaborator

@poetyellow

@l-monninger l-monninger added this to the Centralized Mainnet milestone Sep 12, 2024
@poetyellow

We can assume the following attack scenario:

  1. The attacker sends 1,000 transactions within 2 seconds, and these transactions all have very low gas fees.
  2. Assuming that the block-packing speed of Movement is 10 transactions per second, it would take 100 seconds to process these junk transactions. These transactions would be executed before future transactions since our sorting algorithm prioritizes timestamps.
  3. This allows the attacker to slow down the entire network’s execution speed at a low cost.
  4. Although the attack incurs some cost, if the profit from blocking the normal processing of transactions outweighs the attack cost, the attack could be viable.
  5. If we sort transactions by gas fee instead, it would significantly raise the cost of the attack.
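The suggested gas-first ordering can be sketched as follows (hypothetical; the field and function names are illustrative, not the actual mempool types):

```rust
// Sketch of a gas-first priority key: sort descending by gas price
// before ingress slot, so flooding the mempool has to pay for priority.
use std::cmp::Reverse;

#[derive(Debug)]
struct PendingTx {
    gas_unit_price: u64,
    slot_timestamp: u64,
    sequence_number: u64,
}

/// Higher gas price first, then earlier slot, then lower sequence number.
fn priority_key(tx: &PendingTx) -> (Reverse<u64>, u64, u64) {
    (Reverse(tx.gas_unit_price), tx.slot_timestamp, tx.sequence_number)
}

fn main() {
    let mut txs = vec![
        PendingTx { gas_unit_price: 1, slot_timestamp: 10, sequence_number: 0 },   // spam
        PendingTx { gas_unit_price: 100, slot_timestamp: 12, sequence_number: 0 }, // normal
    ];
    txs.sort_by_key(priority_key);
    // The well-paying transaction outranks the earlier low-fee spam.
    assert_eq!(txs[0].gas_unit_price, 100);
}
```

Under this key, the attacker's 1,000 low-fee transactions no longer delay higher-fee traffic; they only delay each other.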

@l-monninger
Collaborator

@poetyellow we would like to keep the DA agnostic; however, adding an app_gas_estimate field to our transaction types would be reasonable and would align with future intentions. The concern we have about using this with Aptos is a similar load-based attack via the simulation API. But I will spend some time today and tomorrow considering the processing here.

We also introduced #722 to help address this and a related issue. The idea is that a single user can only submit up to a certain tolerance ahead of the committed sequence number, meaning they can't actually send 1,000 transactions within 2 seconds.

let min_sequence_number = (min_used_sequence_number).max(committed_sequence_number);

However, this is not particularly Sybil-resistant, insofar as we can assume it is not too costly to create multiple funded accounts.

As we pursue this, we should additionally estimate gas costs on the DA and at settlement. This is still somewhat a research subject, but I can add some basic support fairly quickly that you can review.
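The tolerance check described for #722 can be sketched around the quoted line above (a minimal sketch; `TOLERANCE` and the function name are assumed, not the actual implementation):

```rust
// Sketch of a sequence-number admission window: accept a transaction only
// if its sequence number lies within a bounded tolerance above the higher
// of the committed and minimum-used sequence numbers. TOLERANCE is an
// assumed configuration value, not a real constant from the codebase.
const TOLERANCE: u64 = 32;

fn accepts(tx_seq: u64, min_used_sequence_number: u64, committed_sequence_number: u64) -> bool {
    let min_sequence_number = min_used_sequence_number.max(committed_sequence_number);
    tx_seq >= min_sequence_number && tx_seq <= min_sequence_number + TOLERANCE
}

fn main() {
    // With committed sequence number 100, near-future transactions pass,
    // but a client cannot queue 1,000 far-ahead transactions at once.
    assert!(accepts(105, 100, 100));
    assert!(!accepts(500, 100, 100));
}
```

This bounds per-account flooding, though, as noted above, it does not by itself resist an attacker who funds many accounts.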

@l-monninger
Collaborator

The cost here is zero because we are already doing validation which computes a priority. I will add this directly and not under a feature flag.

@l-monninger
Collaborator

@poetyellow See the above.

@poetyellow

Yes, thank you.
