
Enable validation of submitted transactions with local state index #693

Merged

Conversation

@m-Peter (Collaborator) commented Dec 3, 2024

Closes: #654
Closes: #586
Closes: #118

Description


For contributor use:

  • Targeted PR against master branch
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the standards mentioned here.
  • Updated relevant documentation
  • Re-reviewed Files changed in the Github PR explorer
  • Added appropriate labels

coderabbitai bot (Contributor) commented Dec 3, 2024

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



@m-Peter m-Peter changed the title Enable submitted transaction validation using local state index Enable validation of submitted transactions with local state index Dec 3, 2024
@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch 5 times, most recently from dae5dd3 to 7597ff0 Compare December 5, 2024 12:08
@zhangchiqing (Member) left a comment

Looks good. Just had one question.

if res.Error != nil {
if err, ok := parseInvalidError(res.Error); ok {
return err
if t.config.TxStateValidation == config.TxSealValidation {
Member:

My comment is not here, but for L62 where we had

t.txPublisher.Publish(evmTx) // publish pending transaction event

Do we still need to mark the tx as sealed if TxStateValidation is local-index?

@m-Peter (Collaborator, Author) commented Dec 10, 2024

Regardless of which mechanism we use to validate submitted transactions, t.txPublisher.Publish(evmTx) is used for this filter. It's a polling filter that developers can use to get notified when there's a pending transaction in the network. I have an issue for reworking the way we handle pending transactions in the codebase, so that we match the functionality from Geth: #544. But that will be in a future PR.

@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch 2 times, most recently from 111f9c6 to 9c290ce Compare December 9, 2024 08:40
@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch from 9c290ce to a06838e Compare December 10, 2024 09:27
@m-Peter m-Peter marked this pull request as ready for review December 10, 2024 09:28
@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch 2 times, most recently from 710de82 to b19cc5c Compare December 10, 2024 11:49
err,
)
}
signer, err := createSigner(ctx, b.config, b.logger)
Collaborator (Author):

Note: After viewing this comment, I am not entirely sure whether we should share a single crypto.Signer object across all account keys, or create one for each account key. Are there any thread-safety concerns here? 🤔

Contributor:

A single signer cannot be used across go routines, so we need multiple. One for every account key.

Actually the crypto.PrivateKey is not thread safe either! So if multiple account keys have the same private key, you need to create multiple copies of crypto.PrivateKey.
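Since a single signer cannot be shared across goroutines, the one-signer-per-account-key setup described above can be sketched as follows. This is a minimal, dependency-free illustration: the `signer` type here is a stand-in for the SDK's crypto.Signer, not the gateway's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// signer is a stand-in for the SDK's crypto.Signer; the real in-memory
// implementation is assumed not to be thread-safe, so each account key
// gets its own instance (and its own private-key copy).
type signer struct {
	keyIndex int
}

func newSigner(keyIndex int) *signer {
	return &signer{keyIndex: keyIndex}
}

func (s *signer) Sign(msg []byte) string {
	// placeholder for a real signature
	return fmt.Sprintf("sig(key=%d,msg=%x)", s.keyIndex, msg)
}

func main() {
	const numKeys = 4
	// One signer per account key: goroutines never share an instance.
	signers := make([]*signer, numKeys)
	for i := range signers {
		signers[i] = newSigner(i)
	}

	var wg sync.WaitGroup
	results := make([]string, numKeys)
	for i, s := range signers {
		wg.Add(1)
		go func(i int, s *signer) {
			defer wg.Done()
			results[i] = s.Sign([]byte{byte(i)})
		}(i, s)
	}
	wg.Wait()
	fmt.Println(results[0])
}
```

The point of the sketch is only the ownership structure: each goroutine signs with an instance nothing else touches, so no lock is needed around signing itself.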

@m-Peter (Collaborator, Author) commented Dec 12, 2024

Note that we have 2 types of signers:

  1. The in-memory signer: crypto.NewInMemorySigner(config.COAKey, crypto.SHA3_256)
  2. requester.NewKMSKeySigner, which doesn't deal with crypto.PrivateKey at all

Currently, only option 2 is supposed to be used in production, and I'm not sure whether it has any thread-safety concerns, as it uses Cloud KMS for signing.

Option 1 is supposed to be used for local development / testing.

@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch from b19cc5c to 3b8ffc1 Compare December 10, 2024 12:00
@@ -120,50 +119,24 @@ func parseConfigFromFlags() error {
return fmt.Errorf("invalid COA private key: %w", err)
}
cfg.COAKey = pkey
} else if keysPath != "" {
Collaborator (Author):

Note: We no longer need the option of loading multiple private keys from a JSON file, so this flag is removed entirely.

KeyID: keyParts[0],
KeyVersion: keyParts[1],
}
// key has the form "{keyID}@{keyVersion}"
Collaborator (Author):

Note: There's really no need to use one Cloud KMS key per Flow account key, and doing so might actually be costly. Given that we might need ~1000 Flow account keys, we simply re-use one and the same Cloud KMS key for all of them.

@@ -169,6 +172,9 @@ func (r *RPCEventSubscriber) subscribe(ctx context.Context, height uint64) <-cha
continue
}
}
for _, evt := range blockEvents.Events {
r.keystore.UnlockKey(evt.TransactionID)
Collaborator (Author):

Note: I am not a big fan of having to pass around the Keystore instance. I would rather have that instance take care of releasing used keys itself. It could be as simple as releasing keys after a fixed time period (such as 15-20 seconds), as it doesn't take longer than that for a Flow transaction to seal.

@zhangchiqing (Member) commented Dec 11, 2024

Note: I am not a big fan of having to pass around the Keystore instance.

Agree. We could pass an interface with just LockKey/UnlockKey methods, so the event subscriber cannot use the key store to sign anything.

I would rather have that instance take care the release of used keys.

I think ideally we need both. It's not enough to check only the block events: a failed tx might not produce any event, but we should still UnlockKey. The UnlockKey here is more for the happy path, which can unlock the key ASAP for it to be reused.

If we rely on a fixed time period only, then we would have to prepare lots of keys in case GW needs to send lots of tx within that fixed time period, otherwise, we might run out of keys.

For the fixed time period, we could keep track of the tx's reference height and compare it with the latest sealed height: since each tx has an expiry time and a reference block, if a tx has not been sealed after X blocks, it will never be sealed. Check this out.
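The two reclamation paths discussed here (event-driven unlock for the happy path, plus height-based expiry as a backstop) could be sketched like this. All type and method names are illustrative, not the gateway's actual API, and the expiry window is an assumed constant:

```go
package main

import (
	"fmt"
	"sync"
)

// txInfo remembers which key a transaction locked and the Flow reference
// height at which it was submitted (illustrative types).
type txInfo struct {
	keyIndex  int
	refHeight uint64
}

// keystore hands out key indices and reclaims them either when the
// transaction's events are observed (happy path) or when the reference
// height falls too far behind the latest sealed height (expiry path).
type keystore struct {
	mu        sync.Mutex
	available []int
	inFlight  map[string]txInfo // txID -> locked key
}

// expiryBlocks is an assumed window: a tx whose reference block is this
// far behind the latest sealed height can never seal.
const expiryBlocks = 600

func newKeystore(numKeys int) *keystore {
	ks := &keystore{inFlight: make(map[string]txInfo)}
	for i := 0; i < numKeys; i++ {
		ks.available = append(ks.available, i)
	}
	return ks
}

func (ks *keystore) LockKey(txID string, refHeight uint64) (int, error) {
	ks.mu.Lock()
	defer ks.mu.Unlock()
	if len(ks.available) == 0 {
		return 0, fmt.Errorf("no keys available")
	}
	key := ks.available[0]
	ks.available = ks.available[1:]
	ks.inFlight[txID] = txInfo{keyIndex: key, refHeight: refHeight}
	return key, nil
}

// UnlockKey is called when the tx's events are seen in a block (happy path).
func (ks *keystore) UnlockKey(txID string) {
	ks.mu.Lock()
	defer ks.mu.Unlock()
	if info, ok := ks.inFlight[txID]; ok {
		delete(ks.inFlight, txID)
		ks.available = append(ks.available, info.keyIndex)
	}
}

// Expire reclaims keys whose transactions can no longer seal.
func (ks *keystore) Expire(latestSealed uint64) {
	ks.mu.Lock()
	defer ks.mu.Unlock()
	for txID, info := range ks.inFlight {
		if latestSealed > info.refHeight+expiryBlocks {
			delete(ks.inFlight, txID)
			ks.available = append(ks.available, info.keyIndex)
		}
	}
}

func main() {
	ks := newKeystore(2)
	ks.LockKey("tx-1", 100)
	ks.LockKey("tx-2", 100)
	ks.UnlockKey("tx-1")              // happy path: events observed
	ks.Expire(100 + expiryBlocks + 1) // backstop: tx-2 expired by height
	fmt.Println(len(ks.available))
}
```

The height-based backstop avoids the failure mode described above for pure timers: a slow chain cannot cause a premature key reuse, because expiry is tied to when the reference block becomes too old rather than to wall-clock time.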

bluesign:

Shouldn't unlock happen at the transaction result? I.e., sign the transaction, send it, check the result, and when the result is sealed or executed (I think executed should also work here), release the key.

@m-Peter (Collaborator, Author) commented Dec 12, 2024

@zhangchiqing Added your suggested functionality in 74cb17d .

Collaborator (Author):

I have also noticed this https://www.flowscan.io/tx/7c615604c3a0caf638ed615b6292384516b7edcb4a87fbf448f2f74be3d7b392 in mainnet. Not sure if it's a race condition or something 🤔 .

@m-Peter (Collaborator, Author) commented Dec 12, 2024

shouldn't unlock be at transaction result ? like sign the transaction, send, check result, when result is sealed or executed ( I think executed should also work here ) then release the key.

@bluesign That's what we do for the happy path: when the Flow transaction is executed and we receive the EVM.TransactionExecuted event, we release the key. But if, for whatever reason, the EVM transaction is invalid, we won't get an EVM.TransactionExecuted event to release the key for that Flow transaction. That's why we have a goroutine to do it on an interval. We could also try to fetch the Flow transaction result, but we might get rate-limited by the AN.

bluesign:

@m-Peter thanks I didn't think about the rate limit.

"github.com/onflow/flow-go-sdk/crypto"
)

var ErrNoKeysAvailable = fmt.Errorf("no keys available")
@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch from 3b8ffc1 to db7398f Compare December 10, 2024 12:15
args ...cadence.Value,
) (*flow.Transaction, error) {
// building and signing transactions should be blocking, so we don't have keys conflict
e.mux.Lock()
Collaborator (Author):

Note: I think we could get rid of the lock acquisition here, as key management now lives inside the Keystore, and that is already blocking.
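One way a Keystore can provide that blocking behavior internally, so callers no longer need an outer mutex around building and signing, is a buffered channel acting as a key pool. A minimal sketch with illustrative names:

```go
package main

import "fmt"

// keyPool sketches the idea that blocking can live inside the Keystore
// itself: Take blocks until a key index is free, so callers need no
// external lock around build-and-sign. Names are illustrative.
type keyPool struct {
	keys chan int
}

func newKeyPool(n int) *keyPool {
	p := &keyPool{keys: make(chan int, n)}
	for i := 0; i < n; i++ {
		p.keys <- i
	}
	return p
}

// Take blocks until a key is available.
func (p *keyPool) Take() int { return <-p.keys }

// Return puts the key back, unblocking any waiting caller.
func (p *keyPool) Return(k int) { p.keys <- k }

func main() {
	pool := newKeyPool(2)
	k := pool.Take()
	// ... build and sign a transaction with key k ...
	pool.Return(k)
	fmt.Println(k)
}
```

The channel serializes access per key while still allowing as many concurrent signers as there are keys, which is the property the outer mutex was providing more coarsely.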

return err1
})
g.Go(func() error {
account, err2 = e.client.GetAccount(ctx, e.config.COAAddress)
Collaborator (Author):

Note: Having to fetch the account on every eth_sendRawTransaction call might be sub-optimal. The only reason we need it is the metric for the operator's balance:

e.collector.OperatorBalance(account)

which is called below. Maybe it's better to move this metric somewhere else.

Collaborator (Author):

Update: I've actually used that account variable, to fetch the proper SequenceNumber when using accKey.SetProposerPayerAndSign(flowTx, account).

@@ -5,7 +5,7 @@ module.exports = {
web3: web3,
eoa: web3.eth.accounts.privateKeyToAccount('0xf6d5333177711e562cabf1f311916196ee6ffc2a07966d9d4628094073bd5442'), // eoa is 0xfacf71692421039876a5bb4f10ef7a439d8ef61e
fundedAmount: 5.0,
- startBlockHeight: 3n, // start block height after setup accounts
+ startBlockHeight: 4n, // start block height after setup accounts
Collaborator (Author):

Note: This one increased because, when setting up integration tests, we now also create a new Flow account with multiple account keys, to be used by the Keystore. This causes one more Flow block to be produced, and each Flow block also produces an EVM block, which can be empty.

return err
}

// Ensure the transaction adheres to nonce ordering
@m-Peter (Collaborator, Author) commented Dec 10, 2024

Note: We only check for nonce too low, to match the functionality from Geth:

This has the benefit that it allows users to sign and submit sequential transactions from the same EOA. It is the sender's responsibility to make sure the correct nonce was used.

In the EVM Gateway, if we checked for nonce too high, we would produce false negatives, as transactions might still be in flight while the local index has not yet caught up.
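The nonce rule described above — reject only nonces below the locally indexed value, and let higher (possibly in-flight) nonces through — can be sketched as follows. This is an illustrative stand-in, not the gateway's actual validator:

```go
package main

import (
	"errors"
	"fmt"
)

// errNonceTooLow mirrors the idea of Geth's nonce-too-low error; there is
// deliberately no "nonce too high" counterpart in this sketch.
var errNonceTooLow = errors.New("nonce too low")

// validateNonce accepts any nonce >= the nonce in the local index. Nonces
// above the expected value may belong to in-flight transactions the index
// has not caught up with, so rejecting them would produce false negatives.
func validateNonce(indexedNonce, txNonce uint64) error {
	if txNonce < indexedNonce {
		return fmt.Errorf("%w: address nonce %d, tx nonce %d",
			errNonceTooLow, indexedNonce, txNonce)
	}
	return nil
}

func main() {
	fmt.Println(validateNonce(5, 4) != nil) // too low: rejected
	fmt.Println(validateNonce(5, 7) == nil) // gap allowed: sender's responsibility
}
```

Allowing gaps is what lets users sign and submit sequential transactions from the same EOA without waiting for each one to be indexed.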

@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch 2 times, most recently from 6083e6b to 52e1167 Compare December 10, 2024 13:43
@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch from 52e1167 to bda9fd5 Compare December 10, 2024 13:50
}

func (k *Keystore) LockKey(txID flowsdk.Identifier, key *AccountKey) {
k.usedKeys[txID] = key
Member:

We'd better have a metric for used keys or remaining keys; it's useful for debugging when we run out of keys.

@m-Peter (Collaborator, Author) commented Dec 12, 2024

Added a metric for the available signing keys in 74cb17d. I have added it as a gauge: 74cb17d#diff-c5928930f4834ea503f8bfcc587f8cae9148c4e4c099b4eb9a9578c3307fda63R95-R98. Not sure if we want a counter instead, but since it's not a monotonically increasing value, I chose a gauge.
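A gauge is indeed the fit here, because the count of free keys both rises and falls. A dependency-free sketch of the idea using the standard library's expvar (the PR itself uses a Prometheus gauge; the variable name and helpers are illustrative):

```go
package main

import (
	"expvar"
	"fmt"
)

// availableKeys plays the role of the PR's Prometheus gauge; expvar is used
// here only to keep the sketch dependency-free. A gauge (not a counter)
// fits because the value moves in both directions.
var availableKeys = expvar.NewInt("available_signing_keys")

func lockKey()   { availableKeys.Add(-1) } // key taken for a tx
func unlockKey() { availableKeys.Add(1) }  // key released back to the pool

func main() {
	availableKeys.Set(10) // keystore starts with 10 keys
	lockKey()
	lockKey()
	unlockKey()
	fmt.Println(availableKeys.Value())
}
```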


tests/helpers.go Outdated
Comment on lines 150 to 151
"0xee82856bf20e2aa6",
"0x0ae53cb6e3f42a79",
Contributor:

How did you get these addresses? Can we somehow express them differently?

Collaborator (Author):

Good point 👍
Updated in 04fd1d5

bootstrap/bootstrap.go (resolved)
ks *Keystore
Address flowsdk.Address
Signer crypto.Signer
inuse bool
Contributor:

You don't need the inuse part.

k.inuse = false
}

type Keystore struct {
Contributor:

The benchmark account keys / key store are actually poorly designed and cause issues :).

What you want from the KeyStore is just a public Take (or Get) method. The key store should also hold a reference to a block publisher *models.Publisher[*models.Block].

The usage code should look something like this:

func someFuncCreatingAndSendingATx(keyStore KeyStore) {
    accountKey, err := keyStore.Take()
    // handle err
    defer accountKey.ReturnIfNotReserved()

    referenceBlock := getReferenceBlock()

    // ...
    tx, err := createAndSignTx(accountKey, referenceBlock.Height)
    if err != nil {
        // the deferred call handles cases where we haven't actually used the key
        return
    }

    txID, err := sendTx(tx)
    if err != nil {
        return
    }
    accountKey.ReserveFor(txID, referenceBlock.Height)
}

the ReserveFor would do something like:

func (k *AccountKey) ReserveFor(txID flowsdk.Identifier, referenceHeight uint64) {
    // via k.keystore.blockPublisher, create a new subscription that:
    //   - if txID is found, increments the account key's sequence number
    //   - if referenceHeight is more than 1000 blocks past, doesn't increment it
    //   - then unsubscribes and returns the key to the keystore
}

ReturnIfNotReserved would just check whether the key is reserved; if not, it would return it to the keystore.

Collaborator (Author):

The referenceBlock.Height refers to Flow block heights, while *models.Publisher[*models.Block] deals with EVM block heights. So I am not very fond of using this approach, given that we might experience failures from the system chunk tx. Also, we only have the txID for the submitted Flow transaction. We can't know if this will end up creating an EVM transaction.

@m-Peter (Collaborator, Author) commented Dec 12, 2024

I have updated the implementation in this commit: 74cb17d . Let me know what you think 🙏

Contributor:

My main concern with this is that if the sequence number de-syncs, it's going to stay de-synced, as there is no correction mechanism. One fix is a more robust way to keep them in sync; the other is to detect de-syncs and repair them.

The sequence number could desync for a few reasons:

  • transaction failed in a way where the sequence number does not get incremented, and the gateway still incremented it
  • the gateway thinks the transaction was lost (and doesn't increment the sequence number), but it is just late (or somehow missed)
  • user manually used one of the keys for a transaction
  • ... (maybe something I'm forgetting)

Using time for detection is not guaranteed to work: the chain might be slow or down, the transaction might succeed a bit later, and reusing that key in the meantime would produce a failing transaction and a de-sync of the sequence number. With the height-based approach we know that if the transaction hasn't gone through in X blocks, it's never going to, because the reference block is too old. Good point about *models.Publisher[*models.Block]; maybe we would need a publisher for Flow blocks as well then.

As for auto-correction, I would only increment the sequence number after the results of the transactions are known. If a transaction resulted in a sequence number error, I would re-fetch the sequence number for that key instead of incrementing it. If the transaction expired, I just would not increment the sequence number.

Another (easier/shorter/temporary) solution for this problem would be to crash the whole gateway if a transaction results in a sequence number mismatch.

@m-Peter (Collaborator, Author) commented Dec 12, 2024

I've actually reworked the whole in-memory approach for SequenceNumber. See: 74cb17d#diff-c560dc90897114da6ccaa724718c1055144a1c195676fb1e047c0200899aecb9R35-R53. Since the buildTransaction method already fetches the Flow account, I've used it to read the correct SequenceNumber for a given key index.
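The self-correcting approach described here — reading the sequence number from the freshly fetched account instead of tracking it in memory — can be sketched as follows; the types are illustrative stand-ins for the SDK's account structures:

```go
package main

import "fmt"

// flowAccountKey and flowAccount are illustrative stand-ins for the SDK's
// account types; the idea is that the on-chain account, fetched anyway in
// buildTransaction, is the authoritative source of sequence numbers.
type flowAccountKey struct {
	Index          uint32
	SequenceNumber uint64
}

type flowAccount struct {
	Keys []flowAccountKey
}

// sequenceNumberFor returns the on-chain sequence number for a key index,
// so a de-synced local value self-corrects on the next transaction build.
func sequenceNumberFor(account flowAccount, keyIndex uint32) (uint64, error) {
	for _, k := range account.Keys {
		if k.Index == keyIndex {
			return k.SequenceNumber, nil
		}
	}
	return 0, fmt.Errorf("key index %d not found on account", keyIndex)
}

func main() {
	acct := flowAccount{Keys: []flowAccountKey{
		{Index: 0, SequenceNumber: 42},
		{Index: 1, SequenceNumber: 7},
	}}
	seq, _ := sequenceNumberFor(acct, 1)
	fmt.Println(seq)
}
```

Because the value is re-read on every build, the de-sync scenarios listed earlier (failed txs, manual key use, lost txs) heal themselves on the next fetch rather than persisting.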

Collaborator (Author):

I will also try to find a way for auto-expiring based on the reference block height 👍

Collaborator (Author):

@janezpodhostnik I added an approach for expiring account keys based on reference block height here: 374f70c. I guess it's not something that we have to do for every single Flow block, but let me know what you think.

Contributor:

Nice, this will work. We can make it more optimal later if need be.

@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch from 219acef to 2a7482e Compare December 12, 2024 13:51
@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch from 2a7482e to 74cb17d Compare December 12, 2024 14:31
@janezpodhostnik (Contributor) left a comment

nice!

@j1010001 j1010001 merged commit bc5271d into feature/local-tx-reexecution Dec 12, 2024
2 checks passed

5 participants