
Enable validation of submitted transactions with local state index #693

Merged

Conversation

@m-Peter (Collaborator) commented Dec 3, 2024

Closes: #654
Closes: #586
Closes: #118

Description


For contributor use:

  • Targeted PR against master branch
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the standards mentioned here.
  • Updated relevant documentation
  • Re-reviewed Files changed in the Github PR explorer
  • Added appropriate labels

coderabbitai bot (Contributor) commented Dec 3, 2024

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



@m-Peter m-Peter changed the title Enable submitted transaction validation using local state index Enable validation of submitted transactions with local state index Dec 3, 2024
@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch 5 times, most recently from dae5dd3 to 7597ff0 Compare December 5, 2024 12:08
@zhangchiqing (Member) left a comment

Looks good. Just had one question.

if res.Error != nil {
if err, ok := parseInvalidError(res.Error); ok {
return err
if t.config.TxStateValidation == config.TxSealValidation {
Member:

My comment is not here, but for L62 where we had

t.txPublisher.Publish(evmTx) // publish pending transaction event

Do we still need to mark the tx as sealed if TxStateValidation is local-index?

@m-Peter (Collaborator, Author) commented Dec 10, 2024

Regardless of which mechanism we use to validate submitted transactions, t.txPublisher.Publish(evmTx) is used for this filter. It's a polling filter that developers can use to get notified when there's a pending transaction in the network. I have an issue for reworking the way we handle pending transactions in the codebase, so that we match the functionality from Geth: #544. But that will be in a future PR.

@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch 2 times, most recently from 111f9c6 to 9c290ce Compare December 9, 2024 08:40
@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch from 9c290ce to a06838e Compare December 10, 2024 09:27
@m-Peter m-Peter marked this pull request as ready for review December 10, 2024 09:28
@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch 2 times, most recently from 710de82 to b19cc5c Compare December 10, 2024 11:49
err,
)
}
signer, err := createSigner(ctx, b.config, b.logger)
Collaborator (Author):

Note: After viewing this comment, I am not entirely sure whether we should share a single crypto.Signer object across all account keys, or create one for each account key. Are there any thread-safety concerns here? 🤔

Contributor:

A single signer cannot be used across go routines, so we need multiple. One for every account key.

Actually the crypto.PrivateKey is not thread safe either! So if multiple account keys have the same private key, you need to create multiple copies of crypto.PrivateKey.
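Since a single signer cannot be shared across goroutines, the one-signer-per-account-key setup described above can be sketched as follows. This is a minimal, dependency-free illustration: the `signer` type here is a stand-in for the SDK's crypto.Signer, not the gateway's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// signer is a stand-in for the SDK's crypto.Signer; the real in-memory
// implementation is assumed not to be thread-safe, so each account key
// gets its own instance (and its own private-key copy).
type signer struct {
	keyIndex int
}

func newSigner(keyIndex int) *signer {
	return &signer{keyIndex: keyIndex}
}

func (s *signer) Sign(msg []byte) string {
	// placeholder for a real signature
	return fmt.Sprintf("sig(key=%d,msg=%x)", s.keyIndex, msg)
}

func main() {
	const numKeys = 4
	// One signer per account key: goroutines never share an instance.
	signers := make([]*signer, numKeys)
	for i := range signers {
		signers[i] = newSigner(i)
	}

	var wg sync.WaitGroup
	results := make([]string, numKeys)
	for i, s := range signers {
		wg.Add(1)
		go func(i int, s *signer) {
			defer wg.Done()
			results[i] = s.Sign([]byte{byte(i)})
		}(i, s)
	}
	wg.Wait()
	fmt.Println(results[0])
}
```

The point of the sketch is only the ownership structure: each goroutine signs with an instance nothing else touches, so no lock is needed around signing itself.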

@m-Peter (Collaborator, Author) commented Dec 12, 2024

Note that we have 2 types of signers:

  1. The in-memory signer: crypto.NewInMemorySigner(config.COAKey, crypto.SHA3_256)
  2. requester.NewKMSKeySigner, which doesn't deal with crypto.PrivateKey at all

Currently, only option 2 is supposed to be used in production, and I'm not sure whether it has any thread-safety concerns, as it uses Cloud KMS for signing.

Option 1 is supposed to be used for local development / testing.

@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch from b19cc5c to 3b8ffc1 Compare December 10, 2024 12:00
@@ -120,50 +119,24 @@ func parseConfigFromFlags() error {
return fmt.Errorf("invalid COA private key: %w", err)
}
cfg.COAKey = pkey
} else if keysPath != "" {
Collaborator (Author):

Note: We no longer need the option of loading multiple private keys from a JSON file, so this flag is removed entirely.

KeyID: keyParts[0],
KeyVersion: keyParts[1],
}
// key has the form "{keyID}@{keyVersion}"
Collaborator (Author):

Note: There's really no need to use one Cloud KMS key per Flow account key, and doing so might actually be costly. Given that we might need ~1000 Flow account keys, we simply re-use one and the same Cloud KMS key for all of them.

@@ -169,6 +172,9 @@ func (r *RPCEventSubscriber) subscribe(ctx context.Context, height uint64) <-cha
continue
}
}
for _, evt := range blockEvents.Events {
r.keystore.UnlockKey(evt.TransactionID)
Collaborator (Author):

Note: I am not a big fan of having to pass around the Keystore instance. I would rather have that instance take care of releasing used keys itself. It could be as simple as releasing keys after a fixed time period (such as 15-20 seconds), as it doesn't take longer than that for a Flow transaction to seal.

@zhangchiqing (Member) commented Dec 11, 2024

Note: I am not a big fan of having to pass around the Keystore instance.

Agree. We could pass an interface with just LockKey/UnlockKey methods, so the event subscriber cannot use the key store to sign anything.

I would rather have that instance take care the release of used keys.

I think ideally we need both. It's not enough to check only the block events: a failed tx might not produce any event, but we should still UnlockKey. The UnlockKey here is more for the happy path, which can unlock the key ASAP for it to be reused.

If we rely on a fixed time period only, then we would have to prepare lots of keys in case GW needs to send lots of tx within that fixed time period, otherwise, we might run out of keys.

For the fixed time period, we could keep track of the tx's reference height and compare it with the latest sealed height: since each tx has an expiry time and a reference block, if a tx has not been sealed after X blocks, it will never be sealed. Check this out.
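The two reclamation paths discussed here (event-driven unlock for the happy path, plus height-based expiry as a backstop) could be sketched like this. All type and method names are illustrative, not the gateway's actual API, and the expiry window is an assumed constant:

```go
package main

import (
	"fmt"
	"sync"
)

// txInfo remembers which key a transaction locked and the Flow reference
// height at which it was submitted (illustrative types).
type txInfo struct {
	keyIndex  int
	refHeight uint64
}

// keystore hands out key indices and reclaims them either when the
// transaction's events are observed (happy path) or when the reference
// height falls too far behind the latest sealed height (expiry path).
type keystore struct {
	mu        sync.Mutex
	available []int
	inFlight  map[string]txInfo // txID -> locked key
}

// expiryBlocks is an assumed window: a tx whose reference block is this
// far behind the latest sealed height can never seal.
const expiryBlocks = 600

func newKeystore(numKeys int) *keystore {
	ks := &keystore{inFlight: make(map[string]txInfo)}
	for i := 0; i < numKeys; i++ {
		ks.available = append(ks.available, i)
	}
	return ks
}

func (ks *keystore) LockKey(txID string, refHeight uint64) (int, error) {
	ks.mu.Lock()
	defer ks.mu.Unlock()
	if len(ks.available) == 0 {
		return 0, fmt.Errorf("no keys available")
	}
	key := ks.available[0]
	ks.available = ks.available[1:]
	ks.inFlight[txID] = txInfo{keyIndex: key, refHeight: refHeight}
	return key, nil
}

// UnlockKey is called when the tx's events are seen in a block (happy path).
func (ks *keystore) UnlockKey(txID string) {
	ks.mu.Lock()
	defer ks.mu.Unlock()
	if info, ok := ks.inFlight[txID]; ok {
		delete(ks.inFlight, txID)
		ks.available = append(ks.available, info.keyIndex)
	}
}

// Expire reclaims keys whose transactions can no longer seal.
func (ks *keystore) Expire(latestSealed uint64) {
	ks.mu.Lock()
	defer ks.mu.Unlock()
	for txID, info := range ks.inFlight {
		if latestSealed > info.refHeight+expiryBlocks {
			delete(ks.inFlight, txID)
			ks.available = append(ks.available, info.keyIndex)
		}
	}
}

func main() {
	ks := newKeystore(2)
	ks.LockKey("tx-1", 100)
	ks.LockKey("tx-2", 100)
	ks.UnlockKey("tx-1")              // happy path: events observed
	ks.Expire(100 + expiryBlocks + 1) // backstop: tx-2 expired by height
	fmt.Println(len(ks.available))
}
```

The height-based backstop avoids the failure mode described above for pure timers: a slow chain cannot cause a premature key reuse, because expiry is tied to when the reference block becomes too old rather than to wall-clock time.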

bluesign:

Shouldn't unlock happen at the transaction result? I.e., sign the transaction, send it, check the result, and when the result is sealed or executed (I think executed should also work here), release the key.

@m-Peter (Collaborator, Author) commented Dec 12, 2024

@zhangchiqing Added your suggested functionality in 74cb17d .

Collaborator (Author):

I have also noticed this https://www.flowscan.io/tx/7c615604c3a0caf638ed615b6292384516b7edcb4a87fbf448f2f74be3d7b392 in mainnet. Not sure if it's a race condition or something 🤔 .

@m-Peter (Collaborator, Author) commented Dec 12, 2024

shouldn't unlock be at transaction result ? like sign the transaction, send, check result, when result is sealed or executed ( I think executed should also work here ) then release the key.

@bluesign That's what we do for the happy path: when the Flow transaction is executed and we receive the EVM.TransactionExecuted event, we release the key. But if, for whatever reason, the EVM transaction is invalid, we won't get an EVM.TransactionExecuted event to release the key for that Flow transaction. That's why we have a goroutine to do it on an interval. We could also try to fetch the Flow transaction result, but we might get rate-limited by the AN.

bluesign:

@m-Peter thanks I didn't think about the rate limit.

"github.com/onflow/flow-go-sdk/crypto"
)

var ErrNoKeysAvailable = fmt.Errorf("no keys available")
@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch from 3b8ffc1 to db7398f Compare December 10, 2024 12:15
args ...cadence.Value,
) (*flow.Transaction, error) {
// building and signing transactions should be blocking, so we don't have keys conflict
e.mux.Lock()
Collaborator (Author):

Note: I think we could get rid of the lock acquisition here, as key management now lives inside the Keystore, and that is already blocking.
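One way a Keystore can provide that blocking behavior internally, so callers no longer need an outer mutex around building and signing, is a buffered channel acting as a key pool. A minimal sketch with illustrative names:

```go
package main

import "fmt"

// keyPool sketches the idea that blocking can live inside the Keystore
// itself: Take blocks until a key index is free, so callers need no
// external lock around build-and-sign. Names are illustrative.
type keyPool struct {
	keys chan int
}

func newKeyPool(n int) *keyPool {
	p := &keyPool{keys: make(chan int, n)}
	for i := 0; i < n; i++ {
		p.keys <- i
	}
	return p
}

// Take blocks until a key is available.
func (p *keyPool) Take() int { return <-p.keys }

// Return puts the key back, unblocking any waiting caller.
func (p *keyPool) Return(k int) { p.keys <- k }

func main() {
	pool := newKeyPool(2)
	k := pool.Take()
	// ... build and sign a transaction with key k ...
	pool.Return(k)
	fmt.Println(k)
}
```

The channel serializes access per key while still allowing as many concurrent signers as there are keys, which is the property the outer mutex was providing more coarsely.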

return err1
})
g.Go(func() error {
account, err2 = e.client.GetAccount(ctx, e.config.COAAddress)
Collaborator (Author):

Note: Having to fetch the account on every eth_sendRawTransaction call might be sub-optimal. The only reason we need it is the metric for the operator's balance:

e.collector.OperatorBalance(account)

which is called below. Maybe it's better to move this metric somewhere else.

Collaborator (Author):

Update: I've actually used that account variable, to fetch the proper SequenceNumber when using accKey.SetProposerPayerAndSign(flowTx, account).

@@ -5,7 +5,7 @@ module.exports = {
web3: web3,
eoa: web3.eth.accounts.privateKeyToAccount('0xf6d5333177711e562cabf1f311916196ee6ffc2a07966d9d4628094073bd5442'), // eoa is 0xfacf71692421039876a5bb4f10ef7a439d8ef61e
fundedAmount: 5.0,
- startBlockHeight: 3n, // start block height after setup accounts
+ startBlockHeight: 4n, // start block height after setup accounts
Collaborator (Author):

Note: This one increased because, when setting up integration tests, we now also create a new Flow account with multiple account keys, to be used by the Keystore. This causes one more Flow block to be produced, and each Flow block also produces an EVM block, which can be empty.

return err
}

// Ensure the transaction adheres to nonce ordering
@m-Peter (Collaborator, Author) commented Dec 10, 2024

Note: We only check for nonce too low, to match the functionality from Geth:

This has the benefit that it allows users to sign and submit sequential transactions from the same EOA. It is the sender's responsibility to make sure the correct nonce was used.

In the EVM Gateway, if we checked for nonce too high, we would produce false negatives, as transactions might still be in flight while the local index has not yet caught up.
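The nonce rule described above — reject only nonces below the locally indexed value, and let higher (possibly in-flight) nonces through — can be sketched as follows. This is an illustrative stand-in, not the gateway's actual validator:

```go
package main

import (
	"errors"
	"fmt"
)

// errNonceTooLow mirrors the idea of Geth's nonce-too-low error; there is
// deliberately no "nonce too high" counterpart in this sketch.
var errNonceTooLow = errors.New("nonce too low")

// validateNonce accepts any nonce >= the nonce in the local index. Nonces
// above the expected value may belong to in-flight transactions the index
// has not caught up with, so rejecting them would produce false negatives.
func validateNonce(indexedNonce, txNonce uint64) error {
	if txNonce < indexedNonce {
		return fmt.Errorf("%w: address nonce %d, tx nonce %d",
			errNonceTooLow, indexedNonce, txNonce)
	}
	return nil
}

func main() {
	fmt.Println(validateNonce(5, 4) != nil) // too low: rejected
	fmt.Println(validateNonce(5, 7) == nil) // gap allowed: sender's responsibility
}
```

Allowing gaps is what lets users sign and submit sequential transactions from the same EOA without waiting for each one to be indexed.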

@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch 2 times, most recently from 6083e6b to 52e1167 Compare December 10, 2024 13:43
@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch from 52e1167 to bda9fd5 Compare December 10, 2024 13:50
}

func (k *Keystore) LockKey(txID flowsdk.Identifier, key *AccountKey) {
k.usedKeys[txID] = key
Member:

We'd better have a metric for used keys or remaining keys; it's useful for debugging when we run out of keys.

@m-Peter (Collaborator, Author) commented Dec 12, 2024

Added a metric for the available signing keys in 74cb17d. I have added it as a gauge: 74cb17d#diff-c5928930f4834ea503f8bfcc587f8cae9148c4e4c099b4eb9a9578c3307fda63R95-R98. Not sure if we want a counter instead, but since it's not a monotonically increasing value, I chose a gauge.
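A gauge is indeed the fit here, because the count of free keys both rises and falls. A dependency-free sketch of the idea using the standard library's expvar (the PR itself uses a Prometheus gauge; the variable name and helpers are illustrative):

```go
package main

import (
	"expvar"
	"fmt"
)

// availableKeys plays the role of the PR's Prometheus gauge; expvar is used
// here only to keep the sketch dependency-free. A gauge (not a counter)
// fits because the value moves in both directions.
var availableKeys = expvar.NewInt("available_signing_keys")

func lockKey()   { availableKeys.Add(-1) } // key taken for a tx
func unlockKey() { availableKeys.Add(1) }  // key released back to the pool

func main() {
	availableKeys.Set(10) // keystore starts with 10 keys
	lockKey()
	lockKey()
	unlockKey()
	fmt.Println(availableKeys.Value())
}
```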


tests/helpers.go Outdated
Comment on lines 150 to 151
"0xee82856bf20e2aa6",
"0x0ae53cb6e3f42a79",
Contributor:

How did you get these addresses? Can we somehow express them differently?

Collaborator (Author):

Good point 👍
Updated in 04fd1d5

bootstrap/bootstrap.go (resolved)
ks *Keystore
Address flowsdk.Address
Signer crypto.Signer
inuse bool
Contributor:

You don't need the inuse part.

k.inuse = false
}

type Keystore struct {
Contributor:

The benchmark account keys / key store are actually poorly designed and cause issues :).

What you want from the KeyStore is just a public Take (or Get) method. The key store should also hold a reference to a block publisher *models.Publisher[*models.Block].

The usage code should look something like this:

func someFuncCreatingAndSendingATx(keyStore KeyStore) {
    accountKey, err := keyStore.Take()
    // handle err
    defer accountKey.ReturnIfNotReserved()

    referenceBlock := getReferenceBlock()

    // ...
    tx, err := createAndSignTx(accountKey, referenceBlock.Height)
    if err != nil {
        // the deferred call handles cases where we haven't actually used the key
        return
    }

    txID, err := sendTx(tx)
    if err != nil {
        return
    }
    accountKey.ReserveFor(txID, referenceBlock.Height)
}

the ReserveFor would do something like:

func (k *AccountKey) ReserveFor(txID flowsdk.Identifier, referenceHeight uint64) {
    // via k.keystore.blockPublisher, create a new subscription that:
    //   - if txID is found, increments the account key's sequence number
    //   - if referenceHeight is more than 1000 blocks past, doesn't increment it
    //   - then unsubscribes and returns the key to the keystore
}

ReturnIfNotReserved would just check whether the key is reserved; if not, it would return it to the keystore.

Collaborator (Author):

The referenceBlock.Height refers to Flow block heights, while *models.Publisher[*models.Block] deals with EVM block heights. So I am not very fond of using this approach, given that we might experience failures from the system chunk tx. Also, we only have the txID for the submitted Flow transaction. We can't know if this will end up creating an EVM transaction.

@m-Peter (Collaborator, Author) commented Dec 12, 2024

I have updated the implementation in this commit: 74cb17d . Let me know what you think 🙏

Contributor:

My main concern with this is that if the sequence number de-syncs, it's going to stay de-synced, as there is no correction mechanism. One fix is a more robust way to keep them in sync; the other is to detect de-syncs and repair them.

The sequence number could desync for a few reasons:

  • transaction failed in a way where the sequence number does not get incremented, and the gateway still incremented it
  • the gateway thinks the transaction was lost (and doesn't increment the sequence number), but it is just late (or somehow missed)
  • user manually used one of the keys for a transaction
  • ... (maybe something I'm forgetting)

Using time for detection is not guaranteed to work: the chain might be slow or down, the transaction might succeed a bit later, and reusing that key in the meantime would produce a failing transaction and a de-sync of the sequence number. With the height-based approach we know that if the transaction hasn't gone through in X blocks, it's never going to, because the reference block is too old. Good point about *models.Publisher[*models.Block]; maybe we would need a publisher for Flow blocks as well then.

As for auto-correction, I would only increment the sequence number after the results of the transactions are known. If a transaction resulted in a sequence number error, I would re-fetch the sequence number for that key instead of incrementing it. If the transaction expired, I just would not increment the sequence number.

Another (easier/shorter/temporary) solution for this problem would be to crash the whole gateway if a transaction results in a sequence number mismatch.

@m-Peter (Collaborator, Author) commented Dec 12, 2024

I've actually reworked the whole in-memory approach for SequenceNumber. See: 74cb17d#diff-c560dc90897114da6ccaa724718c1055144a1c195676fb1e047c0200899aecb9R35-R53. Since the buildTransaction method already fetches the Flow account, I've used it to read the correct SequenceNumber for a given key index.
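The self-correcting approach described here — reading the sequence number from the freshly fetched account instead of tracking it in memory — can be sketched as follows; the types are illustrative stand-ins for the SDK's account structures:

```go
package main

import "fmt"

// flowAccountKey and flowAccount are illustrative stand-ins for the SDK's
// account types; the idea is that the on-chain account, fetched anyway in
// buildTransaction, is the authoritative source of sequence numbers.
type flowAccountKey struct {
	Index          uint32
	SequenceNumber uint64
}

type flowAccount struct {
	Keys []flowAccountKey
}

// sequenceNumberFor returns the on-chain sequence number for a key index,
// so a de-synced local value self-corrects on the next transaction build.
func sequenceNumberFor(account flowAccount, keyIndex uint32) (uint64, error) {
	for _, k := range account.Keys {
		if k.Index == keyIndex {
			return k.SequenceNumber, nil
		}
	}
	return 0, fmt.Errorf("key index %d not found on account", keyIndex)
}

func main() {
	acct := flowAccount{Keys: []flowAccountKey{
		{Index: 0, SequenceNumber: 42},
		{Index: 1, SequenceNumber: 7},
	}}
	seq, _ := sequenceNumberFor(acct, 1)
	fmt.Println(seq)
}
```

Because the value is re-read on every build, the de-sync scenarios listed earlier (failed txs, manual key use, lost txs) heal themselves on the next fetch rather than persisting.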

Collaborator (Author):

I will also try to find a way for auto-expiring based on the reference block height 👍

Collaborator (Author):

@janezpodhostnik I added an approach for expiring account keys based on reference block height here: 374f70c. I guess it's not something that we have to do for every single Flow block, but let me know what you think.

Contributor:

Nice, this will work. We can make it more optimal later if need be.

@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch from 219acef to 2a7482e Compare December 12, 2024 13:51
@m-Peter m-Peter force-pushed the mpeter/tx-validation-with-local-state branch from 2a7482e to 74cb17d Compare December 12, 2024 14:31
@janezpodhostnik (Contributor) left a comment

nice!

@j1010001 j1010001 merged commit bc5271d into feature/local-tx-reexecution Dec 12, 2024
2 checks passed

5 participants