Enable validation of submitted transactions with local state index #693
Conversation
Force-pushed from dae5dd3 to 7597ff0
Looks good. Just had one question.
```go
if res.Error != nil {
	if err, ok := parseInvalidError(res.Error); ok {
		return err
```

```go
if t.config.TxStateValidation == config.TxSealValidation {
```
My comment is not here, but for L62, where we had:

```go
t.txPublisher.Publish(evmTx) // publish pending transaction event
```

Do we still need to mark the tx as sealed if `TxStateValidation` is local-index?
Regardless of what mechanism we use to validate submitted transactions, the `t.txPublisher.Publish(evmTx)` call is used for this filter. It's a polling filter that developers can use to get notified when there's a pending transaction in the network. I have an issue for reworking the way we handle pending transactions in the codebase, so that we match the functionality from Geth: #544. But it will be in a future PR.
Force-pushed from 111f9c6 to 9c290ce
Force-pushed from 9c290ce to a06838e
Force-pushed from 710de82 to b19cc5c
```go
		err,
	)
}

signer, err := createSigner(ctx, b.config, b.logger)
```
Note: After viewing this comment, I am not entirely sure whether we should have one shared `crypto.Signer` object for all account keys, or create one for each account key. Are there any thread-safety concerns here? 🤔
A single signer cannot be used across goroutines, so we need multiple: one for every account key. Actually, the `crypto.PrivateKey` is not thread-safe either! So if multiple account keys share the same private key, you need to create multiple copies of `crypto.PrivateKey`.
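To illustrate the thread-safety point, here is a minimal Go sketch of the "one signer per account key" approach, where each goroutine owns exactly one signer instance. The `signer` type here is a simplified stand-in, not the actual flow-go-sdk `crypto.Signer`:

```go
package main

import (
	"fmt"
	"sync"
)

// signer is a simplified stand-in for an in-memory signer: it keeps internal
// scratch state, which is what makes sharing one instance across goroutines unsafe.
type signer struct {
	keyIndex int
	scratch  []byte // per-signature buffer; mutated on every Sign call
}

func (s *signer) Sign(msg []byte) []byte {
	s.scratch = append(s.scratch[:0], msg...) // not safe for concurrent use
	return s.scratch
}

// newSignersPerKey builds one independent signer per account key index,
// mirroring the "one signer for every account key" approach from this thread.
func newSignersPerKey(keyCount int) []*signer {
	signers := make([]*signer, keyCount)
	for i := range signers {
		signers[i] = &signer{keyIndex: i, scratch: make([]byte, 0, 64)}
	}
	return signers
}

func main() {
	signers := newSignersPerKey(4)
	var wg sync.WaitGroup
	for i, s := range signers {
		wg.Add(1)
		go func(i int, s *signer) { // each goroutine owns exactly one signer
			defer wg.Done()
			s.Sign([]byte(fmt.Sprintf("tx-%d", i)))
		}(i, s)
	}
	wg.Wait()
	fmt.Println(len(signers), "independent signers")
}
```

Because each goroutine touches only its own `signer`, there is no shared mutable state to race on.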
Note that we have 2 types of signers:

1. The in-memory signer: `crypto.NewInMemorySigner(config.COAKey, crypto.SHA3_256)`
2. `requester.NewKMSKeySigner`, which doesn't deal at all with `crypto.PrivateKey`

Currently, only option 2 is supposed to be used in production, and I'm not sure if it has any thread-safety concerns, as it uses Cloud KMS for signing. Option 1 is supposed to be used for local development / testing.
Force-pushed from b19cc5c to 3b8ffc1
```diff
@@ -120,50 +119,24 @@ func parseConfigFromFlags() error {
 		return fmt.Errorf("invalid COA private key: %w", err)
 	}
 	cfg.COAKey = pkey
 } else if keysPath != "" {
```
Note: We no longer need the option to use multiple private keys using a JSON file, so this flag is removed entirely.
```go
	KeyID:      keyParts[0],
	KeyVersion: keyParts[1],
}

// key has the form "{keyID}@{keyVersion}"
```
Note: There's really no need to use one Cloud KMS key per Flow account key, and it might actually be costly to do so. Given that we might need ~1000 Flow account keys, we simply re-use one and the same Cloud KMS key for all of them.
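As a rough illustration of the `{keyID}@{keyVersion}` flag format shown in the diff, here is a hypothetical helper that splits such a value into its two parts. The function name and error message are illustrative, not taken from the codebase:

```go
package main

import (
	"fmt"
	"strings"
)

// kmsKey mirrors the two fields populated from the flag in this diff.
type kmsKey struct {
	KeyID      string
	KeyVersion string
}

// parseKMSKeyFlag is a hypothetical helper: it splits a "{keyID}@{keyVersion}"
// flag value into its parts, rejecting malformed input.
func parseKMSKeyFlag(raw string) (kmsKey, error) {
	keyParts := strings.Split(raw, "@")
	if len(keyParts) != 2 || keyParts[0] == "" || keyParts[1] == "" {
		return kmsKey{}, fmt.Errorf("invalid KMS key %q, expected {keyID}@{keyVersion}", raw)
	}
	return kmsKey{KeyID: keyParts[0], KeyVersion: keyParts[1]}, nil
}

func main() {
	k, err := parseKMSKeyFlag("gw-signing-key@1")
	if err != nil {
		panic(err)
	}
	fmt.Println(k.KeyID, k.KeyVersion)
}
```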
```diff
@@ -169,6 +172,9 @@ func (r *RPCEventSubscriber) subscribe(ctx context.Context, height uint64) <-cha
 			continue
 		}
 	}
 	for _, evt := range blockEvents.Events {
 		r.keystore.UnlockKey(evt.TransactionID)
```
Note: I am not a big fan of having to pass around the `Keystore` instance. I would rather have that instance take care of releasing used keys. It could be as simple as releasing keys after a fixed time period (such as 15-20 seconds), as it doesn't take more time than that for a Flow transaction to seal.
> Note: I am not a big fan of having to pass around the Keystore instance.

Agree. We could pass an interface with just `LockKey`/`UnlockKey` methods, so the event subscriber cannot use the key store to sign anything.

> I would rather have that instance take care of the release of used keys.

I think ideally we need both. It's not enough to check only the block events: a failed tx might not produce any event, but we should still `UnlockKey`. The `UnlockKey` here is more for the happy path, which can unlock the key ASAP for it to be reused.

If we rely on a fixed time period only, then we would have to prepare lots of keys in case the GW needs to send lots of txs within that fixed time period; otherwise, we might run out of keys.

For the fixed time period, we could keep track of each tx's reference height and compare it with the latest sealed height: since each tx has an expiry time and a reference block, if a tx has not been sealed after X blocks, it will never be sealed. Check this out.
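The reference-height idea can be sketched roughly as follows. `expiryBlocks`, `lockedKey`, and `expiredKeys` are hypothetical names, and the window size is an assumption for illustration, not the gateway's actual expiry constant:

```go
package main

import "fmt"

// expiryBlocks is an assumed Flow tx expiry window, in blocks.
const expiryBlocks = 600

// lockedKey records the reference block height of the Flow tx a key signed.
type lockedKey struct {
	txID      string
	refHeight uint64
}

// expiredKeys returns the keys whose transactions can no longer seal: once the
// latest sealed height is more than expiryBlocks past a tx's reference block,
// that tx has expired and its key is safe to unlock.
func expiredKeys(locked []lockedKey, latestSealed uint64) []string {
	var out []string
	for _, k := range locked {
		if latestSealed > k.refHeight+expiryBlocks {
			out = append(out, k.txID)
		}
	}
	return out
}

func main() {
	locked := []lockedKey{
		{txID: "tx-a", refHeight: 100}, // expired: 800 > 100 + 600
		{txID: "tx-b", refHeight: 900}, // still in flight
	}
	fmt.Println(expiredKeys(locked, 800))
}
```

This check is deterministic: unlike a wall-clock timeout, it cannot release a key early just because the chain was slow.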
Shouldn't the unlock happen at transaction result? That is: sign the transaction, send it, check the result, and when the result is sealed or executed (I think executed should also work here), then release the key.
@zhangchiqing Added your suggested functionality in 74cb17d .
I have also noticed this https://www.flowscan.io/tx/7c615604c3a0caf638ed615b6292384516b7edcb4a87fbf448f2f74be3d7b392 in mainnet. Not sure if it's a race condition or something 🤔 .
> Shouldn't the unlock happen at transaction result? That is: sign the transaction, send it, check the result, and when the result is sealed or executed (I think executed should also work here), then release the key.

@bluesign That's what we do for the happy path: when the Flow transaction is executed and we receive the `EVM.TransactionExecuted` event, then we release the key. But if, for whatever reason, the EVM transaction is invalid, then we won't get an `EVM.TransactionExecuted` event to release the key for that Flow transaction. That's why we have a go-routine to do it on an interval. We could also try to fetch the Flow transaction result, but we might get rate-limited by the AN.
@m-Peter thanks I didn't think about the rate limit.
services/requester/keystore.go (Outdated)

```go
	"github.com/onflow/flow-go-sdk/crypto"
)

var ErrNoKeysAvailable = fmt.Errorf("no keys available")
```
Note: Implementation taken and adapted from here: https://github.com/onflow/flow-go/blob/master/integration/benchmark/account/keys.go#L20-L110
Force-pushed from 3b8ffc1 to db7398f
```go
	args ...cadence.Value,
) (*flow.Transaction, error) {
	// building and signing transactions should be blocking, so we don't have keys conflict
	e.mux.Lock()
```
Note: I think we could get rid of the lock acquisition here, as the key management is now inside the `Keystore`, and that is already blocking.
```go
	return err1
})
g.Go(func() error {
	account, err2 = e.client.GetAccount(ctx, e.config.COAAddress)
```
Note: Having to get the account for each call of `eth_sendRawTransaction` might be sub-optimal. The only reason we need this is because we used to have a metric for the operator's balance, `e.collector.OperatorBalance(account)`, which is called below. Maybe it's better to move this metric somewhere else.
Update: I've actually used that `account` variable to fetch the proper `SequenceNumber` when using `accKey.SetProposerPayerAndSign(flowTx, account)`.
```diff
@@ -5,7 +5,7 @@ module.exports = {
   web3: web3,
   eoa: web3.eth.accounts.privateKeyToAccount('0xf6d5333177711e562cabf1f311916196ee6ffc2a07966d9d4628094073bd5442'), // eoa is 0xfacf71692421039876a5bb4f10ef7a439d8ef61e
   fundedAmount: 5.0,
-  startBlockHeight: 3n, // start block height after setup accounts
+  startBlockHeight: 4n, // start block height after setup accounts
```
Note: This one increased because, when setting up integration tests, we now also create a new Flow account with multiple account keys, to be used by the `Keystore`. This causes one more Flow block to be produced, and each Flow block also creates an EVM block, which can be empty.
```go
	return err
}

// Ensure the transaction adheres to nonce ordering
```
Note: We only check for `nonce too low`, to match the functionality of Geth:

- https://github.com/onflow/go-ethereum/blob/master/core/txpool/validation.go#L209-L219
- https://github.com/onflow/go-ethereum/blob/master/core/txpool/legacypool/legacypool.go#L631

This has the benefit that it allows users to sign and submit sequential transactions from the same EOA. It is the sender's responsibility to make sure the correct nonce is used. In EVM Gateway, if we checked for `nonce too high`, we would produce false negatives, as transactions might still be in flight while the local index has not yet caught up.
Force-pushed from 6083e6b to 52e1167
Force-pushed from 52e1167 to bda9fd5
services/requester/keystore.go (Outdated)

```go
}

func (k *Keystore) LockKey(txID flowsdk.Identifier, key *AccountKey) {
	k.usedKeys[txID] = key
```
We'd better have a metric of used keys or remaining keys, useful for debugging when running out of keys.
Added a metric for the available signing keys in 74cb17d. I have added it as a gauge: 74cb17d#diff-c5928930f4834ea503f8bfcc587f8cae9148c4e4c099b4eb9a9578c3307fda63R95-R98. Not sure if we want a counter instead, but since it's not a monotonically-increasing value, I chose a gauge.
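A stdlib-only sketch of why a gauge fits here better than a counter: the value must go down when a key is taken and back up when it is returned, whereas a counter only ever increases. `availableKeys` below is a stand-in for the real Prometheus gauge, not the gateway's actual metrics code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// availableKeys models the gauge from the thread: the number of signing keys
// currently free. The value moves in both directions as keys are taken and
// returned, which is exactly the gauge (not counter) semantics.
type availableKeys struct{ n atomic.Int64 }

func (g *availableKeys) Set(v int64)   { g.n.Store(v) }
func (g *availableKeys) Take()         { g.n.Add(-1) } // key locked for a tx
func (g *availableKeys) Return()       { g.n.Add(1) }  // key unlocked again
func (g *availableKeys) Value() int64  { return g.n.Load() }

func main() {
	var g availableKeys
	g.Set(100) // keystore starts with 100 account keys
	g.Take()
	g.Take()
	g.Return()
	fmt.Println("available signing keys:", g.Value())
}
```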
tests/helpers.go (Outdated)

```go
"0xee82856bf20e2aa6",
"0x0ae53cb6e3f42a79",
```
How did you get these addresses? Can we somehow express them differently?
Good point 👍
Updated in 04fd1d5
services/requester/keystore.go (Outdated)

```go
ks      *Keystore
Address flowsdk.Address
Signer  crypto.Signer
inuse   bool
```
You don't need the `inuse` part.
services/requester/keystore.go (Outdated)

```go
	k.inuse = false
}

type Keystore struct {
```
The benchmark account keys / key store are actually poorly designed and cause issues :).

What you want from the `KeyStore` is just a public `Take` (or `Get`) method. The key store should also have a reference to a block `Publisher *models.Publisher[*models.Block]`.

The usage code should look something like this:
```go
func someFuncCreatingAndSendingATX(keyStore KeyStore) {
	accountKey, err := keyStore.Take()
	// handle err
	defer accountKey.ReturnIfNotReserved()

	referenceBlock := getReferenceBlock()
	// ...
	tx, err := createAndSignTX(accountKey, referenceBlock.Height)
	if err != nil {
		// defer will handle cases where we haven't actually used the key
		return
	}
	txID, err := sendTX(tx)
	if err != nil {
		return
	}
	accountKey.ReserveFor(txID, referenceBlock.Height)
}
```
The `ReserveFor` would do something like:

```go
func (k *AccountKey) ReserveFor(txID flowsdk.Identifier, referenceHeight uint64) {
	// Create a new subscription on k.keystore.blockPublisher that:
	//  - if txID is found, increments the account key's sequence number
	//  - if referenceHeight is more than 1000 blocks past, doesn't increment it
	// then unsubscribes and returns the key to the keystore.
}
```
`ReturnIfNotReserved` would just check if the key is reserved; if not, it would return it to the keystore.
The `referenceBlock.Height` refers to Flow block heights, while `*models.Publisher[*models.Block]` deals with EVM block heights. So I am not very fond of using this approach, given that we might experience failures from the system chunk tx. Also, we only have the `txID` for the submitted Flow transaction; we can't know whether it will end up creating an EVM transaction.
I have updated the implementation in this commit: 74cb17d . Let me know what you think 🙏
My main concern with this is that if the sequence number de-syncs, it's going to stay de-synced, as there is no correction mechanism. One fix is a more robust way to keep them synced; the other is to detect de-syncs and correct them.

The sequence number could de-sync for a few reasons:

- the transaction failed in a way where the sequence number does not get incremented, but the gateway still incremented it
- the gateway thinks the transaction was lost (and doesn't increment the sequence number), but it is just late (or somehow missed)
- a user manually used one of the keys for a transaction
- ... (maybe something I'm forgetting)

Using time for detection is not guaranteed to work: the chain might be slow or down and the transaction may succeed a bit later, and reusing that key in the meantime will produce a failing transaction and a de-sync of the sequence number. With the height-based approach we know that if the transaction hasn't gone through in X amount of blocks, it's not going to go through, because the reference block is too old. Good point about `*models.Publisher[*models.Block]`; maybe we would need a publisher for Flow blocks as well then.

As for the auto-correction, I would only increment the sequence number after the results of the transactions are known. If a transaction resulted in a sequence-number error, I would re-fetch the sequence number for that key instead of incrementing it. If the transaction expired, I just would not increment the sequence number.

Another (easier/shorter/temporary) solution to this problem would be to crash the whole gateway if a transaction results in a sequence number mismatch.
I've actually reworked the whole in-memory approach for `SequenceNumber`. See: 74cb17d#diff-c560dc90897114da6ccaa724718c1055144a1c195676fb1e047c0200899aecb9R35-R53. Since the `buildTransaction` method already fetches the Flow account, I've used it to check the correct value of `SequenceNumber` for a given key index.
I will also try to find a way for auto-expiring based on the reference block height 👍
@janezpodhostnik I added an approach for expiring account keys based on reference block height here: 374f70c . I guess it's not something that we have to do for every single Flow block. But let me know what you think.
nice, this will work. we can make it more optimal later if need be.
Force-pushed from 219acef to 2a7482e
Force-pushed from 2a7482e to 74cb17d
nice!
Closes: #654
Closes: #586
Closes: #118