-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract sequence tracking from the Broadcaster #12353
Conversation
I see you updated files related to core. Please run |
I see that you haven't updated any README files. Would it make sense to do so? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciation comment for cleaning this up from TXM 🙌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually does it make sense to move NonceTracker inside Broadcaster completely? It seems like Broadcaster already has all the necessary parameters and non else calls NonceTracker for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Moved the nonce tracker initialization into NewEvmBroadcaster
in the latest commit. Had to adjust how we initialize the Broadcaster in tests though to still have an exposed nonce tracker for tests.
return seq, err | ||
} | ||
|
||
func (s *nonceTracker) getSequenceFromStore(ctx context.Context, address common.Address) (seq evmtypes.Nonce, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why unpack FindLatestSequence
's logic here and not keep it as it is? GetAllTransactions
seems like a heavier query with no added benefits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prashantkumar1982 mentioned we're trying to move to a generic tx store so moving forward we'd only want to rely on generic methods. So this was just to break our reliance on an EVM specific method while I was already making these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a good practice to load every tx from the DB and handle the logic in memory. This is significantly heavier than what we do now. If we decide moving to a generic tx store makes sense then we can make the change, but for now I would suggest to keep it as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Dimitris here. Let's keep it the same. When we refactor the txstore, we can deal with this problem then. For now, let's just focus on the nonce and not change any behavior related to the txstore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. I appreciate that you removed more lines than you added :)
Just a few notes left in the comments - we will need to address these before merging.
return seq, err | ||
} | ||
|
||
func (s *nonceTracker) getSequenceFromStore(ctx context.Context, address common.Address) (seq evmtypes.Nonce, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Dimitris here. Let's keep it the same. When we refactor the txstore, we can deal with this problem then. For now, let's just focus on the nonce and not change any behavior related to the txstore.
@@ -1013,16 +1029,6 @@ func (o *evmTxStore) UpdateTxCallbackCompleted(ctx context.Context, pipelineTask | |||
return nil | |||
} | |||
|
|||
func (o *evmTxStore) FindLatestSequence(ctx context.Context, fromAddress common.Address, chainId *big.Int) (nonce evmtypes.Nonce, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep using this method and UpdateKeyNextSequence until we refactor the TxStore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to note UpdateKeyNextSequence
is unused. I removed it just as a clean up. It's actually updating a table that we don't own.
s.lggr.Infow("Fast-forward sequence", "address", addr, "newNextSequence", nonce, "oldNextSequence", localSequence) | ||
} | ||
|
||
s.sequenceLock.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit dangerous to lock after doing a check that can affect whether we want to proceed with the write. In this case, what happens if the value of nonce
here changes after you do the check on 152 but before you lock here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check on line 152 is actually only used to determine if we should log some statements, otherwise we proceed with the write regardless. Also the nonce
here is just the one we retrieved from on chain. I guess it could change but the sequence map lock wouldn't prevent any desync with on-chain data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok cool this makes sense. So it's fine as is then!
client NonceTrackerClient | ||
enabledAddresses []common.Address | ||
|
||
sequenceLock sync.RWMutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, does the nonceTracker need a lock? If it's always been called from the broadcaster which has its own lock, then it might be protected already and adding the extra lock just slows down the code. But I'd need to look more carefully to confirm this. It might be safer just to have it unless it's crystal clear that the the calling code is protected by a broadcaster lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lock was actually moved over from the Broadcaster so we aren't double locking here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically, the Broadcaster can process multiple addresses at the same time, which are handled by different go routines (monitorTxs
) so there is a scenario where SequenceTracker requires locking. Good observation though, I like that we're pushing to clean up unnecessary things 🙌 .
2c68d46
to
0b22f2b
Compare
logger logger.Logger, | ||
checkerFactory TransmitCheckerFactory, | ||
autoSyncNonce bool, | ||
) *Broadcaster { | ||
return txmgr.NewBroadcaster(txStore, client, chainConfig, feeConfig, txConfig, listenerConfig, keystore, txAttemptBuilder, nonceSyncer, logger, checkerFactory, autoSyncNonce, evmtypes.GenerateNextNonce) | ||
nonceTracker := NewNonceTracker(logger, txStore, client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of creating the SequenceTracker inside the Broadcaster altogether but I guess the plan is to use it in other components as well so this is totally fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
// Try to retrieve next sequence from tx table or on-chain to load the map | ||
// A scenario could exist where loading the map during startup failed (e.g. All configured RPC's are unreachable at start) | ||
// The expectation is that the node does not fail startup so sequences need to be loaded during runtime | ||
foundSeq, err := s.getSequenceForAddr(ctx, address) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes an RPC call onchain, while holding the lock. So we are locking this for potentially many seconds.
Could we acquire the lock only when we are reading/writing from the nextSequenceMap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we run into an issue where multiple calls are made to GetNextSequence
where none find the value locally so they all search on-chain? Then if we only lock the actual write to the map, the callers could get different values from on-chain depending which RPC their request was sent to. They would then write conflicting values to the map and return different nonces? I think this wouldn't be a problem in the current setup where the Broadcaster only uses the nonce tracker but if other components start using this we could maybe run into this problem. My worries could also be completely unfounded though so not against making this change if this isn't a worry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I am assuming the broadcaster is the only caller, and for a given FromAddress, it only calls NonceTracker from a single thread.
But lets keep the logic as it is for now.
.changeset/silver-months-glow.md
Outdated
"chainlink": patch | ||
--- | ||
|
||
Fixed nonce gap bug if in-progress tx sent but TXM shutdown before marked as broadcasted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Fixed a race condition bug around EVM nonce management, which could cause the Node to skip a nonce and get stuck.
Quality Gate passedIssues Measures |
SequenceTracker
interface which will be used to manage sequences used in the BroadcasternonceTracker
component in the EVM code as the EVM specific implementation ofSequenceTracker
nonceTracker
FindLatestSequence
TxStore method for loading the sequences on startup. Switch to using the more genericGetAllTransactions
method.