-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
failover that retries rpc #346
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating. Looks good overall!
There is additional Note, I will see if it make senses to add timeout at multihoming client level. Depending on if the caller does not add timeout already |
common/geth/failover.go
Outdated
rpcFault := f.handleError(err) | ||
|
||
if rpcFault { | ||
f.NumberRpcFault += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be "ServerFault", which aligns better with gRPC/HTTP error code definition?
common/geth/handle_error.go
Outdated
) | ||
|
||
// handleHttpError returns a boolean indicating if error atrributes to remote RPC | ||
func (f *FailoverController) handleHttpError(httpRespError rpc.HTTPError) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this assume the service provider expose gRPC interface, but using HTTP error code?
It looks quite mixed whether it's RPC or HTTP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
func (m *MultiHomingClient) SuggestGasTipCap(ctx context.Context) (*big.Int, error) { | ||
var errLast error | ||
for i := 0; i < m.NumRetries+1; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block of code is repeating in each function, can it be pulled out and shared? It looks just need to wrap the instance.SomeFunc(SomeArgs)
part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think ethclient from geth support grpc, only jsonRpc. https://geth.ethereum.org/docs/interacting-with-geth/rpc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great
sorry there's a lot of conflicts from my other PR 😭
return NewRPC, Retry | ||
} | ||
|
||
// handleError returns a boolean indicating if the current connection should be rotated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be updated with the new return type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still to be done
9148ba5
to
a1fa817
Compare
all conflict fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
common/geth/failover.go
Outdated
) | ||
|
||
type RPCStatistics struct { | ||
numberRpcFault uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: group private members and public members separately
{
mu *sync.RWMutex
numberRpcFault uint64
Logger logging.Logger
}
return NewRPC, Retry | ||
} | ||
|
||
// handleError returns a boolean indicating if the current connection should be rotated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still to be done
common/geth/handle_error.go
Outdated
if errors.As(err, &httpRespError) { | ||
// if error is http error, i.e. non 2xx error, it is handled here | ||
// if it is 2xx error, the error message is nil, https://github.com/ethereum/go-ethereum/blob/master/rpc/http.go, | ||
// execution does not entere here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entere -> enter
common/geth/failover.go
Outdated
"github.com/Layr-Labs/eigensdk-go/logging" | ||
) | ||
|
||
type RPCStatistics struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why you change to this name, this does more than statistics like counting num errors, but actually handle and make failover decisions. The old name may fit better.
Why are these changes needed?
This PR replaces a previous PR #333 by a new implementation for RPC retry logic.
The retry logic is only implemented for batcher, as operator uses instrumentedClient. It is possible to enable Multi-homing on operators too, but some metrics definitions need further clearation. As the result, it is only included in the PR.
In the new Retry logic, a multi-homing client takes both read/write rpccall to Ethereum. It is a wrapper to geth.EthClient. The alternative way would be using goethereum.Client, but it would require copying our functions that implement our custom interface.
The multi-homing client uses numRetries (configurable by argument) to decide how many more to retry once failed in the beginning.
Checks