fix_: retry dnsdisc on failure #5785
Conversation
Thanks so much! 😍
Force-pushed from edc67c5 to fe734e3
wakuv2/waku.go
Outdated
dnsDiscAsyncRetrieved = true
t.Reset(3 * time.Second)
case <-t.C:
if dnsDiscAsyncRetrieved {
Wondering why not execute this block of code immediately upon receiving the dnsDiscAsyncRetrievedSignal signal? We could even call restartDiscV5 directly; the channels here feel complex.
If there is some race condition, we can probably add a lock.
I used this logic because I thought that executing discv5 restarts in fast succession is not ideal (this could only happen if more than one registered enrtree failed to resolve). But I agree that it adds complexity. I'll remove the need for a ticker/select.
wakuv2/waku.go
Outdated
case <-ticker.C:
if w.seededBootnodesForDiscV5 && len(w.node.Host().Network().Peers()) > 3 {
w.logger.Debug("not querying bootnodes", zap.Bool("seeded", w.seededBootnodesForDiscV5), zap.Int("peer-count", len(w.node.Host().Network().Peers())))
continue
}
if canQuery() {
if w.cfg.EnableDiscV5 && canQuery() {
Is it possible to move the check to restartDiscV5, or maybe a new method?
Force-pushed from fe734e3 to 59de449
wakuv2/waku.go
Outdated
@@ -1652,14 +1680,24 @@ func (w *Waku) seedBootnodesForDiscV5() {
for {
select {
case <-w.dnsDiscAsyncRetrievedSignal:
if canQuery() {
In lightclient mode discv5 won't be enabled, so I'm wondering if this will cause an issue, since we are trying to restart discv5.
wakuv2/waku.go
Outdated
w.logger.Warn("failed to restart discv5", zap.Error(err))
}
if canQuery() {
Same comment as above: we may need an EnableDiscV5 check here to prevent restarting discv5 in lightclient mode.
Force-pushed from 59de449 to 5ba0054
LGTM
}
retries++
backoff := time.Second * time.Duration(math.Exp2(float64(retries)))
What do you think about setting an upper bound to the backoff?
@@ -455,6 +461,32 @@ func (w *Waku) dnsDiscover(ctx context.Context, enrtreeAddress string, apply fnA
return nil
}

func (w *Waku) retryDnsDiscoveryWithBackoff(ctx context.Context, addr string, successChan chan<- struct{}) {
I'm not so sure, but it seems that in status-go it's preferred to use backoff.ExponentialBackOff from github.com/cenkalti/backoff/v3 rather than implementing it manually.
It seems to be used only in tests. I'll refactor how backoffs are used in waku.go in a separate PR, since I saw another instance of a manual backoff period being implemented there.
wakuv2/waku.go
Outdated
w.seededBootnodesForDiscV5 = false
mu.Unlock()
if err := w.dnsDiscover(ctx, addr, retrieveENR, useOnlyDnsDiscCache); err != nil {
go w.retryDnsDiscoveryWithBackoff(ctx, addr, w.dnsDiscAsyncRetrievedSignal)
Looks like a wg.Add(1) is needed here. Otherwise it might panic on stopping the node, e.g. when accessing w.ctx.
wakuv2/waku.go
Outdated
@@ -1652,14 +1684,24 @@ func (w *Waku) seedBootnodesForDiscV5() {
for {
select {
case <-w.dnsDiscAsyncRetrievedSignal:
if canQuery() {
Maybe better like this? Just less spaghetti, nothing more:
if !canQuery() {
break
}
// logic here
Force-pushed from 5ba0054 to 4d0163d
Codecov Report. Attention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files

@@ Coverage Diff @@
## develop #5785 +/- ##
==========================================
Coverage ? 46.09%
==========================================
Files ? 891
Lines ? 158113
Branches ? 0
==========================================
Hits ? 72882
Misses ? 76872
Partials ? 8359

Flags with carried forward coverage won't be shown.
Force-pushed from 4d0163d to 292309c
This adds a retry mechanism for DNS discovery for those cases in which it fails to resolve.
cc: @gabrielmer