fix_: retry dnsdisc on failure #5785

Merged
merged 2 commits into from
Oct 7, 2024

Conversation

richard-ramos
Member

@richard-ramos richard-ramos commented Aug 29, 2024

This adds a retry mechanism for DNS discovery for those cases in which it fails to resolve.

cc: @gabrielmer

@status-im-auto
Member

status-im-auto commented Aug 29, 2024

Jenkins Builds

Older builds (20):

| Commit | #️⃣ | Finished (UTC) | Duration | Platform | Result |
|---|---|---|---|---|---|
| ✖️ edc67c5 | #1 | 2024-08-29 18:20:35 | ~1 min | tests | 📄log |
| ✔️ edc67c5 | #1 | 2024-08-29 18:21:47 | ~2 min | tests-rpc | 📄log |
| ✔️ edc67c5 | #1 | 2024-08-29 18:23:27 | ~4 min | linux | 📦zip |
| ✔️ edc67c5 | #1 | 2024-08-29 18:23:35 | ~4 min | ios | 📦zip |
| ✔️ edc67c5 | #1 | 2024-08-29 18:24:52 | ~5 min | android | 📦aar |
| ✔️ fe734e3 | #2 | 2024-08-29 19:14:49 | ~1 min | android | 📦aar |
| ✔️ fe734e3 | #2 | 2024-08-29 19:15:22 | ~1 min | linux | 📦zip |
| ✔️ fe734e3 | #2 | 2024-08-29 19:15:53 | ~2 min | tests-rpc | 📄log |
| ✔️ fe734e3 | #2 | 2024-08-29 19:16:33 | ~3 min | ios | 📦zip |
| ✔️ fe734e3 | #2 | 2024-08-29 19:46:01 | ~32 min | tests | 📄log |
| ✔️ 59de449 | #3 | 2024-08-30 12:32:34 | ~2 min | tests-rpc | 📄log |
| ✔️ 59de449 | #3 | 2024-08-30 12:33:58 | ~3 min | linux | 📦zip |
| ✔️ 59de449 | #3 | 2024-08-30 12:34:08 | ~3 min | ios | 📦zip |
| ✔️ 59de449 | #3 | 2024-08-30 12:35:43 | ~5 min | android | 📦aar |
| ✔️ 59de449 | #3 | 2024-08-30 13:03:10 | ~32 min | tests | 📄log |
| ✔️ 5ba0054 | #4 | 2024-09-04 23:18:59 | ~2 min | tests-rpc | 📄log |
| ✔️ 5ba0054 | #4 | 2024-09-04 23:20:32 | ~3 min | linux | 📦zip |
| ✔️ 5ba0054 | #4 | 2024-09-04 23:21:06 | ~4 min | android | 📦aar |
| ✔️ 5ba0054 | #4 | 2024-09-04 23:21:18 | ~4 min | ios | 📦zip |
| ✔️ 5ba0054 | #4 | 2024-09-04 23:49:26 | ~32 min | tests | 📄log |

| Commit | #️⃣ | Finished (UTC) | Duration | Platform | Result |
|---|---|---|---|---|---|
| ✔️ 4d0163d | #5 | 2024-09-25 22:17:49 | ~2 min | tests-rpc | 📄log |
| ✔️ 4d0163d | #5 | 2024-09-25 22:19:11 | ~4 min | linux | 📦zip |
| ✔️ 4d0163d | #5 | 2024-09-25 22:19:51 | ~4 min | android | 📦aar |
| ✔️ 4d0163d | #5 | 2024-09-25 22:20:52 | ~5 min | ios | 📦zip |
| ✖️ 4d0163d | #5 | 2024-09-25 22:47:41 | ~32 min | tests | 📄log |
| ✔️ 292309c | #6 | 2024-10-03 15:59:06 | ~3 min | tests-rpc | 📄log |
| ✔️ 292309c | #6 | 2024-10-03 15:59:17 | ~3 min | ios | 📦zip |
| ✔️ 292309c | #6 | 2024-10-03 15:59:57 | ~3 min | linux | 📦zip |
| ✔️ 292309c | #6 | 2024-10-03 16:00:08 | ~4 min | android | 📦aar |
| ✔️ 292309c | #6 | 2024-10-03 16:28:21 | ~32 min | tests | 📄log |

@gabrielmer

Thanks so much! 😍

wakuv2/waku.go Outdated
dnsDiscAsyncRetrieved = true
t.Reset(3 * time.Second)
case <-t.C:
if dnsDiscAsyncRetrieved {
Contributor

Why not execute this block of code immediately on receiving the dnsDiscAsyncRetrievedSignal signal, or even call restartDiscV5 directly? The channels here feel complex.
If there is a race condition, we can probably add a lock.

Member Author

I used this logic because I thought that running discv5 in quick succession is not ideal (this can only happen if more than one registered enrtree failed to resolve). But I agree that it adds complexity. I'll remove the need for a ticker/select.

wakuv2/waku.go Outdated
case <-ticker.C:
if w.seededBootnodesForDiscV5 && len(w.node.Host().Network().Peers()) > 3 {
w.logger.Debug("not querying bootnodes", zap.Bool("seeded", w.seededBootnodesForDiscV5), zap.Int("peer-count", len(w.node.Host().Network().Peers())))
continue
}
if canQuery() {
if w.cfg.EnableDiscV5 && canQuery() {
Contributor

Is it possible to move the check to restartDiscV5, or maybe a new method?

wakuv2/waku.go Outdated
@@ -1652,14 +1680,24 @@ func (w *Waku) seedBootnodesForDiscV5() {

for {
select {
case <-w.dnsDiscAsyncRetrievedSignal:
if canQuery() {
Contributor

In lightclient mode discv5 won't be enabled, so I wonder whether this will cause an issue, since we are trying to restart discv5.

wakuv2/waku.go Outdated
w.logger.Warn("failed to restart discv5", zap.Error(err))
}

if canQuery() {
Contributor

Same comment as above: we may need an EnableDiscV5 check to prevent restarting in lightclient mode.
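
A rough sketch of what such a guard could look like if the check moved into its own method. The `discNode` and `discConfig` types and the `shouldRestartDiscV5` and `maybeRestartDiscV5` names are hypothetical stand-ins, not the actual status-go code; only the `EnableDiscV5` field name comes from the diff above.

```go
package main

import "fmt"

// discConfig is a hypothetical stand-in for the node configuration.
type discConfig struct {
	EnableDiscV5 bool // false in lightclient mode
}

// discNode is a hypothetical stand-in for the Waku struct.
type discNode struct {
	cfg      discConfig
	restarts int // counts restarts so the sketch is observable
}

// shouldRestartDiscV5 keeps the two conditions in one place, so every
// call site applies the same rule.
func (n *discNode) shouldRestartDiscV5(canQuery bool) bool {
	return n.cfg.EnableDiscV5 && canQuery
}

// maybeRestartDiscV5 restarts discv5 only when the guard allows it and
// reports whether a restart happened.
func (n *discNode) maybeRestartDiscV5(canQuery bool) bool {
	if !n.shouldRestartDiscV5(canQuery) {
		return false
	}
	n.restarts++ // stand-in for the real restart logic
	return true
}

func main() {
	light := &discNode{cfg: discConfig{EnableDiscV5: false}}
	full := &discNode{cfg: discConfig{EnableDiscV5: true}}
	fmt.Println(light.maybeRestartDiscV5(true), full.maybeRestartDiscV5(true))
}
```

With the condition in one method, both the ticker branch and the signal branch could call the same helper instead of repeating the check.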

Member

@jrainville jrainville left a comment

LGTM

}

retries++
backoff := time.Second * time.Duration(math.Exp2(float64(retries)))
Contributor

What do you think about setting an upper bound to the backoff?
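
One way to bound the backoff, sketched under assumptions: the 30-second cap, the `backoffFor` and `retryWithBackoff` names, and the `base`/`limit` parameters are all hypothetical and not taken from this PR. The delay doubles per retry, matching the `math.Exp2` line above, but stops growing at the cap, and the wait aborts when the context is cancelled.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical values; the PR does not specify an upper bound.
const (
	baseDelay  = time.Second
	maxBackoff = 30 * time.Second
)

// backoffFor returns base * 2^retries, capped at limit.
// Doubling in a loop avoids overflow for large retry counts.
func backoffFor(retries int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < retries; i++ {
		if d >= limit {
			return limit
		}
		d *= 2
	}
	if d > limit {
		return limit
	}
	return d
}

// retryWithBackoff calls op until it succeeds or ctx is cancelled.
func retryWithBackoff(ctx context.Context, op func() error, base, limit time.Duration) error {
	retries := 0
	for {
		if err := op(); err == nil {
			return nil
		}
		retries++
		timer := time.NewTimer(backoffFor(retries, base, limit))
		select {
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err()
		case <-timer.C:
		}
	}
}

func main() {
	attempts := 0
	op := func() error {
		attempts++
		if attempts < 3 {
			return errors.New("resolve failed")
		}
		return nil
	}
	// Millisecond delays keep the demo fast.
	err := retryWithBackoff(context.Background(), op, time.Millisecond, 10*time.Millisecond)
	fmt.Println("attempts:", attempts, "err:", err)
	fmt.Println("10th retry waits:", backoffFor(10, baseDelay, maxBackoff))
}
```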

@@ -455,6 +461,32 @@ func (w *Waku) dnsDiscover(ctx context.Context, enrtreeAddress string, apply fnA
return nil
}

func (w *Waku) retryDnsDiscoveryWithBackoff(ctx context.Context, addr string, successChan chan<- struct{}) {
Contributor

I'm not so sure, but it seems that in status-go it's preferred to use backoff.ExponentialBackOff from github.com/cenkalti/backoff/v3 rather than implementing it manually.

Member Author

Seems to be used only in tests. I'll refactor how backoffs are used in waku.go in a separate PR, since I saw another instance of a manual backoff period being implemented there.

wakuv2/waku.go Outdated
w.seededBootnodesForDiscV5 = false
mu.Unlock()
if err := w.dnsDiscover(ctx, addr, retrieveENR, useOnlyDnsDiscCache); err != nil {
go w.retryDnsDiscoveryWithBackoff(ctx, addr, w.dnsDiscAsyncRetrievedSignal)
Collaborator

Looks like a wg.Add(1) is needed here. Otherwise it might panic when stopping the node, e.g. when accessing w.ctx.
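
A minimal sketch of the pattern being asked for, assuming a simplified node: the `node` type, its fields, and the `startRetry`, `retryLoop` and `stop` methods are hypothetical stand-ins for the Waku struct. The point is that `wg.Add(1)` runs before the `go` statement, so a later `wg.Wait()` in the stop path cannot return while the retry goroutine is still using the node.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// node is a hypothetical stand-in for the Waku struct.
type node struct {
	ctx      context.Context
	cancel   context.CancelFunc
	wg       sync.WaitGroup
	mu       sync.Mutex
	finished int // number of retry goroutines that exited cleanly
}

func newNode() *node {
	ctx, cancel := context.WithCancel(context.Background())
	return &node{ctx: ctx, cancel: cancel}
}

// startRetry registers the goroutine with the WaitGroup before launching it.
// Calling Add inside the goroutine would race with Wait.
func (n *node) startRetry(addr string) {
	n.wg.Add(1)
	go func() {
		defer n.wg.Done()
		n.retryLoop(addr)
	}()
}

// retryLoop runs until the node context is cancelled.
func (n *node) retryLoop(addr string) {
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-n.ctx.Done():
			n.mu.Lock()
			n.finished++
			n.mu.Unlock()
			return
		case <-ticker.C:
			// A real implementation would retry DNS discovery for addr here.
		}
	}
}

// stop cancels the context and waits for every tracked goroutine to exit.
func (n *node) stop() int {
	n.cancel()
	n.wg.Wait()
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.finished
}

func main() {
	n := newNode()
	n.startRetry("hypothetical-address-a")
	n.startRetry("hypothetical-address-b")
	fmt.Println("goroutines finished before stop returned:", n.stop())
}
```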

wakuv2/waku.go Outdated
@@ -1652,14 +1684,24 @@ func (w *Waku) seedBootnodesForDiscV5() {

for {
select {
case <-w.dnsDiscAsyncRetrievedSignal:
if canQuery() {
Collaborator

Maybe better like this? Just less spaghetti, nothing more

if !canQuery() {
    break
}
// logic here
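
For reference, a small sketch of how that early `break` behaves. In Go, a `break` inside a `select` case leaves only the `select`, not the enclosing `for`, so the loop keeps waiting for the next signal. The `drain` function and its parameters are hypothetical and only illustrate the control flow; they are not the code from this PR.

```go
package main

import "fmt"

// drain handles signals until the signals channel is closed or stop fires.
// canQuery is a hypothetical stand-in for the guard in the reviewed code.
func drain(signals <-chan int, stop <-chan struct{}, canQuery func(int) bool) []int {
	var handled []int
	for {
		select {
		case <-stop:
			return handled
		case s, ok := <-signals:
			if !ok {
				return handled
			}
			if !canQuery(s) {
				break // exits the select only; the for loop continues
			}
			// logic here
			handled = append(handled, s)
		}
	}
}

func main() {
	signals := make(chan int, 4)
	for i := 1; i <= 4; i++ {
		signals <- i
	}
	close(signals)

	// Only even signals pass the guard in this demo.
	isEven := func(s int) bool { return s%2 == 0 }
	fmt.Println(drain(signals, make(chan struct{}), isEven))
}
```

Leaving the `for` loop itself from inside the `select` would need a `return` or a labeled `break`.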

@codecov-commenter

codecov-commenter commented Sep 25, 2024

Codecov Report

Attention: Patch coverage is 66.12903% with 21 lines in your changes missing coverage. Please review.

Please upload report for BASE (develop@5a0e06f). Learn more about missing BASE report.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
wakuv2/waku.go 66.12% 15 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5785   +/-   ##
==========================================
  Coverage           ?   46.09%           
==========================================
  Files              ?      891           
  Lines              ?   158113           
  Branches           ?        0           
==========================================
  Hits               ?    72882           
  Misses             ?    76872           
  Partials           ?     8359           
Flag Coverage Δ
functional 11.85% <22.58%> (?)
unit 45.49% <66.12%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
wakuv2/waku.go 69.97% <66.12%> (ø)

@richard-ramos richard-ramos changed the title from "fix(waku2): retry dnsdisc on failure" to "fix_: retry dnsdisc on failure" Oct 3, 2024
@richard-ramos richard-ramos merged commit 94ff99d into develop Oct 7, 2024
12 of 13 checks passed
@richard-ramos richard-ramos deleted the dns-disc-retry branch October 7, 2024 12:33