Integration tests don't always pass (RFC4193 reverse ptr) #209
So this does happen. The problem is that the tests flood DNS with queries, sometimes the machine can't keep up, and UDP does the rest of the damage. I struggled with this, but I'm also on a 24-thread machine, and I doubt your Mac is, so it's hard to pick a threshold that works for everyone. I don't really have a good answer for you other than "try again": the tests usually pass after a couple of tries, though on machines with fewer cores it may be more punishing. Also: you can
Hey, thanks for replying. I don't think I've ever had a full run of
I'm logging the whole reverse_authority_map now at the end of
and... I'm running OK. Got one. Too much output:
Might be getting closer: there seem to be 2 items in reverse_authority_map; one has the PTRs, one doesn't. If the item without PTRs comes second in the list, the lookup fails. If they are in the other order, it works.
I haven't looked at it that deeply in a very long time, and I don't really have the time to right now. Maybe in a week or two.
Would love to hear what you come up with, though, and I'm happy to review any patches you file.
Also, I wanted to point out that the battery counts (the 0..1000) are there for a reason. UDP is a lossy protocol, so individual queries should fail naturally, but the service (and trust-dns's resolver) should retry failing connections. In practice (e.g., in the server you run yourself) this works fine; my best theory was that in the tests it gets overwhelmed. That said, there could always be a real bug in there. Some of it was slapped together very quickly.
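For anyone following along, here is a minimal sketch of the retry idea described above, using only the Rust standard library. The retry helper is hypothetical (it is not trust-dns's or zeronsd's API), and the closure that drops its first two queries merely stands in for a lossy UDP exchange:

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: call `op` up to `attempts` times, sleeping `delay`
/// between failures, and return the first success or the last error.
fn retry<T, E>(attempts: u32, delay: Duration, mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    assert!(attempts > 0, "need at least one attempt");
    let mut last_err = None;
    for attempt in 0..attempts {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) => {
                last_err = Some(e);
                // Back off before the next attempt, but not after the final one.
                if attempt + 1 < attempts {
                    thread::sleep(delay);
                }
            }
        }
    }
    Err(last_err.expect("attempts > 0 guarantees at least one error"))
}

fn main() {
    // Simulated lossy lookup: the first two queries "time out".
    let mut calls = 0;
    let result: Result<&str, &str> = retry(5, Duration::from_millis(10), || {
        calls += 1;
        if calls < 3 { Err("timeout") } else { Ok("host.example.") }
    });
    println!("{:?} after {} calls", result, calls);
}
```

Running it prints Ok("host.example.") after 3 calls. A retry like this hides occasional packet loss, but it cannot paper over a lookup that fails deterministically, which is why a flaky test can still point at a real bug.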
Looking forward to figuring it out. It's motivating me to explore the app. Thanks for the note about the battery tests.
Cool. I'd love to hear about the solution when you find it, I'm curious now. :)
Haven't been able to look at it much for a few days, but here's what I know. In configure_members we iterate over the passed-in reverse_authority_map, which is a HashMap and yields items in nondeterministic order. The order they pop out makes a difference:
doesn't pass
pass
full address vs. subnet order; the PTR records always end up on the
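A minimal model of why visiting order matters, assuming (hypothetically) that each entry covering the same reverse zone replaces that zone's records. Entry, replace_all, merge_all, and the PTR name member.example. are illustrations, not zeronsd code; the arrays make the two possible HashMap orders explicit:

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for one reverse_authority_map entry: the key it was
/// stored under, plus the PTR names attached to it.
type Entry = (&'static str, Vec<&'static str>);

/// Replace semantics: each entry overwrites the zone's records, so whichever
/// entry is visited last decides what the zone serves.
fn replace_all(entries: &[Entry]) -> BTreeSet<&'static str> {
    let mut zone = BTreeSet::new();
    for (_, ptrs) in entries {
        zone = ptrs.iter().copied().collect();
    }
    zone
}

/// Merge semantics: each entry adds to the zone's records, so visiting
/// order no longer changes the result.
fn merge_all(entries: &[Entry]) -> BTreeSet<&'static str> {
    let mut zone = BTreeSet::new();
    for (_, ptrs) in entries {
        zone.extend(ptrs.iter().copied());
    }
    zone
}

fn main() {
    let with_ptrs: Entry = ("fd23:3cca:ac27:8e85:c199:93e7:50c5:1646/88", vec!["member.example."]);
    let without: Entry = ("fd23:3cca:ac27:8e85:c199:9300::/88", vec![]);

    let ptrs_last = [without.clone(), with_ptrs.clone()];
    let empty_last = [with_ptrs, without];

    // Replace semantics: the outcome depends on order (the flaky case).
    println!("{:?} vs {:?}", replace_all(&ptrs_last), replace_all(&empty_last));
    // Merge semantics: same outcome either way.
    assert_eq!(merge_all(&ptrs_last), merge_all(&empty_last));
}
```

This prints {"member.example."} vs {}, matching the "works in one order, fails in the other" behaviour. Merging is only one way to remove the order dependence; making sure a zone gets a single entry in the first place is another.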
Re issue #209: the tests would fail _sometimes_; so far so good with this change. This change inserts fd23:3cca:ac27:8e85:c199:9300::/88 instead of fd23:3cca:ac27:8e85:c199:93e7:50c5:1646/88 into the reverse PTR authority. Actually, maybe to_ptr_soa_name should do this. The real app init() does a similar thing, so we'll need to address that too. Not sure if this is the correct fix yet.
Re issue #209: the tests would fail _sometimes_; so far so good with this change. This change inserts fd23:3cca:ac27:8e85:c199:9300::/88 instead of fd23:3cca:ac27:8e85:c199:93e7:50c5:1646/88 into the reverse PTR authority. Before, it would actually insert one of each, which was the issue: only one would get the actual PTR records, and then lookups would fail _sometimes_. Actually, maybe to_ptr_soa_name should do this. The real app init() does a similar thing, so we'll need to address that too. Not sure if this is the correct fix yet.
Re issue #209: the tests would fail _sometimes_; so far so good with this change. This change inserts fd23:3cca:ac27:8e85:c199:9300::/88 instead of fd23:3cca:ac27:8e85:c199:93e7:50c5:1646/88 into the reverse PTR authority. Before, it would actually insert one of each, so there would be two SOA entries, but only one would have the actual PTR records on it, and then lookups would fail _sometimes_. Actually, maybe to_ptr_soa_name should do this. The real app init() does a similar thing, so we'll need to address that too. Not sure if this is the correct fix yet.
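To illustrate the to_ptr_soa_name idea, here is a sketch of deriving an ip6.arpa zone name from an IPv6 prefix. It assumes the name is built from only the first prefix/4 nibbles (so the prefix must be a multiple of 4); ip6_arpa_zone is a hypothetical helper, not the real to_ptr_soa_name. Under that assumption the host address and the masked /88 network produce the same zone name, which is one way two map entries could end up describing the same SOA:

```rust
use std::net::Ipv6Addr;

/// Hypothetical helper (not zeronsd's to_ptr_soa_name): build the ip6.arpa
/// zone name for an IPv6 prefix. Assumes `prefix` is a multiple of 4 so the
/// zone boundary falls on a nibble.
fn ip6_arpa_zone(addr: Ipv6Addr, prefix: u8) -> String {
    assert!(prefix <= 128 && prefix % 4 == 0, "prefix must be a nibble boundary");
    let bits = u128::from(addr);
    let nibbles = usize::from(prefix / 4);
    // Nibble i (0 = most significant) sits at bits [124 - 4i, 128 - 4i).
    let mut labels: Vec<String> = (0..nibbles)
        .map(|i| format!("{:x}", (bits >> (124 - 4 * i)) & 0xf))
        .collect();
    // Reverse-DNS names list the least significant nibble first.
    labels.reverse();
    labels.push("ip6.arpa.".to_string());
    labels.join(".")
}

fn main() {
    let host: Ipv6Addr = "fd23:3cca:ac27:8e85:c199:93e7:50c5:1646".parse().unwrap();
    let net: Ipv6Addr = "fd23:3cca:ac27:8e85:c199:9300::".parse().unwrap();
    // Only the first 22 nibbles matter for a /88, so both give the same zone.
    assert_eq!(ip6_arpa_zone(host, 88), ip6_arpa_zone(net, 88));
    println!("{}", ip6_arpa_zone(net, 88));
}
```

This prints 3.9.9.9.1.c.5.8.e.8.7.2.c.a.a.c.c.3.3.2.d.f.ip6.arpa., the same name for both addresses.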
Re issue #209: the tests would fail _sometimes_; so far so good with this change. This change inserts fd23:3cca:ac27:8e85:c199:9300::/88 instead of fd23:3cca:ac27:8e85:c199:93e7:50c5:1646/88 into the reverse PTR authority. Before, it would actually insert one of each, so there would be two SOA entries, but only one would have the actual PTR records on it, and then lookups would fail _sometimes_. The real app init() does a similar thing, so we need to address that too.
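The change described above (keying the reverse authority by the network address rather than the host address) can be sketched with std::net alone. This assumes the map is keyed by an (address, prefix length) pair; network, insert_ptrs, the Key alias, and member.example. are hypothetical names, and putting the PTRs on the host-keyed entry is just for illustration:

```rust
use std::collections::HashMap;
use std::net::Ipv6Addr;

/// Hypothetical key shape: (address, prefix length).
type Key = (Ipv6Addr, u8);

/// Zero every bit past `prefix`, yielding the network address.
fn network(addr: Ipv6Addr, prefix: u8) -> Ipv6Addr {
    assert!(prefix <= 128);
    // A shift by 128 would overflow, so /0 is handled separately.
    let mask = if prefix == 0 { 0 } else { u128::MAX << (128 - u32::from(prefix)) };
    Ipv6Addr::from(u128::from(addr) & mask)
}

/// Insert PTR names under a normalized key, so a host address and its
/// network address with the same prefix collapse into a single entry.
fn insert_ptrs(map: &mut HashMap<Key, Vec<String>>, addr: Ipv6Addr, prefix: u8, ptrs: &[&str]) {
    map.entry((network(addr, prefix), prefix))
        .or_default()
        .extend(ptrs.iter().map(|p| p.to_string()));
}

fn main() {
    let host: Ipv6Addr = "fd23:3cca:ac27:8e85:c199:93e7:50c5:1646".parse().unwrap();
    let net = network(host, 88);
    println!("{net}/88");

    // Un-normalized keys: the same /88 appears twice, once with PTRs and once without.
    let mut before: HashMap<Key, Vec<String>> = HashMap::new();
    before.insert((host, 88), vec!["member.example.".to_string()]);
    before.insert((net, 88), Vec::new());
    assert_eq!(before.len(), 2);

    // Normalized keys: both inserts land on one entry that carries the PTRs.
    let mut after: HashMap<Key, Vec<String>> = HashMap::new();
    insert_ptrs(&mut after, host, 88, &["member.example."]);
    insert_ptrs(&mut after, net, 88, &[]);
    assert_eq!(after.len(), 1);
    let key: Key = (net, 88);
    println!("{:?}", after[&key]);
}
```

It prints fd23:3cca:ac27:8e85:c199:9300::/88 followed by ["member.example."]. Because each zone now has exactly one entry, the HashMap's iteration order stops mattering.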
I've been trying to figure this out. Thought I'd post my notes in public instead of keeping them to myself.
I'm on a Mac. A small, random subset of the tests fails if I run all of them. I'm focusing now on rfc4193::test_battery_single_domain_named. The reverse PTR lookup fails sometimes. I think I've seen other parts fail, but when it does fail, 99% of the time it's the reverse PTR.
(There's a Mac bug, fixed in dev, that makes the tests not work at all.) If I sprinkle some longer delays around, the tests get more reliable, but not 100%.
How I've been starting the test, for example:
RUST_BACKTRACE=1 RUST_LOG=debug TOKEN=$(cat test-token.txt) sudo -E cargo test -- rfc4193::test_battery_single_domain_named --nocapture --test-threads 1 | grep "zeronsd\|trust\|integration"
Here's some tracing output. From the traces, I can't tell why it works in some cases and not in others.