-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HomeAssistant CoreDNS timeout settings #3
Comments
The idea of having to rebuild coredns to change this is insane. However, it is doable, and led me onto a much wider thought. Actually, if push comes to shove, we've a fair bit of flexibility in terms of what's running on the system despite not having access to the underlying OS (that's achievable too, I've already popped the host from an add-on container, it's just not something I'm gonna productise). We have access to the docker daemon, which means we can build and tag images to our heart's content. So rather than trying to patch things via
Then, when supervisor restarts the add-on, we'll be running our modified version. Obviously, after any update the process would need to be re-run. It also means that an alternative fix for #1 would be to build an entirely new supervisor container with auto-update patched out and calls to CodeNotary in That's very much a nuclear option though, we could just as easily push across a rewrite to that file so that exceptions Having dug into it, it really is the most pointless "protection" - the code is checking it's own homework, so anyone with the ability to patch the code being protected also has the ability to patch out the checks. It's effectively just security theatre that impacts users most. |
Gentlemen (@fenichelar - @Zixim): probably there's a solution to our neverending battle for a proper HA DNS implementation: https://github.com/bentasker/HomeAssistantAddons/tree/master/core-dns-override @bentasker you have no idea how long we have been waiting for an auto-patcher, we've been patching that configuration manually every time. They didn't even approve a slight modification like this one, let alone all the ones (#56 to #61) that @fenichelar tried to push. It's incredible they're not even able to acknowledge they have a huge DNS problem. I'm going to install your addon immediately and I hope it solves the problem. Thank you so much. :) |
@alexdelprete Glad I could help. Let me know if you encounter any issues with it - I've tried to be relatively conservative in terms of what it does, but if that causes an issue then there's another more heavy-handed approach I can take (which does come with the benefit that it won't revert when the container is restarted).
I've done what I can to feed a detailed analysis of the issue and underlying cause back upstream (https://github.com/home-assistant/plugin-dns/issues/ 64) but based on how lightly other PRs/Issues have been discarded, I'm not going to hold my breath. The idea that they've ignored an issue of this scale for this long is more than a bit bewildering to be honest. I also have supervisor auto-updates on my radar. #1 implements an initial blocker to them, but it means that the Supervisor "Add on store" doesn't display after a reboot (it blocks trying to pull information by the looks of it). There are a couple of ways I can go about fixing that, I've not quite settled on which I prefer yet. |
I read your comments about that, I think it's a little bit too much. :) I hope this smarter approach works fine, will let you know...thanks a lot.
I doubt they will do anything. In our last attempt, I even engaged the big boss, and he said they were not even willing to discuss it anymore. Case closed. I won't even bother to comment anymore, they wanted to suspend me because I was simply showing them how naive their approach was vs the problem: a big elephant in the room, and they don't see it. |
I started work on a PR to create a config option that allowed completely disabling Cloudflare DOT, however, it became pretty clear that I was not going to be able to get this merged so I gave up. FYI, I was able to solve all of my problems by NATing 1.1.1.1 and 1.0.0.1 to my private DNS resolver. The NAT is setup for both TCP/UDP port 53 and TCP port 853. TLS certificate verification fails because my DOT server obviously does not have a valid TLS certificate for |
I think it was one of your PRs I first saw actually - lots of work had gone into it, but then the devs came back with a bunch of requirements no-one sane could accept: "do all this work, and ensure we mark anyone using it as unsupported"
That makes sense - the storms happen because of CoreDNS's handling at the TCP level - if the TCP handshake completes then you avoid that (even if SSL verification then fails). The funny thing is that your PR to move it back to UDP would have fixed those (even without redirection to another server).
Yep, I've been quite struck by the difference to be honest. I've a few devices ewelink and Tuya devices, and I'd always assumed they were slow/laggy because distant API and didn't bother troubleshooting. Turns out the issue was very much closer to home.
Agreed, @alexdelprete - the main reason I've been experimenting with it is to hedge against ending up being marked as unsupported/unhealthy. Whilst I've no desire to seek support on something I've customised, I do want to still be able to update addons etc. |
been using the auto-patcher since it was created. It's the bees knees. |
and you didn't tell us...:) |
I perfectly understand, it's the plan B in case they ban your addon or enforce something that neutralizes your fix. You prepared the nuke. I don't think they will bother, they perfectly know there are people like us that in some way will find workarounds. :) I'm still puzzled by the choice of using CoreDNS vs unbound or dnsmasq, that are much more reliable and widely used...because I can clearly see there are two issues here: a bad configuration + a bad dns resolver/forwarder. One question about the addon, I'm just installing it: your addon checks every 60s to enforce correct configuration, but that doesn't imply a restart every 60s, right? Because that could raise some issues. Restart needed only if changes are applied, if I understood correctly. That is also if |
I can confirm the addon is working perfectly. Great job. :) |
Exactly correct - it'll only restart
CoreDNS is actually pretty good where it's commonly used - in Kubernetes clusters with decent connectivity - it's a poor choice (IMO) for embedding in an appliance. As you say, Unbound and dnsmasq are much better candidates. |
unbound/dnsmasq work ok in any condition I've had the possibility to see...but I'm no expert of this new containerized world...old-school guy here. :) thanks a lot for everything, I'm really grateful for your contribution to this problem. can I contact you privately for an advice on an issue I'm having for another project? let me know what's your preferred way...thanks in advance. |
Yeah, Unbound/DNSMasq are pretty mature and resilient. There's almost an inherent assumption in coreDNS that the person configuring it has reasonable control over the network - that contract's effectively broken in HomeAssistant because the config was dropped in without the network operator's knowledge/permission.
Sure, do you wanna DM me on the HA community forum and then I'll give you a better route of contact (saves me dropping details into a public GH issue) |
Sure Ben, I'll contact you on the forum first. Thanks a lot. |
As detailed here some users might still experience delays/lag if their external connectivity drops.
It'll depend on what the upstream timeouts on their local DNS is - obviously if they're using a router, then it may not be configurable (and the router might be a little too dumb to know that it's upstream link is down).
Back within the sphere of HomeAssistant, I was wondering if we can tweak things there so mitigate for users affected by that.
Unfortunately, CoreDNS uses a DialTimeout of 30s (here).
Actually, worse than that, it dynamically adjusts, which probably goes a long way to explain the packet storms that HomeAssistant users have reported when Cloudflare was unreachable.
Unfortunately, timeout isn't a configurable - we'd need to make changes in the source and then recompile
coredns
with those. Far from ideal.The text was updated successfully, but these errors were encountered: