Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic reconnection to the Docker daemon #27

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

kgutwin
Copy link
Collaborator

@kgutwin kgutwin commented Jul 6, 2015

When connecting to Docker over TCP, there is the possibility of the events stream connection dropping while dnsdock continues to run. This PR adds a reconnect plus exponential backoff mechanism when handling stream errors.

Some notes:

  • Any stream error will trigger the reconnect; it could be possible to add sense logic that only triggers reconnect on certain stream errors.
  • Reconnection also involves reloading the full state from the Docker daemon in order to catch up with any lost events. This will overwrite any manual changes to service entries tied to containers.
  • I considered the possibility that this feature ought to be added to the samalba/dockerclient library instead; comments on the proper location of this feature are welcomed.

@tonistiigi
Copy link
Collaborator

I see the events monitor code has changed quite a lot in samalba/dockerclient . Maybe not the signatures but code looks a lot different. We should update the deps first(or with this PR) to test it with the new version.

I think it makes sense to ask if samalba/dockerclient would want something like this in the library instead(automatically or with a reconnect flag for example). There a already some ideas about refactoring that code: samalba/dockerclient#72

Overwriting manual changes scared me a bit. I think we could add overwrite flags(or a mask) that are set to indicate that user has taken over the control. On the other hand the use case for this is very limited, only when your dnsdock container runs in a different daemon than the one it manages. If its the same, dnsdock would be restarted with the daemon and these changes would be lost anyway.

I don't think the current code also handles containers that have been stopped with the daemon restart(quite a common use case actually).

@kgutwin
Copy link
Collaborator Author

kgutwin commented Jul 6, 2015

Added some commits for the samalba/dockerclient dep update. I don't know Travis CI well enough to be sure this is exactly how to do it, but at least the green check mark has reappeared 😄

Honestly, I keep going back and forth about whether reconnect ought to be handled in samalba/dockerclient or handled by dnsdock. You might argue that dnsdock ought to handle the reconnect as-is - the disconnect is passed as an error, which the client captures and handles according to the method that suits it best.

I think the worst part about 'overwriting manual changes' is the uncontrolled nature of it - the user would not expect a network outage to cause them to lose their manual service tweaks. We may not want the overwrite-protection mechanism to block the ability of legitimate Docker events to update services (as it would in the current code). Unfortunately, it's very difficult or perhaps impossible to rigorously distinguish between those two situations, because anything could happen during the time window between disconnection and reconnection - and because it can take some time before a network socket is force closed, it's impossible to determine precisely when the disconnection event happened.

One way I can imagine resolving this would be to only push an update from the Docker side if we determine the Docker metadata has changed - and we could do this by storing a docker_metadata_hash value along with the service. When doing a catchup update after a reconnect, we compute this hash value from the latest metadata, and if it's the same, we don't push the update (and therefore don't overwrite any manual changes).

@tonistiigi
Copy link
Collaborator

My thinking would be that once I overwrite a value manually I have taken over the control and don't want any automatic feature ever changing it again(even rename for example).

Btw, I'm not aware of anyone using the PATCH method for anything else than ttl. I think it would be quite safe to remove this feature and only allow ttl change. I could release v2.0 for that to be sure we don't break anyone. Changing other values is quite a bad habit anyway because its not persistent.

@kgutwin
Copy link
Collaborator Author

kgutwin commented Jul 6, 2015

OK, that's easier to implement. I'll add the 'manual overwrite protection' feature:

  • The PUT and PATCH methods on /services/{id} will set the flag
  • Any updates from the Docker side (stream or catchup) on existing services with the flag set will be ignored
  • The /reload HTTP method from HTTP endpoint to trigger Docker service sync #26 will remove the flag

You can keep or remove PATCH as you wish. I don't see it as a big problem.

.travis.yml Outdated
@@ -4,5 +4,6 @@ go:
- 1.4

before_script:
- go get -d -v ./...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you need this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize, I'm still learning the whole Go toolchain and was confused by the Travis CI build failure (https://travis-ci.org/tonistiigi/dnsdock/builds/69731233). It looked like Travis didn't have all the deps and I found this line in an early commit, so I gave it a shot.

Later, I happened to notice that there were new files under Godeps/_workspace/src/github.com/docker; I think perhaps I should have instead added those to the 'Update samalba/dockerclient dep' commit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. I can squash/fix while reviewing. All deps should be included in Godeps/ so this should not be needed. Dockerclient has new dep docker/pkg/units that also needs to be included.

@tonistiigi tonistiigi self-assigned this Jul 7, 2015
@tonistiigi
Copy link
Collaborator

I've updated all the dependencies to latest version in the repo. You can now rebase this PR (and remove the travis command) to keep the godeps changes out from here. Feel free start from clean PR if thats easier for you.

dnsserver.go Outdated
}

func NewService() (s *Service) {
s = &Service{Ttl: -1}
s = &Service{Ttl: -1, Manual: false}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems redundant.

@kgutwin
Copy link
Collaborator Author

kgutwin commented Jul 10, 2015

I think this rebase is ready for review again.

@tonistiigi
Copy link
Collaborator

@kgutwin Sorry it's taking so long. I want to make some changes and update dockerclient again with latest events fix. Haven't forgotten this.

@bersace
Copy link

bersace commented Sep 25, 2017

@kgutwin there is conflict to solve now :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants