Our proxy pod is a pet, but should be cattle (HA: Highly Available) #1364

Open
consideRatio opened this issue Aug 19, 2019 · 5 comments

Comments

@consideRatio
Member

In Kubernetes, there is an analogy that pods should be treated as cattle rather than as pets. The idea is that a pet isn't interchangeable, while cattle are.

Our hub pod communicates with the proxy-api Kubernetes service, which forwards traffic to one of the available proxy pods. But when our hub pod does so, it actively configures only that one pod, while in reality it should configure all proxy pods.

Example issue scenario

Assume that, for some reason, there is more than one proxy pod during some time interval. It could be that we want high availability (HA) and keep two running at all times, that a helm chart upgrade is rolling out a new proxy pod, or that the proxy pod crashed and a new one started up.

The hub pod speaks with the proxy-api network service, which delegates traffic to one proxy pod, not all of them. The hub tells the proxy pod things like "Hey, when someone requests /user/erik, they should go to 10.68.3.5!". The hub also asks "Hey, what routes are you configured with already?", and if it concludes that a route should be added or removed, it speaks up about that. But but but... The hub doesn't really know who it is speaking with. It thinks it speaks with its single pet, but in reality it speaks with one of its cattle, and it makes no attempt to ensure that all cattle behave the same way.
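To make the scenario concrete, here is a minimal sketch in plain Python with no Kubernetes involved. All class and function names are hypothetical, and the round-robin choice is an assumption made only to keep the example deterministic; a real Service may pick a backend differently.

```python
import itertools


class ProxyPod:
    """Hypothetical stand-in for one proxy pod with its own in-memory routing table."""

    def __init__(self, name):
        self.name = name
        self.routes = {}

    def add_route(self, path, target):
        self.routes[path] = target


class ProxyApiService:
    """Hypothetical stand-in for the proxy-api service: each call reaches ONE pod."""

    def __init__(self, pods):
        # Assumption: round-robin, so the example behaves the same on every run.
        self._cycle = itertools.cycle(pods)

    def pick_pod(self):
        return next(self._cycle)


def hub_add_route(service, path, target):
    """The hub believes it is configuring 'the' proxy, but only one pod gets the route."""
    pod = service.pick_pod()
    pod.add_route(path, target)
    return pod


pods = [ProxyPod("proxy-a"), ProxyPod("proxy-b")]
service = ProxyApiService(pods)

configured = hub_add_route(service, "/user/erik", "10.68.3.5")
missing = [pod.name for pod in pods if "/user/erik" not in pod.routes]

print("configured:", configured.name)
print("pods without the route:", missing)
```

Any request for /user/erik that lands on a pod in `missing` cannot be routed to 10.68.3.5, which is the failure described above.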

Goals

  • I'd like to see that a proxy that starts up gets up to date right away, somehow.
  • I'd like to see that our solution allows multiple proxy pods to be kept up to date.

Pain points

  • We let the proxy pod await configuration by the hub pod.
  • We communicate directly with a single proxy pod to update it.

Related, I think

#1226 - I think the proxy pod restarted, and the hub was clueless and didn't update the proxy's state. Automatic updates of the proxy pod may have been implemented since then, in which case the issue goes away after a while but still occurs briefly.
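The kind of automatic update mentioned here can be thought of as a diff between the routes the hub wants and the routes a proxy reports. The following is a minimal sketch under that assumption; `diff_routes` is a hypothetical helper, not JupyterHub's actual code.

```python
def diff_routes(desired, actual):
    """Return (routes to add or fix, paths to remove) to make `actual` match `desired`."""
    to_add = {path: target for path, target in desired.items() if actual.get(path) != target}
    to_remove = sorted(path for path in actual if path not in desired)
    return to_add, to_remove


# What the hub wants to be routed.
desired = {"/user/erik": "10.68.3.5"}

# A freshly restarted proxy reports no routes at all.
add_after_restart, remove_after_restart = diff_routes(desired, {})

# A proxy holding a stale route for a server that no longer exists.
add_stale, remove_stale = diff_routes(desired, {"/user/old": "10.68.3.9"})

print(add_after_restart, remove_after_restart)
print(add_stale, remove_stale)
```

Note that such a check only repairs whichever proxy pod happens to answer it, and only when the check runs, so it does not by itself address the multiple-proxy case.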

@manics
Member

manics commented Aug 19, 2019

Have you had any thoughts on how to implement this? If it's not a simple change it'd be worth designing in the option for the proxy to talk to multiple hubs even if that's not currently supported.

@consideRatio
Member Author

I'm not sure. One option would be to use a k8s ConfigMap to keep the state, but I figure this may be a low-end solution with various downsides. For example, I imagine that the propagation of ConfigMap updates to the container in a pod will be slow, and things would probably get out of sync easily. I figure the typical high-end solution is to use HA Redis or similar, but that is probably a bit overkill. GitLab does this. But hmmm, is there an in-between option?

Perhaps one step would be to allow the use of an existing Redis deployment, assuming custom configuration is provided. GitLab's helm chart gives you a lot of things by default, and then lets you configure the use of an external database, external nginx-ingress, external cert-manager, etc.
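As a rough illustration of the shared-state idea, here is a minimal sketch in which the hub writes routes to a store and every proxy reads from it. `RouteStore` is a hypothetical in-memory stand-in and does not use the Redis or ConfigMap APIs; the snapshot-plus-subscribe behavior is an assumption about how such a store could be used.

```python
class RouteStore:
    """Hypothetical in-memory stand-in for a shared store such as Redis."""

    def __init__(self):
        self._routes = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Register for future updates and return a snapshot of the current routes."""
        self._subscribers.append(callback)
        return dict(self._routes)

    def set_route(self, path, target):
        """Called by the hub: record the route and notify every subscribed proxy."""
        self._routes[path] = target
        for callback in self._subscribers:
            callback(path, target)


class Proxy:
    """Hypothetical proxy that loads its state from the store instead of waiting for the hub."""

    def __init__(self, name, store):
        self.name = name
        # Up to date right at startup, then kept in sync through updates.
        self.routes = store.subscribe(self._on_update)

    def _on_update(self, path, target):
        self.routes[path] = target


store = RouteStore()
proxy_a = Proxy("proxy-a", store)
store.set_route("/user/erik", "10.68.3.5")

# A second proxy starts later, e.g. during an upgrade or after a crash.
proxy_b = Proxy("proxy-b", store)
store.set_route("/user/anna", "10.68.3.6")

print(proxy_a.routes == proxy_b.routes)
```

In this shape the hub never addresses an individual proxy pod, so it no longer matters how many of them exist.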

@betatim
Member

betatim commented Aug 19, 2019

I think the thing to look at for this is using traefik as a replacement for the configurable-http-proxy. @GeorgianaElena has been working on this over the summer. I'm not sure anyone has tried running the JupyterHub with traefik setup in Kubernetes yet.

The traefik based proxy is in this repository: https://github.com/jupyterhub/traefik-proxy

@manics
Member

manics commented Aug 21, 2019

@minrk has a PR to add Traefik but it needs more work: #1162

@minrk
Member

minrk commented Aug 22, 2019

#1162 needs to be updated to switch to consul from etcd (traefik is very slow with etcd and lots of routes, which @GeorgianaElena discovered). But yes, #1162 completely solves this issue.
