Our proxy pod is a pet, but should be cattle (HA: Highly Available) #1364
Comments
Have you had any thoughts on how to implement this? If it's not a simple change it'd be worth designing in the option for the proxy to talk to multiple hubs even if that's not currently supported.
I'm not sure, one option would be to use a k8s configmap to keep the state, but I figure this may be a low-end solution with various downsides. For example, I imagine that the propagation of updates to the configmap, to be read by the container in a pod, would be slow and things would probably get out of sync easily. I figure the typical high-end solution is to use HA redis or similar, but that is probably a bit overkill. GitLab does this. But hmmm, is there an in-between option? Perhaps one step would be to allow the use of an existing redis deployment, assuming custom configuration is provided. GitLab's helm chart gives you a lot of things by default, and then lets you configure the use of an external database, external nginx-ingress, external cert-manager, etc.
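To illustrate the "shared state" idea (this is not something configurable-http-proxy does out of the box; the key names and addresses below are made up): instead of each proxy pod holding its routing table in memory, the table could live in an external store such as Redis, so that every pod, old or newly started, sees the same routes.

```python
import redis

# Hypothetical shared store reachable by the hub and all proxy pods.
r = redis.Redis(host="proxy-state-redis", port=6379)

# A route gets written once...
r.hset("jupyterhub:routes", "/user/erik", "http://10.68.3.5:8888")

# ...and any proxy pod can read the same routing table at any time.
routes = {k.decode(): v.decode() for k, v in r.hgetall("jupyterhub:routes").items()}
print(routes)  # {'/user/erik': 'http://10.68.3.5:8888'}
```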
I think the thing to look at for this is using traefik as a replacement for the configurable-http-proxy. @GeorgianaElena has been working on this over the summer. Not sure anyone has tried running JupyterHub with the traefik setup in Kubernetes yet. The traefik based proxy is in this repository: https://github.com/jupyterhub/traefik-proxy
#1162 needs to be updated to switch to consul from etcd (traefik is very slow with etcd and lots of routes, which @GeorgianaElena discovered). But yes, #1162 completely solves this issue.
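For reference, pointing JupyterHub at jupyterhub-traefik-proxy would roughly look like the jupyterhub_config.py sketch below. Treat it as a sketch only: `traefik_consul` is the entry point name I believe the package registers, and the `consul_url` trait name is an assumption that may differ between versions of traefik-proxy.

```python
# jupyterhub_config.py sketch (not something the helm chart supports today).
c = get_config()  # noqa: provided by JupyterHub when loading this file

# Use the traefik proxy implementation backed by a consul key-value store,
# so any number of traefik pods can serve from the same routing table.
c.JupyterHub.proxy_class = "traefik_consul"

# The proxy runs as its own deployment in Kubernetes; don't let the hub
# try to start it as a subprocess.
c.TraefikConsulProxy.should_start = False

# Assumption: the trait name for the consul endpoint; check the traefik-proxy docs.
c.TraefikConsulProxy.consul_url = "http://consul:8500"
```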
In Kubernetes, there is an analogy that pods should be treated as cattle rather than as pets. The idea is that a pet isn't interchangeable, while cattle are.
Our hub pod communicates with the proxy-api Kubernetes Service, which redirects traffic to one of the available proxy pods. But when the hub pod does so, it actively configures that one pod, while in reality it should configure all proxy pods.
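For concreteness, this is roughly the kind of REST call the hub makes against configurable-http-proxy's API (the proxy-api Service forwards port 8001 to a proxy pod's API). This is only an illustrative sketch using Python's requests library; the route, target IP, and token are placeholders, and whichever single pod the Service happens to pick is the only one that receives the update.

```python
import requests

# The proxy-api Service as seen from the hub pod, and the shared
# CONFIGPROXY_AUTH_TOKEN (placeholder values).
api = "http://proxy-api:8001/api/routes"
headers = {"Authorization": "token <CONFIGPROXY_AUTH_TOKEN>"}

# "Hey, what routes are you configured with already?"
print(requests.get(api, headers=headers).json())

# "Hey, when someone requests to access /user/erik they should go to 10.68.3.5!"
requests.post(f"{api}/user/erik", headers=headers,
              json={"target": "http://10.68.3.5:8888"})

# And remove a route that should no longer exist.
requests.delete(f"{api}/user/erik", headers=headers)
```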
Example issue scenario
Assume that, for some reason, there is more than a single proxy pod during a time interval. It could be that we want high availability (HA) and keep two running at all times, or that we are making a helm chart upgrade that rolls out a new proxy pod, or that the proxy pod crashed for some reason and a new one started up.
The hub pod will speak with the proxy-api network service, which will delegate traffic to one proxy pod, but not all. The hub will say to the proxy pod things like "Hey, when someone requests to access /user/erik they should go to 10.68.3.5!". The hub will also ask "Hey, what routes are you configured with already?", and if the hub concludes a route should be added or removed, it will speak up about that. But but but... the hub doesn't really know who it speaks with: it thinks it speaks with its single pet, but in reality it speaks with its cattle, and it does not try to make sure all the cattle behave the same way but is instead focused on a single pet.
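To make the pet/cattle problem concrete, here is a minimal sketch (not an existing implementation; pod discovery, auth tokens, and consistency are hand-waved, and the pod IPs are hypothetical) of what "configure all proxy pods" could look like using JupyterHub's Proxy API, fanning every route change out to each proxy pod's API directly instead of going through the proxy-api Service.

```python
from jupyterhub.proxy import ConfigurableHTTPProxy, Proxy


class FanOutCHPProxy(Proxy):
    """Sketch: apply every route change to all configurable-http-proxy pods."""

    # Hypothetical: a real implementation would discover these from the
    # Kubernetes API (e.g. the endpoints behind the proxy-api Service).
    pod_api_urls = ["http://10.68.3.5:8001", "http://10.68.3.6:8001"]

    def _backends(self):
        # One CHP client per proxy pod; the pods run as their own deployment,
        # so the hub must not try to start them.
        return [
            ConfigurableHTTPProxy(api_url=url, should_start=False)
            for url in self.pod_api_urls
        ]

    async def add_route(self, routespec, target, data):
        # Tell every pod, not just whichever one a Service would pick.
        for chp in self._backends():
            await chp.add_route(routespec, target, data)

    async def delete_route(self, routespec):
        for chp in self._backends():
            await chp.delete_route(routespec)

    async def get_all_routes(self):
        # Assuming all pods hold identical routes, ask the first one.
        return await self._backends()[0].get_all_routes()
```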
Goals

Pain points
Related, I think
#1226 - I think the proxy pod restarted, and the hub was clueless and didn't update the proxy's state. Automatic updates of the proxy pod's routes may have been implemented since then, so the issue would go away after a while thanks to that, but it would still occur briefly.