Issues Upgrading from JHub 0.9 --> 10.2 #884
Comments
Hmm, did something change with DNS / static IPs? I notice that going to https://staging.us-central1-b.gcp.pangeo.io fails, but …
In the logs for the proxy pod:
Perhaps we also disable the networkPolicy for the proxy pod, @scottyhq? Or do we need to add labels to it / the hub so they can talk to each other? Hopefully @consideRatio knows what's best here.
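If it is a NetworkPolicy / label mismatch, a quick check is to look at which policies exist and what labels the hub and proxy pods actually carry. A rough sketch (the "staging" namespace name here is an assumption):

# list the chart-managed network policies and what they select on
kubectl get networkpolicy -n staging
kubectl describe networkpolicy -n staging
# compare against the labels on the hub / proxy pods
kubectl get pods -n staging --show-labels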
Concerns raised
Not to worry I think, all config is put into
Not to worry, a detail to also allow for IPv6 in this part.
Not to worry, a security patch that is no longer needed.
Not to worry, it is about active users, I think.
Complexity reduction
Diagnosis
Feedback on #885
Explicitly setting daskhub.jupyterhub.proxy.https.enabled=true is required, so if it wasn't set before, that is perhaps the issue, as it seems you have …
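For reference, a rough sketch of that flattened key written out as nested YAML (the file name below is only illustrative; in this repo it would live in the deployment's values file):

# daskhub.jupyterhub.proxy.https.enabled=true, as nested YAML
cat >> values-https.yaml <<'EOF'
daskhub:
  jupyterhub:
    proxy:
      https:
        enabled: true
EOF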
I did deploy this change locally to staging, but it didn't fix the issue
Can try this next
On Chrome I see 'establishing secure connection...' and then eventually ERR_TIMED_OUT. On Safari things hang longer and eventually end with the 'can't establish secure connection' message.
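One way to tell whether nothing answers on port 443 at all, versus the TLS handshake itself failing (the hostname below is the GCP staging one from earlier in the thread; substitute the AWS one as needed):

# does the connection open, and does the TLS handshake complete?
curl -vk --connect-timeout 10 https://staging.us-central1-b.gcp.pangeo.io
openssl s_client -connect staging.us-central1-b.gcp.pangeo.io:443 \
  -servername staging.us-central1-b.gcp.pangeo.io </dev/null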
No
Correct, we have those 4 values.yaml files. secrets.yaml also contains load-balancer IPs; I notice that the ingress-nginx got commented out at some point... perhaps that is the issue!

daskhub:
  jupyterhub:
    proxy:
      secretToken: SECRET
      service:
        loadBalancerIP: XXXXXXX.us-west-2.elb.amazonaws.com
    auth:
      custom:
        config:
          client_id: "SECRET"
          client_secret: "SECRET"
    hub:
      services:
        dask-gateway:
          apiToken: "SECRET"
  dask-gateway:
    gateway:
      proxyToken: "SECRET"
      auth:
        type: jupyterhub
        jupyterhub:
          apiToken: "SECRET"
    # webProxy:
    #   service:
    #     loadBalancerIP: XXXXXXXXX.us-west-2.elb.amazonaws.com
    # schedulerProxy:
    #   service:
    #     loadBalancerIP: XXXXXXXXXX.us-west-2.elb.amazonaws.com
    # How do I get this IP without manually deploying?
    # Just manually deploy, I guess
# ingress-nginx:
#   controller:
#     service:
#       loadBalancerIP: "XXXXXXXXX.us-west-2.elb.amazonaws.com"

(Is ingress-nginx only used for grafana @TomAugspurger ? #849)
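One way to see what the load-balancer-backed services were actually assigned, as opposed to what the values file asks for (a sketch; the "staging" namespace name is an assumption):

# EXTERNAL-IP shows what the cloud provider actually handed out
kubectl get svc -n staging
# proxy-public is the JupyterHub chart's public-facing service
kubectl describe svc -n staging proxy-public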
running

proxy:
  networkPolicy:
    enabled: false

Big long discussion of things here
so after redeploying with (full current config in #885)

singleuser:
  networkPolicy:
    enabled: false
proxy:
  chp:
    networkPolicy:
      enabled: false
  https:
    enabled: true
hub:
  networkPolicy:
    enabled: false
OK, I'm stuck again. No idea how to debug this further. I've turned off all network policies and still am not able to reach the landing page with https enabled. Turning off https (https: enabled: false) I can get to the login page, but auth0 gets stuck because it expects https. So it seems there is some issue with the https configuration, but none of the pod logs show any obvious errors. A quick glance around suggests there are some alternative HTTPS setups (compared to when we initially set things up over a year ago!). Maybe it's worth trying those unless others have some insight into what is no longer working:
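One more place worth looking: with proxy.https.enabled=true and the chart's built-in Let's Encrypt setup, z2jh 0.10 terminates TLS in a separate autohttps (Traefik) pod, and certificate problems tend to show up only in its logs. A sketch, assuming default resource names and a "staging" namespace:

# is the autohttps pod there and healthy?
kubectl get pods -n staging -l component=autohttps
# certificate acquisition / TLS errors usually appear here
kubectl logs -n staging deploy/autohttps -c traefik --tail=100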
The staging GCP deployment seems to be fixed with this (deployed locally):

diff --git a/pangeo-deploy/values.yaml b/pangeo-deploy/values.yaml
index c191ced..b4a6618 100644
--- a/pangeo-deploy/values.yaml
+++ b/pangeo-deploy/values.yaml
@@ -16,7 +16,20 @@ daskhub:
   jupyterhub:
     # Helm config for jupyterhub goes here
     # See https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/values.yaml
+    proxy:
+      https:
+        enabled: true
+      chp:
+        networkPolicy:
+          enabled: false
+      traefik:
+        networkPolicy:
+          enabled: false
     singleuser:
+      networkPolicy:
+        # Disable network security policy, perhaps causing upgrade issues.
+        # https://github.com/pangeo-data/pangeo-cloud-federation/issues/884
+        enabled: false
       cpu:
         limit: 2
         guarantee: 1

I might move those changes to just apply to GCP for now, till we get the rest sorted out. Unfortunately, I don't have much help to offer for the AWS side. I've never really understood how they do load balancers.
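Scoping those changes to GCP only would roughly mean leaving pangeo-deploy/values.yaml alone and putting the same keys into the GCP deployment's own values file instead. A sketch (the file path below is an assumption, not the repo's actual layout):

# keep the shared chart defaults untouched; override only for the GCP deployment
cat >> deployments/gcp-uscentral1b/values.yaml <<'EOF'
daskhub:
  jupyterhub:
    proxy:
      https:
        enabled: true
      chp:
        networkPolicy:
          enabled: false
      traefik:
        networkPolicy:
          enabled: false
    singleuser:
      networkPolicy:
        enabled: false
EOF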
@consideRatio - a clue at least for what is going wrong on AWS: "Error syncing load balancer: failed to ensure load balancer: LoadBalancerIP cannot be specified for AWS ELB"
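That matches how classic ELBs work: AWS hands back a DNS hostname rather than taking a fixed IP, so spec.loadBalancerIP has to stay unset. A quick check of whether the rendered service still carries it (a sketch, assuming the z2jh default service name and a "staging" namespace):

# non-empty output means the chart is still setting loadBalancerIP on the service
kubectl get svc -n staging proxy-public -o jsonpath='{.spec.loadBalancerIP}{"\n"}'
# the "Error syncing load balancer" message shows up as an event on that service
kubectl get events -n staging --field-selector involvedObject.name=proxy-public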
@salvis2 suggested the fix of removing the following config for a similar error message:
Looking around a bit more (https://docs.aws.amazon.com/eks/latest/userguide/load-balancing.html), I'm suspecting we're having issues because cluster subnets now require tags to cooperate with Kubernetes and the classic load balancer, it seems: "Public subnets must be tagged as follows so that Kubernetes knows to use only those subnets for external load balancers instead of choosing a public subnet in each Availability Zone (in lexicographical order by subnet ID). If you use eksctl or an Amazon EKS AWS CloudFormation template to create your VPC after March 26, 2020, then the subnets are tagged appropriately when they're created." We created this cluster a long while back and I don't see those tags (related issue: eksctl-io/eksctl#1982).
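For reference, the tags that doc is talking about can be added to existing subnets after the fact (placeholder subnet IDs below; this is roughly what the eksctl subnet-settings command in the next comment automates):

# public subnets: let Kubernetes place external load balancers there
aws ec2 create-tags --resources subnet-aaaa subnet-bbbb \
  --tags Key=kubernetes.io/role/elb,Value=1
# all cluster subnets: associate them with this cluster
aws ec2 create-tags --resources subnet-aaaa subnet-bbbb \
  --tags Key=kubernetes.io/cluster/pangeo,Value=shared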
Many many thanks to @consideRatio for taking the time to do some live debugging with me on this. To fix this we ended up deleting the existing load balancer (over 600 days old) and redeploying. Here is a quick summary:

# 1.16.8
eksctl utils update-legacy-subnet-settings --cluster pangeo
eksctl utils update-coredns --cluster pangeo --approve
eksctl utils update-aws-node --cluster pangeo --approve
eksctl utils update-kube-proxy --cluster pangeo --approve
# patch kube-proxy pods according to https://github.com/weaveworks/eksctl/issues/1088#issuecomment-717429367
kubectl edit daemonset kube-proxy --namespace kube-system
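The "delete the existing load balancer and redeploy" step boils down to deleting the proxy service so a fresh ELB gets provisioned, redeploying the chart, and then pointing DNS at the new hostname. A sketch, assuming the z2jh default service name and a "staging" namespace:

# removing the Service also releases the ~600-day-old ELB behind it
kubectl delete svc -n staging proxy-public
# after redeploying the chart, grab the new ELB hostname for the DNS record
kubectl get svc -n staging proxy-public \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{"\n"}'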
closes #884. fix AWS staging for jupyterhub 10.2
Currently unable to access staging hub on AWS with an ERR_TIMED_OUT and "your connection to this site is unsecure". @consideRatio or @yuvipanda, your guidance would be much appreciated here; I've spent a bit of time looking over issues but I'm not having any epiphanies. All pods are running and there are no obvious error messages, but the hub log is very suspicious to me, these lines in particular:
Loading /etc/jupyterhub/config/values.yaml
Hub API listening on http://0.0.0.0:8081/hub/
Using Spawner: builtins.PatchedKubeSpawner
Initialized 1 spawners in 0.119 seconds
Full diff compared to our functioning prod hub:
https://gist.github.com/scottyhq/e381d2b01e3db0a162ae317faf9a2193/revisions
ping @tjcrone @TomAugspurger