POC: Set up Lagoon on EKS and get a demo of the CMS up and running #6674
Comments
Can we size this?
Hey team! Please add your planning poker estimate with ZenHub @ElijahLynn @indytechcook @ndouglas @olivereri @timcosgrove
Please add your planning poker estimate with ZenHub @cweagans
Lagoon Core
I'm not sure what the ANSI art is trying to be. If I can get a source image, I'll try to create a clearer one. Upon requesting http://api.lagoon-dev.cms.va.gov/ (HTTPS doesn't work, because reasons), I get: Which is actually success (at this point)! That means that the routing and ingress are working properly. Next I'll fight with Keycloak, I guess.
I think that's fixable by overriding the keycloakAPIURL in the values file.
- name: KEYCLOAK_API
  {{- if .Values.keycloakAPIURL }}
  value: {{ .Values.keycloakAPIURL | quote }}
  {{- else }}
  value: https://{{ index .Values.keycloak.ingress.hosts 0 "host" }}/auth
  {{- end }}
And it works: Well, sorta: Probably because this: So let's override this other URL:
- name: GRAPHQL_API
  {{- if .Values.lagoonAPIURL }}
  value: {{ .Values.lagoonAPIURL | quote }}
  {{- else }}
  value: https://{{ index .Values.api.ingress.hosts 0 "host" }}/graphql
  {{- end }}
|
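To make the override behavior concrete, here is a minimal Go sketch that renders the KEYCLOAK_API snippet with and without keycloakAPIURL set. Helm templates are Go text/template under the hood, so the same if/else applies. The hostname keycloak.example.test, the helper names render and chartDefaults, and the use of strconv.Quote in place of Sprig's quote are all assumptions for illustration, not the chart's real values.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"text/template"
)

// keycloakEnv mirrors the chart snippet quoted above.
const keycloakEnv = `- name: KEYCLOAK_API
{{- if .Values.keycloakAPIURL }}
  value: {{ .Values.keycloakAPIURL | quote }}
{{- else }}
  value: https://{{ index .Values.keycloak.ingress.hosts 0 "host" }}/auth
{{- end }}
`

// render executes the snippet against a values tree shaped like values.yaml.
func render(values map[string]any) (string, error) {
	// Helm gets quote from Sprig; strconv.Quote is a stand-in for this sketch.
	funcs := template.FuncMap{"quote": strconv.Quote}
	tmpl, err := template.New("env").Funcs(funcs).Parse(keycloakEnv)
	if err != nil {
		return "", err
	}
	var out bytes.Buffer
	if err := tmpl.Execute(&out, map[string]any{"Values": values}); err != nil {
		return "", err
	}
	return out.String(), nil
}

// chartDefaults is a hypothetical values tree; the hostname is made up.
func chartDefaults() map[string]any {
	return map[string]any{
		"keycloak": map[string]any{
			"ingress": map[string]any{
				"hosts": []any{map[string]any{"host": "keycloak.example.test"}},
			},
		},
	}
}

func main() {
	// Without the override, the URL is derived from the first ingress host.
	withoutOverride, err := render(chartDefaults())
	if err != nil {
		panic(err)
	}
	fmt.Print(withoutOverride)

	// With the override, the value is taken verbatim (note http, not https).
	values := chartDefaults()
	values["keycloakAPIURL"] = "http://keycloak.example.test/auth"
	withOverride, err := render(values)
	if err != nil {
		panic(err)
	}
	fmt.Print(withOverride)
}
```

The point of the sketch is that the fallback branch always produces an https:// URL, so any environment where HTTPS is not working needs the explicit override.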
SSH
SSH access to the
# This annotation is only required if you are creating an internal facing ELB. Remove this annotation to create public facing ELB.
service.beta.kubernetes.io/aws-load-balancer-internal: "true"
After editing this into the service, the NLB seemed reachable via SSH from CMS-Test Dev:
The issue from here is that this isn't cleanly accessible from our local machines. A solution is probably straightforward for someone better versed in SOCKS and so forth. I'm currently messing with ProxyJump/ProxyCommand in SSH trying to get this working 🤔
This was the magic necessary to be able to connect (not log in) from my local machine:
Host lagoon
  HostName internal-a5db579a60ddc4d94bd3bdd6cde40ef9-1394069038.us-gov-west-1.elb.amazonaws.com
  User lagoon
  ProxyCommand ssh -q -A dsva@vetsgov-dev-jumpbox-govwest-1b nc %h %p
From here I can generate a token. However, it appears that Lagoon CLI doesn't use
|
SSH and SOCKS5
I thought Lagoon-CLI used the Go SSH client library, but upon closer inspection it seemed to use the SSH CLI. Then, upon still closer inspection, it only seemed to use the SSH CLI under certain circumstances. After discussing this with Elijah, Eric, and Cameron, we figured that a good course of action would be to modify the Lagoon CLI to support SOCKS5 or ProxyJump/ProxyCommand or something. Elijah opened an issue.
This morning, I did some tentative work in that direction. Then I started getting itchy and changed the SSH generated command for the codepath that I was fairly sure was never executed, and -- it started working 😕
diff --git a/pkg/lagoon/ssh/main.go b/pkg/lagoon/ssh/main.go
index 3b6e013..23a1ee2 100644
--- a/pkg/lagoon/ssh/main.go
+++ b/pkg/lagoon/ssh/main.go
@@ -120,7 +120,7 @@ func RunSSHCommand(lagoon map[string]string, sshService string, sshContainer str
// GenerateSSHConnectionString .
func GenerateSSHConnectionString(lagoon map[string]string, service string, container string) string {
- connString := fmt.Sprintf("ssh -t -o \"UserKnownHostsFile=/dev/null\" -o \"StrictHostKeyChecking=no\" -p %v %s@%s", lagoon["port"], lagoon["username"], lagoon["hostname"])
+ connString := fmt.Sprintf("ssh -o \"ProxyCommand=ssh -q -A dsva@vetsgov-dev-jumpbox-govwest-1b nc %%h %%p\" -t -o \"UserKnownHostsFile=/dev/null\" -o \"StrictHostKeyChecking=no\" -p %v %s@%s", lagoon["port"], lagoon["username"], lagoon["hostname"])
if service != "" {
connString = fmt.Sprintf("%s service=%s", connString, service)
}
Kinda:
🔔nathan.douglas@Belmore:~/Projects/lagoon-cli$ ./lagoon-cli login
Error: Post "http://api.lagoon-dev.cms.va.gov/graphql": dial tcp: lookup api.lagoon-dev.cms.va.gov: no such host
So we need that SOCKS5 proxy to cover everything. But Go can import proxy information from an
🔔nathan.douglas@Belmore:~/Projects/lagoon-cli$ export HTTP_PROXY="socks5://127.0.0.1:2001/"
🔔nathan.douglas@Belmore:~/Projects/lagoon-cli$ ./lagoon-cli login
Token fetched and saved.
🔔nathan.douglas@Belmore:~/Projects/lagoon-cli$ ./lagoon-cli whoami
ID EMAIL FIRSTNAME LASTNAME SSHKEYS
2015e338-4c55-44f5-8217-25f77af81937 [email protected] Nathan Douglas 2
🎉 So at this point my work is unblocked and I can go find some new obstacle to slam into at high speed.
But... why does it work? At this point in my engineering career, nothing makes me more suspicious than something that Just Works™. I did not bleed enough, I did not suffer enough for this to work. So I
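A likely explanation for the HTTP_PROXY result: Go's default HTTP transport consults http.ProxyFromEnvironment, which reads HTTP_PROXY for http:// URLs (the API URL here is http://, so HTTPS_PROXY would not apply), and the transport accepts the socks5 scheme. As far as I can tell, the hostname is then handed to the SOCKS5 proxy for resolution, which would account for the "no such host" error going away. The sketch below does the same thing explicitly instead of via the environment. It assumes a SOCKS5 proxy is already listening on 127.0.0.1:2001 as in the session above; newProxiedClient and proxyFor are hypothetical helper names, not part of lagoon-cli.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newProxiedClient returns an HTTP client that routes every request through
// proxyAddr. This is the explicit equivalent of exporting HTTP_PROXY.
func newProxiedClient(proxyAddr string) (*http.Client, error) {
	proxyURL, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, err
	}
	switch proxyURL.Scheme {
	case "socks5", "http", "https":
		// Schemes documented as supported by http.Transport.Proxy.
	default:
		return nil, fmt.Errorf("unsupported proxy scheme %q", proxyURL.Scheme)
	}
	transport := &http.Transport{Proxy: http.ProxyURL(proxyURL)}
	return &http.Client{Transport: transport}, nil
}

// proxyFor reports which proxy the client would pick for target, without
// opening any network connection.
func proxyFor(client *http.Client, target string) (string, error) {
	req, err := http.NewRequest(http.MethodPost, target, nil)
	if err != nil {
		return "", err
	}
	transport, ok := client.Transport.(*http.Transport)
	if !ok || transport.Proxy == nil {
		return "", nil
	}
	chosen, err := transport.Proxy(req)
	if err != nil || chosen == nil {
		return "", err
	}
	return chosen.String(), nil
}

func main() {
	client, err := newProxiedClient("socks5://127.0.0.1:2001/")
	if err != nil {
		panic(err)
	}
	chosen, err := proxyFor(client, "http://api.lagoon-dev.cms.va.gov/graphql")
	if err != nil {
		panic(err)
	}
	fmt.Println("requests would go via:", chosen)
}
```

Note that this only covers HTTP traffic. The SSH side of the CLI shells out to ssh, so it still needs ProxyCommand or an equivalent entry in the SSH config.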
After some poking around, I think that the answer was just to change my SSH connection info for Lagoon:
current: lagoon-dev
default: lagoon-dev
lagoons:
  amazeeio:
    graphql: https://api.lagoon.amazeeio.cloud/graphql
    hostname: ssh.lagoon.amazeeio.cloud
    ui: https://dashboard.amazeeio.cloud
    kibana: https://logs.amazeeio.cloud/
    port: "32222"
    token: ""
    version: ""
  lagoon-dev:
    graphql: http://api.lagoon-dev.cms.va.gov/graphql
    hostname: internal-a5db579a60ddc4d94bd3bdd6cde40ef9-1394069038.us-gov-west-1.elb.amazonaws.com
    ui: https://ui.lagoon-dev.cms.va.gov
    kibana: ""
    port: "22"
    token: <lemme 'lone>
    version: v2.1.0
updatecheckdisable: false
environmentfromdirectory: false
Then export the HTTP_PROXY. Then things seem to work and we can continue on our quest. I still don't really understand why this works.
Fun With GraphQL
The next step is to play with Lagoon via GraphQL. Unfortunately: GraphiQL doesn't expose any sort of SOCKS proxy configuration. Fortunately, this is precisely the sort of suffering I've come to expect in engineering. With this command:
curl -g \
--socks5-hostname 127.0.0.1:2001 \
-X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <my-token>" \
-d '{"query":"query allProjects {allProjects {name } }"}' \
http://api.lagoon-dev.cms.va.gov/graphql
I received the expected response:
{"data":{"allProjects":[]}}
With the following query:
curl -g \
--socks5-hostname 127.0.0.1:2001 \
-X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <lagoon-token>" \
-d '{"query": "mutation addKubernetes {\r\n addKubernetes(input:\r\n {\r\n name: \"lagoon-dev\",\r\n consoleUrl: \"https:\/\/4FE820642ABFA95BCB6854C69A1AF5A2.gr7.us-gov-west-1.eks.amazonaws.com\",\r\n token: \"<kubernetes-build-deploy-token>\",\r\n routerPattern: \"${environment}.${project}.lagoon-dev.cms.va.gov\"\r\n }){id}\r\n}"}' \
http://api.lagoon-dev.cms.va.gov/graphql
I got the following:
{"data":{"addKubernetes":{"id":1}}}
Which might also be a sign that things are working. I'm not 100% on the legitimacy of that build-deploy token, though. My
Next is creating the project:
🔔nathan.douglas@Belmore:~/Projects/lagoon-cli$ lagoon add project --gitUrl git://github.com/department-of-veterans-affairs/va.gov-cms.git --openshift 1 --productionEnvironment lagoon-dev --branches "^(master|main|VACMS-6674.*)$"
Result: success
Project Name: lagoon-dev
GitURL: https://github.com/department-of-veterans-affairs/va.gov-cms.git
and it's visible upon login: I added this deploy key: and deployed: but alas: This might be failing because the logs pods are still in ImagePullBackoff: So it might be time for More Fun With Kubernetes™.
EDIT: Nope, just should've supplied a That made it further: but without logs, my ability to figure out wut's going on is obv limited, so I probably need to fix the root issue there.
|
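The GraphQL calls earlier in this comment hand-escape the query inside a JSON string (all those \r\n and \" sequences). A small sketch of letting encoding/json do the escaping instead, building the same kind of request the curl commands send. The helper names encodeQuery and newGraphQLRequest and the token value are made up for illustration; the request is only constructed, not sent, and sending it would still need the SOCKS5 proxy.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// graphQLRequest is the JSON envelope the API expects: {"query": "..."}.
type graphQLRequest struct {
	Query string `json:"query"`
}

// encodeQuery wraps a GraphQL document in the JSON envelope, escaping
// quotes and newlines so the query can be written as plain text.
func encodeQuery(query string) (string, error) {
	body, err := json.Marshal(graphQLRequest{Query: query})
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// newGraphQLRequest builds (but does not send) a POST carrying the same
// headers as the curl invocations above.
func newGraphQLRequest(endpoint, token, query string) (*http.Request, error) {
	body, err := encodeQuery(query)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	query := "query allProjects {allProjects {name } }"
	body, err := encodeQuery(query)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)

	// "example-token" is a placeholder, not a real Lagoon token.
	req, err := newGraphQLRequest("http://api.lagoon-dev.cms.va.gov/graphql", "example-token", query)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Content-Type"))
}
```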
Fun With Lagoon, Kubernetes, Docker, RDS, IDK What
So why are the logs (and only the logs) in ImagePullBackoff? The first obstacle along the way is that But now that I can
🔔nathan.douglas@Belmore:~/Projects/content-build$ kubectl get pods --all-namespaces | grep lagoon-build
lagoon-dev-master lagoon-build-wl8fej 0/1 Error 0 24m
lagoon lagoon-remote-lagoon-build-deploy-bfb74bf4-mrf66 2/2 Running 0 28h
🔔nathan.douglas@Belmore:~/Projects/content-build$ kubectl describe pod -n lagoon-dev-master lagoon-build-wl8fej
Name: lagoon-build-wl8fej
Namespace: lagoon-dev-master
<snip>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 24m default-scheduler Successfully assigned lagoon-dev-master/lagoon-build-wl8fej to ip-10-247-96-165.us-gov-west-1.compute.internal
Normal Pulling 24m kubelet, ip-10-247-96-165.us-gov-west-1.compute.internal Pulling image "uselagoon/kubectl-build-deploy-dind:latest"
Normal Pulled 24m kubelet, ip-10-247-96-165.us-gov-west-1.compute.internal Successfully pulled image "uselagoon/kubectl-build-deploy-dind:latest" in 1.14009574s
Normal Created 24m kubelet, ip-10-247-96-165.us-gov-west-1.compute.internal Created container lagoon-build
Normal Started 24m kubelet, ip-10-247-96-165.us-gov-west-1.compute.internal Started container lagoon-build
🔔nathan.douglas@Belmore:~/Projects/content-build$ kubectl logs -n lagoon-dev-master lagoon-build-wl8fej
Agent pid 33
Identity added: /home/.ssh/key (/home/.ssh/key)
+ set -eo pipefail
+ set -o noglob
+ REGISTRY=none.com
++ cat /var/run/secrets/kubernetes.io/serviceaccount/namespace
+ NAMESPACE=lagoon-dev-master
+ REGISTRY_REPOSITORY=lagoon-dev-master
++ cat /lagoon/version
+ LAGOON_VERSION=21.9.0
+ set +x
+ '[' false == true ']'
+ CI_OVERRIDE_IMAGE_REPO=
+ '[' branch == pullrequest ']'
+ /kubectl-build-deploy/scripts/git-checkout-pull.sh git://github.com/department-of-veterans-affairs/va.gov-cms.git origin/master
+ set -eo pipefail
+ REMOTE=git://github.com/department-of-veterans-affairs/va.gov-cms.git
+ REF=origin/master
+ git init .
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: git branch -m <name>
Initialized empty Git repository in /kubectl-build-deploy/git/.git/
+ git config remote.origin.url git://github.com/department-of-veterans-affairs/va.gov-cms.git
+ git fetch --depth=10 --tags --progress git://github.com/department-of-veterans-affairs/va.gov-cms.git '+refs/heads/*:refs/remotes/origin/*'
fatal: unable to connect to github.com:
github.com[0: 192.30.255.112]: errno=Operation timed out
Hmm. So it looks kinda like there's an outgoing networking issue. When consulted, Eric and Elijah nodded sadly and explained that outgoing requests on the SSH port are dropped by the TIC. And although GitHub can be SSH'ed to on port 443, this violates the spirit of TIC law and would get me yelled at. Two solutions are:
A decision on the latter probably isn't possible until Monday, so I'm kinda blocked here. I think I'll go back and see if I can get HTTPS cloning to work. IDK why it wouldn't, but it didn't before. EDIT: Yeah, no, definitely still doesn't work. |
I'm blocked on moving much forward by the outgoing Git/SSH issue, but I can move forward with other things...
EFS
I created an EFS filesystem
This created a storage class with the name That does nothing to unblock me with regard to Git/SSH, though, and I still need to figure out a couple things:
|
The ops team, in office hours, confirmed our suspicions that this restriction on outbound SSH is pretty legit. As such, this PoC is blocked. We have a number of options for moving forward (h/t Cameron for typing them up):
|
Roundabout Approaches
A few of the options above could actually be addressed. I've addressed them, sorta, and will discuss.
Modifying the Lagoon Build Deploy Image
No one has responded yet to my discussion thread about git cloning via HTTPS. However, even if they had, it wouldn't work because the DHS is MITMing the TLS. I forked the Lagoon service images, rebuilt the The second half of that was to actually alter the Lagoon configuration to use the new Docker image. I injected the override into the remote-values.yaml and updated the
Modifying the codebase
There are some changes that need to be made to the CMS codebase as part of a move to Lagoon. I made them in #6867, although I have no way of testing them.
|
So I got some responses on this discussion thread saying that, although the build-deploy pipeline was implemented with Git/SSH in mind, that was mostly to accommodate GitHub deploy keys, and there was no hard reason that HTTPS cloning should not work. Who's to blame? I mentioned that I'd injected the TIC TLS cert into a Docker container, at which point Toby pointed out that I was using an older Dockerfile -- a great catch which undoubtedly would save me some frustration. So going back to Docker to build the new image:
I ran into this issue which seems to plague Docker for Mac. I don't want to upgrade Docker for Mac because that's caused issues with Lando in the past. Fortunately, I have about sixty LXC containers with Docker installed, so I'll just SSH into one of them and build the image and push it from there.
Well, then SSH is hanging. I can't SSH into any of said containers, or anything else on my network. SSH works with everything else on my network... except my work computer. I attempted to find a solution for a few minutes, but being pressed for time I ended up just switching computers, SSHing into my work computer from my personal computer, grabbing the updated Dockerfile, then SSHing into an LXC container to build the The CMS project's URL is HTTPS, so I can attempt to deploy the branch PR to see where my PR (see #6867 ) fails:
🔔nathan.douglas@Belmore:~/Projects/lagoon-stuff$ lagoon deploy branch -p lagoon-dev -b VACMS-6674-lagoon
✔ Yes
success
Now I can log into Lagoon UI because Keycloak is running because it's no longer in And: 🎉 It's taking longer to fail than it has before. Which is, technically, progress.
So something is requesting Harbor, but doing so via HTTPS and not HTTP. Since I've not specified HTTPS anywhere, this would appear to be an issue with a script somewhere. After doing so, I don't see any commands issued after Why?
Well, Docker requires some additional configuration for insecure registries -- configuration that I don't believe the The problem is that I think since this is built around Docker-in-Docker that we're using insecure registries as specified by the host, not by the container. So I think this might be doomed to fail.
<snip: I tried it anyway. It failed.>
So the only way to move forward at this point, AFAICT, is to add the insecure registry for Harbor to the So I'm blocked again.
|
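For reference, the Docker daemon setting in question is the insecure-registries list in daemon.json (there is also an --insecure-registry flag on dockerd). The sketch below just renders that fragment. The registry address harbor.example.test:80 and the helper name renderDaemonConfig are placeholders, and which daemon would need the file in a Docker-in-Docker build (host or build container) is exactly the open question above, so treat this as an illustration of the config shape rather than a fix.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// daemonConfig models only the one daemon.json key discussed here.
type daemonConfig struct {
	InsecureRegistries []string `json:"insecure-registries"`
}

// renderDaemonConfig returns a daemon.json fragment that lets the Docker
// daemon talk plain HTTP to the listed registries.
func renderDaemonConfig(registries ...string) (string, error) {
	out, err := json.MarshalIndent(daemonConfig{InsecureRegistries: registries}, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Placeholder address; substitute the real Harbor host and port.
	config, err := renderDaemonConfig("harbor.example.test:80")
	if err != nil {
		panic(err)
	}
	fmt.Println(config)
}
```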
LOL, I remember none of this. Summarizing significant issues that I encountered in this PoC:
|
Description
As a CMS engineer, I would like to validate that Lagoon will be sufficient for our needs so that we can begin to evaluate the value and cost-savings that Lagoon potentially offers.
Acceptance Criteria
CMS Team
Please leave only the team that will do this work selected. If you're not sure, it's fine to leave both selected.
Platform CMS Team
Sitewide CMS Team
Related #6673