B2k restore job not able to complete after terminal window close when one has more than one terminal window open. #651

Open
williamohara opened this issue Dec 19, 2024 · 0 comments
Labels
bug Something isn't working

williamohara commented Dec 19, 2024

Describe the bug
The B2k restore job is not able to complete after its terminal window is closed when more than one terminal window is open.

Mention the platform you are using
WSL (Linux) on Windows 11, using VS Code connected to WSL via the WSL code server. Specifically:

Windows About

Edition: Windows 11 Pro
Version: 24H2
Installed on: 11/25/2024
OS build: 26100.2605
Experience: Windows Feature Experience Pack 1000.26100.36.0

VS Code About

Version: 1.96.1 (user setup)
Commit: 42b266171e51a016313f47d0c48aca9295b9cbb2
Date: 2024-12-17T17:50:05.206Z
Electron: 32.2.6
ElectronBuildId: 10629634
Chromium: 128.0.6613.186
Node.js: 20.18.1
V8: 12.8.374.38-electron.0
OS: Windows_NT x64 10.0.26100

Linux Distro running in WSL

~ $ hostnamectl
 Static hostname: Code-Dragon
       Icon name: computer-container
         Chassis: container
      Machine ID: 39d26ee053a74ed1b123ec2ad80c7eb2
         Boot ID: 83cc156fb19447e1ac923430d8e24420
  Virtualization: wsl
Operating System: Ubuntu 22.04.4 LTS
          Kernel: Linux 5.15.167.4-microsoft-standard-WSL2
    Architecture: x86-64
~ $ cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.4 LTS"
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
~ $

b2k version

Identifier: mindaro.mindaro
Version: 2.0.120240111
Last Updated: 2024-04-27, 13:17:44
Size: 166.8 MB
Cache: 205.5 MB

To Reproduce
Steps to reproduce the behavior:
Create an AKS cluster and implement a deployment running any application in a pod, and apply a service as well. Below is the YAML for my setup (note: I am having this problem on every service I use B2k on); a rough sketch of how I apply it follows the manifests.


apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: org-service-ss
  name: org-service-ss
  namespace: subscripify-super-principal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: org-service-ss
  template:
    metadata:
      labels:
        app: org-service-ss
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: subscripify-super-principal-service-account
      containers:
      - name: org-service-ss
        env:
        - name: SUBSCRIPIFY_SERVICE_NAME
          value: org-service-ss
        - name: SUBSCRIPIFY_SERVICE_LEVEL
          value: lord
        - name: SUBSCRIPIFY_DB_ENV
          value: proddb
        - name: SUBSCR_LOG_LEVEL
          value: trace
        - name: SUBSCR_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: SUBSCR_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SUBSCR_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: SUBSCR_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: SUBSCR_POD_SERVICE_ACCOUNT
          valueFrom:
            fieldRef:
              fieldPath: spec.serviceAccountName
        - name: SUBSCR_TENANT
          valueFrom:
            configMapKeyRef:
              name: subscripify-tenant-config
              key: tenantUUID
        - name: SUBSCR_CLOUD_CONTEXT
          valueFrom:
            configMapKeyRef:
              name: subscripify-tenant-config
              key: tenantCloudContext
        image: subscripifycontreg.azurecr.io/org-service-ss:latest
        
---
apiVersion: v1
kind: Service
metadata:
  name: org-service-ss
  namespace: subscripify-super-principal
  
spec:
  selector:
    app: org-service-ss
  ports:
  - protocol: TCP
    port: 50051
    targetPort: 50051
    name: grcp
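
For completeness, this is roughly how I apply the manifests above (the file name is just what I use locally; kubectl is pointed at the core-cluster context):

kubectl apply -f org-service-ss.yaml
# names come from the manifests above
kubectl -n subscripify-super-principal get deployment org-service-ss
kubectl -n subscripify-super-principal get service org-service-ss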

To set my environment variables when using B2k, I am using a KubernetesLocalProcessConfig.yaml file (note: I have this problem even if I do not use a KubernetesLocalProcessConfig.yaml file; I am just including it for good measure).

version: 0.1
env:
  - name: SUBSCR_TENANT
    value: "e6f9adf0-37e9-4118-b749-b33651d5e203"
  - name: SUBSCR_POD_NAME
    value: "org-service-ss-c9899f68c-vj8jb"
  - name: SUBSCR_POD_SERVICE_ACCOUNT
    value: "subscripify-super-principal-service-account"
  - name: SUBSCR_CLOUD_CONTEXT
    value: "azure"
  - name: AZURE_AUTHORITY_HOST
    value: "https://login.microsoftonline.com/"
  - name: SUBSCR_LOG_LEVEL
    value: "trace"
  - name: SUBSCR_NODE_NAME
    value: "aks-subagntpool-37118338-vmss00002j"
  - name: SUBSCR_POD_NAMESPACE
    value: "subscripify-super-principal"
  - name: AZURE_CLIENT_ID
    value: "e0de541c-b584-46ec-9064-bb15c6bf1a35"
  - name: SUBSCR_POD_IP
    value: "172.16.6.37"
  - name: AZURE_TENANT_ID
    value: "e2752eb2-38c3-4793-bf6c-b73751ee06ee"
  - name: AZURE_FEDERATED_TOKEN_FILE
    value: "/var/run/secrets/azure/tokens/azure-identity-token"
  - name: SUBSCR_BRIDGE
    value: "true"

This is my debugging pre-launch task:

{
	"version": "2.0.0",
	"tasks": [
            {
              "label": "bridge-to-kubernetes.api-mgmt-service",
              "type": "bridge-to-kubernetes.resource",
              "resource": "org-service-ss",
              "resourceType": "service",
              "ports": [
                50051
              ],
              "targetCluster": "core-cluster",
              "targetNamespace": "subscripify-super-principal",
              "useKubernetesServiceEnvironmentVariables": true,
              "targetContainer": "org-service-ss"
            }
        ]
}

And this is my debugging launch configuration:

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Launch the management service",
      "type": "go",
      "request": "launch",
      "mode": "debug",
      "program": "${workspaceFolder}/cmd/org-service-ss/main.go",
      "preLaunchTask": "bridge-to-kubernetes.api-mgmt-service"
    }
  ]
}

Launch a debugging session using a similar setup. You will notice that B2k does what it should and starts up the session:
The debugger is running in the VS Code debugger window (I see my logs).
VS Code has opened a second integrated terminal to run whatever it needs locally (I presume). Here is a copy of what is in that integrated terminal:


Redirecting Kubernetes service org-service-ss to your machine...
Target cluster: core-cluster
Current cluster: core-cluster
Target namespace: subscripify-super-principal
Current namespace: subscripify-super-principal
Target service name: org-service-ss
Target service ports: 50051
Using kubernetes service environment variables: true

Retrieving the current context and credentials...
Validating the credentials to access the cluster...
Validating the requirements to replicate resources locally...
Redirecting traffic from the cluster to your machine...
Loaded Bridge To Kubernetes environment file 'KubernetesLocalProcessConfig.yaml'.
Waiting for 'org-service-ss-7ff84fdbc5-qkfv2' in namespace 'subscripify-super-principal' to reach running state...
Deployment 'subscripify-super-principal/org-service-ss' patched to run agent.
Remote agent deployed in container 'org-service-ss' in pod 'org-service-ss-7ff84fdbc5-qkfv2'.
Preparing to run Bridge To Kubernetes configured as pod subscripify-super-principal/org-service-ss-7ff84fdbc5-qkfv2 ...
Connection established.
Service 'org-service-ss' is available on 127.0.0.1:55049.
Container port 50051 is available at localhost:50051.
##################### Environment started. #############################################################
Run /tmp/tmp-378708f88mhN9rWVm.env.cmd in your existing console to also get connected.
 *  Terminal will be reused by tasks, press any key to close it. 
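
As a quick sanity check that the redirect is live locally, I poke the port with something like this (assumes netcat is installed; the service is gRPC, so this only confirms the TCP listener is up):

nc -zv localhost 50051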

The deployment on my AKS cluster has changed: it is now running the image for the bridge remote agent and it has added some environment variables. Here is a copy of the modified deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app":"org-service-ss"},"name":"org-service-ss","namespace":"subscripify-super-principal"},"spec":{"replicas":1,"selector":{"matchLabels":{"app":"org-service-ss"}},"template":{"metadata":{"labels":{"app":"org-service-ss","azure.workload.identity/use":"true"}},"spec":{"containers":[{"env":[{"name":"SUBSCRIPIFY_SERVICE_NAME","value":"org-service-ss"},{"name":"SUBSCRIPIFY_SERVICE_LEVEL","value":"lord"},{"name":"SUBSCRIPIFY_DB_ENV","value":"proddb"},{"name":"SUBSCR_LOG_LEVEL","value":"trace"},{"name":"SUBSCR_NODE_NAME","valueFrom":{"fieldRef":{"fieldPath":"spec.nodeName"}}},{"name":"SUBSCR_POD_NAME","valueFrom":{"fieldRef":{"fieldPath":"metadata.name"}}},{"name":"SUBSCR_POD_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},{"name":"SUBSCR_POD_IP","valueFrom":{"fieldRef":{"fieldPath":"status.podIP"}}},{"name":"SUBSCR_POD_SERVICE_ACCOUNT","valueFrom":{"fieldRef":{"fieldPath":"spec.serviceAccountName"}}},{"name":"SUBSCR_TENANT","valueFrom":{"configMapKeyRef":{"key":"tenantUUID","name":"subscripify-tenant-config"}}},{"name":"SUBSCR_CLOUD_CONTEXT","valueFrom":{"configMapKeyRef":{"key":"tenantCloudContext","name":"subscripify-tenant-config"}}}],"image":"subscripifycontreg.azurecr.io/org-service-ss:latest","name":"org-service-ss"}],"serviceAccountName":"subscripify-super-principal-service-account"}}}}
  creationTimestamp: 2024-12-19T00:29:52Z
  generation: 2
  labels:
    app: org-service-ss
  name: org-service-ss
  namespace: subscripify-super-principal
  resourceVersion: "4928947"
  uid: 58dadcf3-e4a4-46bf-bbcb-f2e28b5d16ed
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: org-service-ss
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: org-service-ss
        azure.workload.identity/use: "true"
    spec:
      containers:
        - env:
            - name: SUBSCRIPIFY_SERVICE_NAME
              value: org-service-ss
            - name: SUBSCRIPIFY_SERVICE_LEVEL
              value: lord
            - name: SUBSCRIPIFY_DB_ENV
              value: proddb
            - name: SUBSCR_LOG_LEVEL
              value: trace
            - name: SUBSCR_NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
            - name: SUBSCR_POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: SUBSCR_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: SUBSCR_POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: SUBSCR_POD_SERVICE_ACCOUNT
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.serviceAccountName
            - name: SUBSCR_TENANT
              valueFrom:
                configMapKeyRef:
                  key: tenantUUID
                  name: subscripify-tenant-config
            - name: SUBSCR_CLOUD_CONTEXT
              valueFrom:
                configMapKeyRef:
                  key: tenantCloudContext
                  name: subscripify-tenant-config
            - name: BRIDGE_COLLECT_TELEMETRY
              value: "True"
            - name: CONSOLE_VERBOSITY
              value: Verbose
            - name: BRIDGE_CORRELATION_ID
              value: 8d2291d4-8713-4a43-9d7e-51f3285987351734566244115:b318e6c8ce02:be345b679a39
          image: bridgetokubernetes.azurecr.io/lpkremoteagent:1.3.4
          imagePullPolicy: Always
          name: org-service-ss
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: subscripify-super-principal-service-account
      serviceAccountName: subscripify-super-principal-service-account
      terminationGracePeriodSeconds: 30

Looking at the logs for the pod that is now running on the cluster for my service, they look like this:

2024-12-19T00:31:11.6231246Z | RemoteAgent | TRACE | ReversePortForwardConnector created for port 50051\nOperation context: <json>{"clientRequestId":"d55f1790-028e-4ddf-b923-4c8bf262a712","correlationId":"8d2291d4-8713-4a43-9d7e-51f3285987351734566244115:b318e6c8ce02:be345b679a39","requestId":null,"userSubscriptionId":null,"startTime":"2024-12-19T00:31:08.6361064+00:00","userAgent":"RemoteAgent/1.0.0.0","requestHttpMethod":null,"requestUri":null,"version":"1.0.0.0","requestHeaders":{},"loggingProperties":{"ApplicationName":"RemoteAgent","DeviceOperatingSystem":"Linux 5.15.0-1075-azure #84-Ubuntu SMP Mon Oct 21 15:42:52 UTC 2024","Framework":".NET 7.0.19","ProcessId":1,"TargetEnvironment":"Production"}}</json>
2024-12-19T00:31:11.6406317Z | RemoteAgent | TRACE | ReversePortForwardConnector start listening on port 50051
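
(I tail these with something like the following; the pod name is specific to this session:)

kubectl -n subscripify-super-principal logs org-service-ss-7ff84fdbc5-qkfv2 -f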

All is as expected, and I can debug. The problem comes when I need to terminate the session.

I first hit the stop button in VS Code:
[image]

I then kill the terminal that B2k had started by pressing any key while that terminal has focus:
[image]

Doing so starts a restore job on my cluster in the same namespace as the replaced service. Here is the YAML for that job:

apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: 2024-12-19T00:31:08Z
  generation: 1
  labels:
    mindaro.io/component: lpkrestorationjob
    mindaro.io/instance: 727f5cb37f
    mindaro.io/version: 1.3.4
  name: org-service-ss-restore-727f5
  namespace: subscripify-super-principal
  resourceVersion: "4928971"
  uid: afcf6e62-e048-40db-b0a5-8b87d206fe06
spec:
  backoffLimit: 10
  completionMode: NonIndexed
  completions: 1
  manualSelector: false
  parallelism: 1
  podReplacementPolicy: TerminatingOrFailed
  selector:
    matchLabels:
      batch.kubernetes.io/controller-uid: afcf6e62-e048-40db-b0a5-8b87d206fe06
  suspend: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        batch.kubernetes.io/controller-uid: afcf6e62-e048-40db-b0a5-8b87d206fe06
        batch.kubernetes.io/job-name: org-service-ss-restore-727f5
        controller-uid: afcf6e62-e048-40db-b0a5-8b87d206fe06
        job-name: org-service-ss-restore-727f5
        mindaro.io/component: lpkrestorationjob
        mindaro.io/instance: 727f5cb37f
        mindaro.io/version: 1.3.4
    spec:
      containers:
        - env:
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: INSTANCE_LABEL_VALUE
              value: 727f5cb37f
            - name: BRIDGE_ENVIRONMENT
              value: Production
            - name: BRIDGE_COLLECT_TELEMETRY
              value: "True"
            - name: BRIDGE_CORRELATION_ID
              value: 8d2291d4-8713-4a43-9d7e-51f3285987351734566244115:b318e6c8ce02:be345b679a39
          image: bridgetokubernetes.azurecr.io/lpkrestorationjob:1.3.4
          imagePullPolicy: Always
          name: lpkrestorationjob
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /etc/patchstate
              name: patchstate
              readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: OnFailure
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: lpkrestorationjob-v2
      serviceAccountName: lpkrestorationjob-v2
      terminationGracePeriodSeconds: 30
      volumes:
        - name: patchstate
          secret:
            defaultMode: 420
            secretName: org-service-ss-restore-727f5
status:
  active: 1
  ready: 1
  startTime: 2024-12-19T00:31:08Z
  terminating: 0
  uncountedTerminatedPods: {}

Bug! The job NEVER stops!

Looking at the logs of the job, it keeps repeating the following:

2024-12-19T00:58:26.4306979Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"GetV1DeploymentAsync","success":true,"duration":null,"properties":{}}</json>
2024-12-19T00:58:26.4372749Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"ListPodsInNamespaceAsync","success":true,"duration":null,"properties":{}}</json>
2024-12-19T00:58:26.4373520Z | RestorationJob | TRACE | Dependency: Kubernetes <json>{"name":"Kubernetes","target":"ListPodsForDeploymentAsync","success":true,"duration":null,"properties":{}}</json>
2024-12-19T00:58:26.4386827Z | RestorationJob | TRACE | Event: RestorationJob-AgentPing <json>{"eventName":"RestorationJob-AgentPing","properties":{"RestorePerformed":"false","NumFailedPings":"0","HasConnectedClients":"true","Result":"Succeeded"},"metrics":{"DurationInMs":15}}</json>
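
(I watch the job and tail its logs with something like this; the job name comes from the YAML above:)

kubectl -n subscripify-super-principal get jobs -w
kubectl -n subscripify-super-principal logs job/org-service-ss-restore-727f5 -f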

It will keep doing that until I close ALL open terminal windows on my machine. I will have several open at any given time, since I usually have a few repos open in VS Code; sometimes I have four or five VS Code instances open across several desktops. I have attached a video demonstrating the problem with two VS Code windows open on two different projects on the same desktop:
https://github.com/user-attachments/assets/ddbb0e51-4e4e-4556-9f95-3dd61a677189

Expected behavior
I expect the restoration job to complete immediately when I close the B2k terminal window in VS Code. I should not have to close VS Code entirely, and I most certainly should not have to close all VS Code windows.

I would also expect all resources that B2k deploys on the cluster to be cleaned up. Even after I figured out how to get the job to stop, there is still a role and role binding left on the cluster (here are their YAMLs):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: 2024-12-19T00:31:07Z
  labels:
    mindaro.io/component: lpkrestorationjob
    mindaro.io/version: v2
  name: lpkrestorationjob-role-v2
  namespace: subscripify-super-principal
  resourceVersion: "4928916"
  uid: 21ba1725-52a4-449c-b048-68fca0d48853
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
      - list
      - update
      - patch
      - delete
  - apiGroups:
      - extensions
      - apps
    resources:
      - deployments
      - statefulsets
    verbs:
      - get
      - list
      - update
      - patch
  - apiGroups:
      - extensions
      - apps
    resources:
      - replicasets
    verbs:
      - get
      - list
  - apiGroups:
      - ""
    resources:
      - secrets
    verbs:
      - delete
      - list
  - apiGroups:
      - batch
    resources:
      - jobs
    verbs:
      - delete
      - list

and

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: 2024-12-19T00:31:07Z
  labels:
    mindaro.io/component: lpkrestorationjob
    mindaro.io/version: v2
  name: lpkrestorationjob-binding-v2
  namespace: subscripify-super-principal
  resourceVersion: "4928917"
  uid: 904e7211-de51-4206-a95f-9daea9ffd7ce
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: lpkrestorationjob-role-v2
subjects:
  - kind: ServiceAccount
    name: lpkrestorationjob-v2
    namespace: subscripify-super-principal
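
For reference, the leftover resources can be removed by hand with something like this (resource names come from the YAMLs above; the service account and secret are the ones referenced by the restore job spec):

kubectl -n subscripify-super-principal delete job org-service-ss-restore-727f5
kubectl -n subscripify-super-principal delete secret org-service-ss-restore-727f5
kubectl -n subscripify-super-principal delete rolebinding lpkrestorationjob-binding-v2
kubectl -n subscripify-super-principal delete role lpkrestorationjob-role-v2
kubectl -n subscripify-super-principal delete serviceaccount lpkrestorationjob-v2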