
Error scanning namespace workloads if there are batch jobs running on it #18

Open
saholo21 opened this issue Aug 31, 2023 · 10 comments

@saholo21

saholo21 commented Aug 31, 2023

I am trying to scan a cluster that has different kinds of workloads (deployments, pods, statefulsets, batch jobs, etc.). However, when the scan finishes I always get the same error: "jobs.batch not found. The following table may be incomplete due to errors detected during the run." The table returns only a single row, covering the kube-system namespace, and none of the other workloads, which amount to more than 300. I believe this happens because some jobs are running when the scan starts but finish during the scan (as they are meant to do), and the plugin interprets this as an error. Is there any workaround for this problem?

Input = kubectl dds

Output =
error: [jobs.batch "job1" not found, jobs.batch "job2" not found, jobs.batch "job3" not found, jobs.batch "job4" not found]
Warning: The following table may be incomplete due to errors detected during the run
NAMESPACE     TYPE        NAME       STATUS
kube-system   daemonset   aws-node   mounted

@rothgar
Contributor

rothgar commented Sep 1, 2023

Do you have a yaml example of the workload you're running?

@saholo21
Author

saholo21 commented Sep 4, 2023

No, I don't have access to the batch jobs' YAML. Is there any possibility of running kubectl dds only for certain types of workloads? I.e., run it only for deployments, then only for statefulsets, and so on, to avoid the batch jobs scanning error.

@rothgar
Contributor

rothgar commented Sep 5, 2023

That might be difficult to implement because of the way it works: it scans all pods and then looks up each pod's parent. It doesn't have a way to start with deployments and work its way down to the pods.

If I implemented this, what types of flags would you want? --scan-resource=deployment or --skip=job? It would get complicated to add both options, and I would need something that could be the default behavior, e.g. --scan-type=all, but either way I still have to scan all pods in the cluster and inspect what owns them.
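
For readers following along, here is a minimal sketch of the scan order described above, assuming client-go. This is not the plugin's actual code, and the skip map only illustrates where a hypothetical --skip=job flag could hook in:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig the same way kubectl does (illustrative setup).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Hypothetical --skip=job: owner kinds that would not be followed.
	skip := map[string]bool{"Job": true}

	// The scan starts from pods in every namespace, then walks up to each owner.
	pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, pod := range pods.Items {
		for _, owner := range pod.OwnerReferences {
			if skip[owner.Kind] {
				continue // owner type skipped, but the pod itself was still listed
			}
			fmt.Printf("%s/%s is owned by %s %s\n",
				pod.Namespace, pod.Name, owner.Kind, owner.Name)
		}
	}
}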

@saholo21
Author

saholo21 commented Sep 5, 2023

Understood. The type of flag that would fit best for this case would be --skip=job, because that's the only workload type I'm having issues with. However, do you know what could be happening? I mean, some jobs are running but then finish during the scan, as they are meant to do, and the plugin detects this as an error. Is that expected behavior? Thanks for answering.

@rothgar
Contributor

rothgar commented Sep 6, 2023

I'm not too sure what would be causing it without being able to replicate the problem or see the job spec with something like kubectl get job job1 --output yaml.

What version of Kubernetes are you using?

@saholo21
Author

saholo21 commented Sep 7, 2023

I was able to get one of the job workloads that's throwing the error.
I am using Kubernetes version 1.23.
Let me know if that helps.

apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2023-09-05T11:55:32Z"
  generation: 1
  labels:
    controller-uid: 80fef74c-a01f-4059-b345-d9238c974bec
    job-name: populate-analytic-data-aws-28231914
  name: populate-analytic-data-aws-28231914
  namespace: default
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: CronJob
    name: populate-analytic-data-aws
    uid: 4bb57997-3256-4197-b36d-3172c50732a8
  resourceVersion: "1177585793"
  uid: 80fef74c-a01f-4059-b345-d9238c974bec
spec:
  activeDeadlineSeconds: 10000
  backoffLimit: 3
  completionMode: NonIndexed
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 80fef74c-a01f-4059-b345-d9238c974bec
  suspend: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: 80fef74c-a01f-4059-b345-d9238c974bec
        job-name: populate-analytic-data-aws-28231914
    spec:
      containers:
      - args:
        - --botName
        - populate-analytic-data
        - --cassandra
        - cassandra-traffic-04.internal.company.com,cassandra-traffic-02.internal.company.com,cassandra-traffic-03.internal.company.com
        - --keyspace
        - traffic
        - --threads
        - "4"
        - --env
        - staging
        env:
        - name: ENV
          value: staging
        - name: log_level
          value: DEBUG
        image: 111111111111.dkr.ecr.us-east-1.amazonaws.com/populate-analytic-data:4.53-reporting
        imagePullPolicy: IfNotPresent
        name: docker
        resources:
          limits:
            cpu: 450m
            memory: 2000Mi
          requests:
            cpu: 250m
            memory: 1400Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastProbeTime: "2023-09-05T14:42:12Z"
    lastTransitionTime: "2023-09-05T14:42:12Z"
    message: Job was active longer than specified deadline
    reason: DeadlineExceeded
    status: "True"
    type: Failed
  failed: 1
  startTime: "2023-09-05T11:55:32Z"

@saholo21
Author

Hi @rothgar, is there any update on this?

@rothgar
Contributor

rothgar commented Sep 12, 2023

Thank you for the example. I'm sorry I haven't been able to test this yet. I'm preparing for some work travel and conference talks, plus other priorities at work.

@saholo21
Author

Hi @rothgar. Just a quick question to confirm something: when the error message only lists some jobs and the final warning says "The following table may be incomplete due to errors detected during the run", does that mean the result may be incomplete only because those jobs were not scanned (so it isn't known whether they mount docker.sock), or could the error with the jobs have stopped the scanning of the other workloads (deployments, daemonsets, statefulsets, etc.)?

@rothgar
Contributor

rothgar commented Sep 15, 2023

It should continue with other jobs and workload types. It doesn't exit the app. It appends the error and continues.
https://github.com/aws-containers/kubectl-detector-for-docker-socket/blob/main/main.go#L270-L273
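
For what it's worth, here is a sketch of how that spot could tolerate the race, assuming client-go and the apimachinery errors package. It is illustrative only, not the plugin's actual code or a committed fix:

package scanner

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// lookupOwningJob is a hypothetical helper: it resolves a pod's owning Job and
// treats NotFound as "the Job already finished and was cleaned up" instead of
// an error worth appending to the report.
func lookupOwningJob(ctx context.Context, clientset kubernetes.Interface,
	namespace, name string) (*batchv1.Job, error) {

	job, err := clientset.BatchV1().Jobs(namespace).Get(ctx, name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		// The Job completed (or its CronJob cleaned it up) between the pod
		// listing and this lookup; skip it rather than reporting an error.
		return nil, nil
	}
	if err != nil {
		return nil, err // other errors are still surfaced as before
	}
	return job, nil
}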
