Fluent-bit logs skip issue : scan_blog add(): dismissed #4014

Open

mitesh-devops opened this issue Aug 27, 2021 · 43 comments

@mitesh-devops

I would like some help with the following Fluent Bit log messages, which accompany skipped logs for many of our apps. Everything works correctly for a few hours after Fluent Bit is restarted.

logs 1:

[2021/08/27 08:01:29] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/container-name-1627776000-g52f7_prod-apps_container-name-4e462ab4ec40adb0c42bece9c054e16466ba444afd1f0c2634be03e3b86613d1.log, inode 533287
[2021/08/27 08:01:29] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/container-name-1630049400-w2hdn_prod-apps_container-name-8cea19f5e062b0d9b267f5ac3db0f8d74b0e690c62c5448a8b6dd16c2df1195f.log, inode 1605932

frequency: around 15-20 entries every 5 minutes after the container starts; after a few hours these become the only lines Fluent Bit prints, and all log shipping stops.

logs 2:

[2021/08/27 08:01:42] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=1606300 watch_fd=26
[2021/08/27 08:01:42] [debug] [input:tail:tail.0] inode=1605129 events: IN_ATTRIB 
[2021/08/27 08:01:42] [debug] [input:tail:tail.0] inode=1605129 file has been deleted: /var/log/containers/container-name-1630049400-g7mr8_prod-apps_container-name-7fcddb34d4164fbca29a342c75430d5f30ba0fb14e1ccdd183dbac82ef70e784.log
[2021/08/27 08:01:42] [debug] [input:tail:tail.0] db: file deleted from database: /var/log/containers/container-name-1630049400-g7mr8_prod-apps_container-name-7fcddb34d4164fbca29a342c75430d5f30ba0fb14e1ccdd183dbac82ef70e784.log

frequency: very few times

logs 3:

[2021/08/27 12:30:34] [debug] [socket] could not validate socket status for #146 (don't worry)
[2021/08/27 12:30:57] [debug] [socket] could not validate socket status for #49 (don't worry)

frequency: occasionally.

The rest of the logs look normal, and Fluent Bit is otherwise gathering logs and writing them to Elasticsearch correctly.

Cluster Information:

  1. Kubernetes cluster: around 20 nodes, running the Fluent Bit DaemonSet.
  2. Elasticsearch cluster: 5 master/data nodes
     - 28 GB RAM per node
     - 3 CPU cores per node
  3. Elasticsearch version: 7.9.2
  4. Fluent Bit version: 1.7

Expected behaviour: Fluent Bit should not skip logs and should keep working without needing a restart after some time.

fluent-bit.conf

apiVersion: v1
data:
  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           Off
        Buffer_Size         0
        #Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
  fluent-bit.conf: |
    [SERVICE]
        Flush         3
        #Grace         120
        Log_Level     debug
        Daemon        off
        Parsers_File  parsers.conf
        #storage.path  /var/log/flb-storage/
        #storage.sync  normal
        #storage.checksum off
        #storage.backlog.mem_limit 20M
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-elasticsearch.conf
  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Exclude_Path      /var/log/containers/*_kube-system_*.log,/var/log/containers/*_log_*.log,/var/log/containers/*_mog_*.log,/var/log/containers/*_pd_*.log,/var/log/containers/*_kubecost_*.log,/var/log/containers/*_elastic-system_*.log,/var/log/containers/*_istio-*.log,/var/log/containers/*_cs_*.log,/var/log/containers/*_kube-public_*.log,/var/log/containers/*_ss_*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     512MB
        Skip_Long_Lines   On
        Refresh_Interval  10
        # Control the log line length
        Buffer_Chunk_Size 10MB
        Buffer_Max_Size   40MB
        # Using the docker mode to deal with multiline messages emitted by docker
        Docker_Mode       On
        #storage.type      filesystem
        Rotate_Wait      30
  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        HTTP_User       ${FLUENT_ELASTICSEARCH_USER}
        HTTP_Passwd     ${FLUENT_ELASTICSEARCH_PASSWORD}
        Logstash_Format On
        Logstash_Prefix newprod
        Replace_Dots    On
        Retry_Limit     False
        tls             On
        tls.verify      Off
  parsers.conf: |
    [PARSER]
        Name   apache
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache2
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache_error
        Format regex
        Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name   nginx
        Format regex
        Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        # http://rubular.com/r/tjUt3Awgg4
        Name cri
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S

fluent-bit.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
  labels:
    k8s-app: fluent-bit-logging
    kubernetes.io/cluster-service: "true"
    version: v1
  name: fluent-bit
  namespace: logging
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
  template:
    metadata:
      annotations:
        prometheus.io/path: /api/v1/metrics/prometheus
        prometheus.io/port: "2020"
        prometheus.io/scrape: "true"
      labels:
        k8s-app: fluent-bit-logging
        kubernetes.io/cluster-service: "true"
        version: v1
    spec:
      containers:
      - env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: ******
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        - name: FLUENT_ELASTICSEARCH_USER
          value: *******
        - name: FLUENT_ELASTICSEARCH_PASSWORD
          value: ********
        - name: FLUENT_ELASTICSEARCH_SCHEME
          value: https
        image: fluent/fluent-bit:1.7
        imagePullPolicy: Always
        name: fluent-bit
        ports:
        - containerPort: 2020
          protocol: TCP
        resources:
          requests:
            cpu: 250m
            memory: 750Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log
          name: varlog
        - mountPath: /var/lib/docker/containers
          name: varlibdockercontainers
          readOnly: true
        - mountPath: /fluent-bit/etc/
          name: fluent-bit-config
      nodeSelector:
        logging: "true"
      restartPolicy: Always
      securityContext: {}
      serviceAccount: fluent-bit
      serviceAccountName: fluent-bit
      terminationGracePeriodSeconds: 10
      volumes:
      - hostPath:
          path: /var/log
          type: ""
        name: varlog
      - hostPath:
          path: /var/lib/docker/containers
          type: ""
        name: varlibdockercontainers
      - configMap:
          defaultMode: 420
          name: fluent-bit-config
        name: fluent-bit-config

@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label on Sep 27, 2021.

@github-actions
Contributor

github-actions bot commented Oct 3, 2021

This issue was closed because it has been stalled for 5 days with no activity.

@rguske

rguske commented Nov 9, 2021

Hi @mitesh-devops, were you able to solve this issue? I'm facing the same, see #4155

@kc-dot-io

Same here.

@soanni

soanni commented Dec 8, 2021

same here

@mitesh-devops
Author

Hi, it was actually more of an issue with the output plugin. In my case the output plugin was Elasticsearch, so sizing the shards correctly and adding rollover ILM policies fixed it for me (a rough sketch of such a policy follows below). Adding more RAM can also help while diagnosing.
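
For illustration, a rough sketch of the kind of rollover ILM policy meant here; the policy name newprod-rollover and the size/age/delete thresholds are placeholders, not the exact values used, and the policy still has to be referenced from the index template behind the newprod-* indices:

PUT _ilm/policy/newprod-rollover
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "1d" }
        }
      },
      "delete": {
        "min_age": "7d",
        "actions": { "delete": {} }
      }
    }
  }
}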

@joebatish

Any solution regarding this?

@joebatish

@kc-dot-io Did you find a solution for this?

@tr2vil

tr2vil commented Mar 8, 2022

I had the same issue, but I think I may have found the main factor: Refresh_Interval, Buffer_Max_Size and Buffer_Chunk_Size are the key settings. After I adjusted them (see the sketch below), the error no longer appeared.
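
A sketch of the kind of adjustment meant here; the values below are placeholders rather than the ones actually used, so treat them as a starting point to tune, not a fix:

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    DB                /var/log/flb_kube.db
    # rescan the path for new/rotated files more often than the 60s default
    Refresh_Interval  5
    # smaller read chunks, moderate per-file ceiling
    Buffer_Chunk_Size 1MB
    Buffer_Max_Size   8MB
    Skip_Long_Lines   On
    Mem_Buf_Limit     512MB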

@ankitkr

ankitkr commented Mar 10, 2022

Has this been resolved?
This seems to be related to the inode entry not being removed when a file is deleted, i.e. the entry is not removed from files_static / files_event (https://github.com/fluent/fluent-bit/blob/master/plugins/in_tail/tail_config.h#L129, https://github.com/fluent/fluent-bit/blob/master/plugins/in_tail/tail_file.c#1108).

@rguske

rguske commented Mar 10, 2022

Hi all,
it finally worked for our project after replacing Docker with containerd and applying the following config:
https://github.com/vmware-samples/vcenter-event-broker-appliance/tree/development/files/configs/fluentbit/templates

#4155 (comment)

@edsiper
Member

edsiper commented Mar 17, 2022

There is a bug in v1.8.13; we have just published the new container image for v1.8.14, please upgrade:

fluent/fluent-bit:1.8.14

@olastor

olastor commented Apr 1, 2022

Same here (v1.9.1)

@Lord-Y

Lord-Y commented Apr 22, 2022

Still having the issue with v1.9.2 too. Any fix coming soon?

@maiconbaum

Same in 1.9.3

@ahvahsky2008

Same issue guys

@kaleyroy

kaleyroy commented Jun 1, 2022

Same in v1.9.3

@kaleyroy

kaleyroy commented Jun 1, 2022

(screenshot attached in the original comment)

@RayMeibergen

Same in 1.9.4

@CallumDowling

Any resolution for this? It processes a few logs and then falls over.
Docker image: fluent/fluent-bit:1.9.3-debug

@Jonathan-w6d

Has anyone tried turning Skip_Long_Lines off?
Some of my logs were dismissed, I guess because they were too long; turning it off did the trick for me (a config sketch follows below).
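
If you want to try that, a sketch of the change against the [INPUT] section from the config above. Per the Fluent Bit docs, with Skip_Long_Lines Off a line larger than Buffer_Max_Size stops monitoring of that file, so the buffer sizes below are placeholders you would raise to cover your longest lines:

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    DB                /var/log/flb_kube.db
    # keep long lines instead of skipping them
    Skip_Long_Lines   Off
    # buffers must be able to hold the longest expected line,
    # otherwise that file stops being monitored
    Buffer_Chunk_Size 10MB
    Buffer_Max_Size   40MB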

@kc-dot-io

@Jonathan-w6d yes. This will help for a while, depending on how much data you are processing. The root issue is a memory leak: increasing the buffers / skipping long lines gets it working again, but the issue still recurs every so often. I have to restart our Fluent Bit pods every 90-100 days or so.

@ashutosh-maheshwari

The same issue is in v2.0.10.

[2023/04/04 21:06:11] [debug] [input:tail:tail.3] scan_blog add(): dismissed: /logs/server.out, inode 277548
[2023/04/04 21:06:11] [debug] [input:tail:tail.4] scan_blog add(): dismissed: /logs/server.FATAL, inode 262809

@curvesoft

The same issue is in v2.1.1.
[2023/05/22 23:34:52] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /tomcat-logs/platform-2023-05-22.debug.log, inode 57744220
[2023/05/22 23:34:52] [debug] [input:tail:tail.0] 0 new files found on path '/tomcat-logs/platform*.log'

@gorexlv

gorexlv commented May 26, 2023

Same issue, has this been resolved?

@IbrahimMCode

Same issue using latest version

@LiadBerko

I faced the same issue. Since the tail input only reads from the last offset of the file, make sure you are actually writing to the files you are reading from, that your filters are correct, and that the lines are not too long (because you have Skip_Long_Lines turned on). You can verify this by setting Read_from_Head to true under [INPUT] (see the sketch below).
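
A sketch of that check. Read_from_Head tells tail to read files that already exist at startup from the beginning instead of only tailing new lines; the separate DB path below is hypothetical, used so stored offsets don't mask the test and you don't re-ingest everything into the real pipeline:

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    # throwaway DB (hypothetical path) so existing offsets don't apply
    DB                /var/log/flb_kube_test.db
    # read pre-existing files from the start, not just new lines
    Read_from_Head    true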

@Varrkan82

Any updates? Because the same thing happened on v2.1.10

@ardole

ardole commented Oct 24, 2023

Still same issue here

@omidraha

I have the same issue (fluent/helm-charts#415):

kubectl logs  -n fluent-bit -f fluent-bit-6341f45a-j8h2x | grep -i volume-test
[2023/10/27 17:04:05] [debug] [input:tail:tail.0] inode=1062251 with offset=2026 appended as /var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log
[2023/10/27 17:04:05] [debug] [input:tail:tail.0] scan_glob add(): /var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log, inode 1062251
[2023/10/27 17:04:05] [debug] [input:tail:tail.0] inode=1062251 file=/var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log promote to TAIL_EVENT
[2023/10/27 17:04:05] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1062251 watch_fd=23 name=/var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log
[2023/10/27 17:05:04] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log, inode 1062251
[2023/10/27 17:06:04] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log, inode 1062251
[2023/10/27 17:07:04] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log, inode 1062251
[2023/10/27 17:08:04] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/volume-test_exp_volume-test-2c25a3b1342924cffab9fffb48b2f0a971fcf9f10009e4c382a37bc09075134b.log, inode 1062251

@nagyzekkyandras

nagyzekkyandras commented Jan 18, 2024

Same here with version 2.2.1. Is there any solution?

@rnemeth90

Is anyone actually going to fix this? This issue needs to be re-opened.

@wdonne

wdonne commented May 16, 2024

I still have this with version 3.0.3.

@daniela-cyera

Please re-open the issue; it seems to persist in all recent versions.

@jisorondo

Can we please get an update on this issue? Thanks!

@Varrkan82

Varrkan82 commented Jul 5, 2024

Switched to Filebeat, as Fluent Bit is unusable in production because of this issue. Sorry guys, but three years of nothing is a little strange... @edsiper maybe you need to look into this?

@leonardo-albertovich
Collaborator

Hi folks, I have created a branch where I've added some trace-level debug messages for code paths that could slip through the cracks, and I'm currently building a set of container images that you will be able to access at ghcr.io/fluent/fluent-bit/test/leonardo-master-issue_4014_debug_addition once the packaging process is complete.

I'd really appreciate it if those of you who can currently reproduce the issue took the time to either build from source or run those containers (setting log_level to trace) and shared the logs with me (you can find me on the public Slack server as Leonardo Alminana if you'd rather share that information privately) to expedite the troubleshooting process.
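
For anyone picking this up, a sketch of the only config change that should be needed on top of the reporter's [SERVICE] section, with the DaemonSet image pointed at the ghcr.io path above (trace-level output assumes the image was built with trace support enabled, which these debug images are expected to include):

[SERVICE]
    Flush         3
    Daemon        off
    Parsers_File  parsers.conf
    # raise verbosity so the added trace messages are emitted
    Log_Level     trace
    HTTP_Server   On
    HTTP_Listen   0.0.0.0
    HTTP_Port     2020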

@leonardo-albertovich
Collaborator

Quick update: the container images are now available, so please, it's really important that at least someone runs them and reports the results.

@rnemeth90

I'll try to find some time this week to deploy the image, @leonardo-albertovich. Where are the logs output to? Stdout/err? Or can I define a sink?

@leonardo-albertovich
Collaborator

The logs are produced in the standard way, so you don't need to do anything special (other than setting log_level to trace).

@srajappa

srajappa commented Sep 19, 2024

I am seeing the same issue

@mohannara

I'm still getting the error below; it started appearing when multiline processing was enabled. What could be the issue?
image -> public.ecr.aws/aws-observability/aws-for-fluent-bit:2.32.2.20241008 (Fluent Bit v1.9.10)
[2024/10/16 04:03:58] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/master-ms-deployment-5db5876558-w5gjj_master-ms-e2927e4cfeb05a89f229d7622edaefc750e48568d65415e983980535031ddd41.log, inode 75503248

@liuchintao

In my case, I found that if the file to be monitored is created after the tail plugin starts, the plugin reads it normally. However, if the file already exists when the tail plugin starts, the plugin fails to read it.
