Fluentd worker crashing on startup when connecting to Graylog #1479

Closed
sumith-aeropost opened this issue Jan 19, 2024 · 9 comments · Fixed by #1506

sumith-aeropost commented Jan 19, 2024

Describe the bug

We've installed Fluentd in our AWS EKS cluster, connecting to Graylog, and it was functioning well. However, two days ago, the fluentd worker unexpectedly crashed. Fluentd pod logs consistently display the following messages:

2024-01-19 04:20:50 +0000 [error]: #0 unexpected error error_class=NameError error="uninitialized constant GELF::Notifier::Fixnum"
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:65:in `level='
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:24:in `initialize'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-gelf-hs-1.0.8/lib/fluent/plugin/out_gelf.rb:52:in `new'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-gelf-hs-1.0.8/lib/fluent/plugin/out_gelf.rb:52:in `start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/compat/call_super_mixin.rb:42:in `start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:203:in `block in start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:192:in `block (2 levels) in lifecycle'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:191:in `each'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:191:in `block in lifecycle'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:178:in `each'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:178:in `lifecycle'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:202:in `start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/engine.rb:248:in `start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/engine.rb:147:in `run'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:617:in `block in run_worker'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:962:in `main_process'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:608:in `run_worker'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/command/fluentd.rb:372:in `<top (required)>'
  2024-01-19 04:20:50 +0000 [error]: #0 <internal:/usr/local/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
  2024-01-19 04:20:50 +0000 [error]: #0 <internal:/usr/local/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/bin/fluentd:15:in `<top (required)>'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/bin/fluentd:25:in `load'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/bin/fluentd:25:in `<main>'
2024-01-19 04:20:50 +0000 [error]: Worker 0 exited unexpectedly with status 1
Logs from 19/01/2024, 09:50:44

Any help on how to fix this would be appreciated. I can provide further logs or code if necessary.

To Reproduce

Fluentd Pod logs

2024-01-19 04:20:50 +0000 [error]: #0 unexpected error error_class=NameError error="uninitialized constant GELF::Notifier::Fixnum"
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:65:in `level='
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:24:in `initialize'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-gelf-hs-1.0.8/lib/fluent/plugin/out_gelf.rb:52:in `new'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-gelf-hs-1.0.8/lib/fluent/plugin/out_gelf.rb:52:in `start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/compat/call_super_mixin.rb:42:in `start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:203:in `block in start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:192:in `block (2 levels) in lifecycle'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:191:in `each'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:191:in `block in lifecycle'

Expected behavior

Fluentd should connect to the Graylog instance. It had been working fine for a long time and then suddenly started crashing.

Your Environment

- Tag of using fluentd-kubernetes-daemonset: v1-debian-graylog

Your Configuration

fluentd.yaml


#ref: https://github.com/fluent/fluentd-kubernetes-daemonset (fcdf045)

# create an identity for fluentd
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-system

# grant fluentd permissions to read, list, and watch pods and namespaces in Kubernetes cluster
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
  namespace: kube-system
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - namespaces
    verbs:
      - get
      - list
      - watch

# bind the fluentd ServiceAccount to these permissions using the ClusterRoleBinding
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: kube-system

# deploy fluentd DaemonSet
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
    version: v1
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-logging
      version: v1
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
        version: v1
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      # Enable tolerations if you want to run daemonset on master nodes.
      # Recommended to disable on managed k8s.
      # tolerations:
      # - key: node-role.kubernetes.io/master
      #   effect: NoSchedule
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-graylog
          imagePullPolicy: IfNotPresent
          env:
            - name: FLUENT_GRAYLOG_HOST
              value: "log.int.*****.com"
            - name: FLUENT_GRAYLOG_PORT
              value: "12208"
            - name: FLUENT_GRAYLOG_PROTOCOL
              value: "udp"
            - name: FLUENTD_SYSTEMD_CONF
              value: "disable"
          resources:
            requests:
              cpu: 200m
              memory: 0.5Gi
            limits:
              # ===========
              # Less memory leads to child process problems.
              cpu: 1000m
              memory: 1Gi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
          securityContext:
              privileged: true
      terminationGracePeriodSeconds: 30
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers


### Your Error Log

```shell
2024-01-19 04:20:50 +0000 [error]: #0 unexpected error error_class=NameError error="uninitialized constant GELF::Notifier::Fixnum"
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:65:in `level='
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/gelf-3.0.0/lib/gelf/notifier.rb:24:in `initialize'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-gelf-hs-1.0.8/lib/fluent/plugin/out_gelf.rb:52:in `new'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-gelf-hs-1.0.8/lib/fluent/plugin/out_gelf.rb:52:in `start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/compat/call_super_mixin.rb:42:in `start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:203:in `block in start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:192:in `block (2 levels) in lifecycle'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:191:in `each'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:191:in `block in lifecycle'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:178:in `each'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:178:in `lifecycle'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/root_agent.rb:202:in `start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/engine.rb:248:in `start'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/engine.rb:147:in `run'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:617:in `block in run_worker'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:962:in `main_process'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:608:in `run_worker'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/lib/fluent/command/fluentd.rb:372:in `<top (required)>'
  2024-01-19 04:20:50 +0000 [error]: #0 <internal:/usr/local/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
  2024-01-19 04:20:50 +0000 [error]: #0 <internal:/usr/local/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/gems/fluentd-1.16.3/bin/fluentd:15:in `<top (required)>'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/bin/fluentd:25:in `load'
  2024-01-19 04:20:50 +0000 [error]: #0 /fluentd/vendor/bundle/ruby/3.2.0/bin/fluentd:25:in `<main>'
2024-01-19 04:20:50 +0000 [error]: Worker 0 exited unexpectedly with status 1
Logs from 19/01/2024, 09:50:44
```


### Additional context

_No response_
sumith-aeropost changed the title from "Fluentd worker crashing on startup when trying to connect to Graylog" to "Fluentd worker crashing on startup when connecting to Graylog" on Jan 19, 2024
@AleksanderGrzybowski

Hi, I just got the same error when using this image. I'm not a Ruby programmer, but I've read somewhere that the Fixnum class is deprecated. Maybe there is a mismatch between the Ruby version and the GELF plugin version? If you check https://github.com/graylog-labs/gelf-rb/blob/master/lib/gelf/notifier.rb you'll see that it uses Integer, but the code in the container is still using Fixnum.

I'll try to update stuff in image to newest versions in custom Dockerfile. Maybe this will do the trick.
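For readers who don't write Ruby either, here is a minimal sketch of what the error message describes. It assumes only what this thread states: the installed gelf code type-checks against `Fixnum`, and that constant no longer exists on Ruby 3.2. The `Sketch` module and both notifier classes are hypothetical stand-ins written for illustration; they are not the gelf gem's source.

```ruby
# Hypothetical stand-ins for illustration only; this is not the gelf gem's code.
module Sketch
  # Mimics a setter that type-checks against the legacy Fixnum constant.
  class LegacyNotifier
    attr_reader :level

    def level=(new_level)
      # On Ruby 3.2+ the lookup of Fixnum fails and raises NameError.
      @level = new_level.is_a?(Fixnum) ? new_level : Integer(new_level)
    end
  end

  # The same setter written against Integer, which every Ruby version defines.
  class PatchedNotifier
    attr_reader :level

    def level=(new_level)
      @level = new_level.is_a?(Integer) ? new_level : Integer(new_level)
    end
  end
end

# Sets a level on a fresh notifier and reports either success or the NameError.
def try_level(klass)
  notifier = klass.new
  notifier.level = 1
  "ok (level=#{notifier.level})"
rescue NameError => e
  "#{e.class}: #{e.message}"
end

puts "Ruby #{RUBY_VERSION}"
puts "legacy:  #{try_level(Sketch::LegacyNotifier)}"
puts "patched: #{try_level(Sketch::PatchedNotifier)}"
```

On Ruby 3.2 or newer the legacy variant should report a NameError that names `Fixnum` under the enclosing class, which has the same shape as the `GELF::Notifier::Fixnum` message in the pod logs. On older Rubies where `Fixnum` still exists, both variants succeed.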

@AleksanderGrzybowski

I've managed to work around this issue with the following Dockerfile, plus setting LD_PRELOAD="" to fix an unrelated issue. This works for me:
RUN gem install gelf
RUN gem install fluent-plugin-gelf-hs

kemalceng commented Mar 11, 2024

> I've managed to work around this issue via the following Dockerfile + setting LD_PRELOAD="" to fix some other issue. This works for me:
> RUN gem install gelf
> RUN gem install fluent-plugin-gelf-hs

Running gem install gelf fluent-plugin-gelf-hs worked for us too. The difference was that it installed gelf 3.1.0 instead of 3.0.0. We manually changed the version in the Gemfile used by the Docker image and it worked.
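To check which case a given image falls into, something like the sketch below could be run with the image's Ruby. It encodes only what this thread reports: gelf 3.0.0 fails on Ruby 3.2+, while gelf 3.1.0 starts. The helper name `fixnum_crash_expected?` and both constants are hypothetical, and other gelf versions are deliberately not classified.

```ruby
require "rubygems"

# Assumptions taken from this thread, not from the gems' changelogs:
# gelf 3.0.0 references Fixnum, and Fixnum is gone from Ruby 3.2 onward.
FIXNUM_REMOVED_IN = Gem::Version.new("3.2.0")
AFFECTED_GELF = Gem::Version.new("3.0.0")

# Hypothetical helper: true when the Ruby/gelf pair matches the failing combination.
def fixnum_crash_expected?(ruby_version, gelf_version)
  Gem::Version.new(ruby_version) >= FIXNUM_REMOVED_IN &&
    Gem::Version.new(gelf_version) == AFFECTED_GELF
end

# List every gelf version visible to this Ruby (an empty list if none is installed).
installed = Gem::Specification.find_all_by_name("gelf").map { |spec| spec.version.to_s }

puts "Ruby #{RUBY_VERSION}, installed gelf versions: #{installed.inspect}"
installed.each do |version|
  verdict = fixnum_crash_expected?(RUBY_VERSION, version) ? "expected to crash" : "not the known-bad pair"
  puts "gelf #{version}: #{verdict}"
end
```

Running it inside the container (for example via `kubectl exec`) rather than on a workstation matters, because the result depends on the Ruby and gems bundled in the image.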

@mszyzdek

This is the chain of related events that led to the disaster:

The good news is that the offending Fixnum was removed in the latest gelf gem release, 3.1.0, by a commit meant to prepare the gem for the Fixnum deprecation in Ruby 2.4:
graylog-labs/gelf-rb@7cc3cbb
So maybe all we need to do is bump the gelf version in Gemfile.erb in this project.

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

github-actions bot added the stale label on Jun 12, 2024

daipom (Contributor) commented Jun 15, 2024

Is this issue resolved?
Should we update some dependencies?
Looks like the gemspec of fluent-plugin-gelf-hs has no problem.

github-actions bot removed the stale label on Jun 15, 2024
kenhys added a commit to kenhys/fluentd-kubernetes-daemonset that referenced this issue Jul 11, 2024
Closes: fluent#1479

Feedback from @mszyzdek
ref. fluent#1479 (comment)

* graylog flavour of fluentd-kubernetes-daemonset uses gelf 3.0.0 and
  this version of gelf gem has Fixnum in code,
  graylog-labs/gelf-rb@7cc3cbb
* in ruby 3.2 Fixnum was removed after previous deprecation in version 2.4
  https://www.ruby-lang.org/en/news/2022/12/25/ruby-3-2-0-released/
* ruby in newest fluentd was upgraded to 3.2
  fluent/fluentd-docker-image@4f1d5e8
  so it also happened in fluentd-kubernetes-daemonset
* gelf 3.0.0 cannot work with ruby 3.2+ so we can see sad error on container start

Signed-off-by: Kentaro Hayashi
daipom (Contributor) commented Jul 11, 2024

Oh, I see.
#219 pinned gelf to version 3.0.0, so the image currently installs gelf 3.0.0.

Do we not need #219 anymore?
If so, we should revert #219.

daipom (Contributor) commented Jul 12, 2024

> Do we not need #219 anymore?

Before Fluentd v1.8.0, gelf-rb 3.1.0 caused a severe error that prevented Fluentd from starting.
The problem was partially fixed by fluent/fluentd#2709 (Fluentd v1.8.0).
Since Fluentd v1.8.0, Fluentd can start correctly with gelf-rb 3.1.0.

However, gelf-rb 3.1.0 still breaks Fluentd's config parser.
As a result, Fluentd fails to reload its config on SIGUSR2.

So, we still need #219.

Please note that if you are updating gelf-rb manually, reloading by SIGUSR2 is not possible.

daipom (Contributor) commented Jul 12, 2024

gelf-rb is no longer maintained.

graylog-labs/gelf-rb#93 (comment)

It would be better to use gelf_redux.
(graylog-labs/gelf-rb#93 (comment))

kenhys closed this as completed in f07ee2b on Jul 12, 2024
kenhys added a commit to kenhys/fluentd-kubernetes-daemonset that referenced this issue Jul 12, 2024
Closes: fluent#1479

Feedback from @mszyzdek
ref. fluent#1479 (comment)

* graylog flavour of fluentd-kubernetes-daemonset uses gelf 3.0.0 and
  this version of gelf gem has Fixnum in code,
  graylog-labs/gelf-rb@7cc3cbb
* in ruby 3.2 Fixnum was removed after previous deprecation in version 2.4
  https://www.ruby-lang.org/en/news/2022/12/25/ruby-3-2-0-released/
* ruby in newest fluentd was upgraded to 3.2
  fluent/fluentd-docker-image@4f1d5e8
  so it also happened in fluentd-kubernetes-daemonset
* gelf 3.0.0 cannot work with ruby 3.2+ so we can see sad error on container start

NOTE: Even though gelf was updated to 3.1.0, still it has problem with
reloading Fluentd configuration with SIGUSR2. This is known issue
since Fluentd 1.8.0.
ref.  fluent/fluentd#2709

Signed-off-by: Kentaro Hayashi