
"Reload" and "systemctl restart" made different behavior for fluent-bit thread #8067

Closed
ym11369 opened this issue Oct 20, 2023 · 9 comments

Comments

@ym11369

ym11369 commented Oct 20, 2023

Bug Report

Describe the bug
While testing, I deliberately introduced an indentation error in fluent-bit.conf and a syntax error in a referenced Lua script. Afterwards, the state of the fluent-bit process differs between 'reload' and 'systemctl restart'. Is this expected? A reload usually does not kill the process, even when there are errors, whereas systemctl restart fails to start fluent-bit if there is any indentation or syntax error.

Details:

  1. Deliberately introduced an indentation error in fluent-bit.conf

Reload:
The reload POST request returned 200 OK with the response {"reload":"done","status":0}. The output of systemctl status fluent-bit showed no errors during this reload, and the fluent-bit process stayed alive. It appears to keep running with the previously loaded (cached) fluent-bit.conf.

The error does, however, appear in /var/log/syslog.
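For reference, the reload in this report goes through Fluent Bit's built-in HTTP server. The Python sketch below assumes the server listens on 127.0.0.1:2020 with hot reload enabled (HTTP_Server On and Hot_Reload On in [SERVICE]) and that the endpoint is /api/v2/reload, as described in the Fluent Bit 2.x hot-reload docs; check both against your version. The helper names are hypothetical. The sketch also makes the point of this case explicit: a {"reload":"done","status":0} reply only says the request completed, not that the edited file on disk is what is now running.

```python
import json
import urllib.request

# Assumption: Fluent Bit HTTP server on port 2020 with hot reload enabled.
RELOAD_URL = "http://127.0.0.1:2020/api/v2/reload"


def parse_reload_response(body: bytes) -> dict:
    """Decode the JSON body returned by the reload endpoint."""
    return json.loads(body.decode("utf-8"))


def reload_accepted(resp: dict) -> bool:
    """True when the reply looks like {"reload":"done","status":0}.

    This only means the reload request completed; per this issue it does
    not prove that the edited configuration on disk is now in effect.
    """
    return resp.get("reload") == "done" and resp.get("status") == 0


def trigger_reload(url: str = RELOAD_URL, timeout: float = 5.0) -> dict:
    """POST an empty JSON object to the reload endpoint and return the parsed reply."""
    req = urllib.request.Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_reload_response(resp.read())
```

For the indentation-error case above, reload_accepted(trigger_reload()) would still return True, which is exactly why the syslog has to be checked separately.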

Restart
However, when I run systemctl restart fluent-bit, fluent-bit fails to start and the process is killed.

The startup failure is also recorded in /var/log/syslog.
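To compare the two outcomes in a script instead of reading systemctl status by eye, one option is to record the unit state and main PID before and after each action. A minimal Python sketch, assuming a systemd unit named fluent-bit; the function names are hypothetical, while systemctl is-active and systemctl show -p MainPID are standard systemctl commands.

```python
import subprocess


def parse_show_property(output: str) -> str:
    """Extract the value from `systemctl show -p NAME` output such as 'MainPID=1234'."""
    _, _, value = output.strip().partition("=")
    return value


def unit_state(unit: str = "fluent-bit") -> str:
    """State printed by `systemctl is-active`, e.g. 'active' or 'failed'."""
    # is-active exits non-zero for every state except 'active', so no check=True here.
    result = subprocess.run(
        ["systemctl", "is-active", unit], capture_output=True, text=True
    )
    return result.stdout.strip()


def main_pid(unit: str = "fluent-bit") -> int:
    """Main PID of the unit; systemd reports 0 when no main process is running."""
    result = subprocess.run(
        ["systemctl", "show", "-p", "MainPID", unit],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(parse_show_property(result.stdout) or 0)
```

After a failed restart like the one above, unit_state() would typically return "failed" and main_pid() 0. Because the reload is handled inside the running process, the PID is expected (an assumption, not verified here) to stay the same across a reload.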

  2. Deliberately introduced a syntax error in the referenced Lua script

Reload:

As expected, the output of systemctl status fluent-bit shows the error [luajit] error loading script: //etc/fluent-bit/modify_user_record.lua:80: 'end' expected (to close 'for' at line 5) near '<eof>'. The fluent-bit process still appears to be alive, but port 2020 (the HTTP server) no longer works.

/var/log/syslog shows the same error.
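The symptom in this case is a process that is still alive while its HTTP server on port 2020 no longer answers, so systemctl status alone is not enough to detect it. A minimal Python liveness probe, assuming the server is on 127.0.0.1:2020; port_is_open is a hypothetical helper. It only checks that something accepts TCP connections on the port; an actual HTTP request to the server would be a stricter test.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 2020, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running port_is_open() alongside systemctl status after the broken-Lua reload separates "process alive" from "HTTP server reachable".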

Restart
Same as case 1, only with a different error message: the process is killed.

To Reproduce

  • Steps to reproduce the problem:
  1. Start Fluent Bit.
  2. Introduce an indentation error or a syntax error, then trigger a reload or run systemctl restart.
  3. Check the status of the fluent-bit process and the error messages.

Expected behavior

Reload should behave the same as systemctl restart: kill the process and exit if there is an error.

Your Environment

  • Version used: 2.1.10
  • Configuration:
  • Environment name and version (e.g. Kubernetes? What version?):
  • Server type and version:
  • Operating System and version: Ubuntu 20.04.6 LTS
  • Filters and plugins: see the uploaded configuration files below.

config.zip

@patrick-stephens
Contributor

Reload is internal and presumably rolls back to the previous configuration held in memory, @cosmo0920?
Restarting the process will mean it uses whatever is on disk.

@cosmo0920
Contributor

cosmo0920 commented Oct 20, 2023

Yes. Reload should keep the old, valid context alive as far as possible. A restart, by contrast, should halt the Fluent Bit process when a malformed or invalid configuration is used.
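The behavior described here, keeping the old valid context unless the new one loads cleanly, can be modelled as a build-then-swap step. The Python sketch below is an illustration under that assumption only; Fluent Bit itself is written in C, and Context and Reloader are hypothetical names, not its API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Context:
    """Hypothetical stand-in for a running pipeline built from one configuration."""
    config_text: str


class Reloader:
    """Keeps the last valid context; replaces it only when a new one builds cleanly."""

    def __init__(self, ctx: Context) -> None:
        self.ctx = ctx

    def reload(self, build: Callable[[], Context]) -> int:
        """Try to build a new context from the current configuration.

        Returns 0 and swaps in the new context on success; returns -1 and
        leaves the old, valid context running on any build error.
        """
        try:
            new_ctx = build()
        except Exception:
            # Malformed config or script: refuse the reload, keep self.ctx as-is.
            return -1
        self.ctx = new_ctx
        return 0
```

A restart has no old context to fall back on, so the same build error leaves nothing to run and the process exits, which matches the two behaviors reported in this issue.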

@ym11369
Author

ym11369 commented Oct 23, 2023

So reload in fluent-bit was not designed to let changes to the configuration or referenced files take effect? As you can see, when I deliberately introduced a syntax error in a referenced Lua script, fluent-bit could not work properly after the reload. Is this by design?

@cosmo0920
Contributor

cosmo0920 commented Oct 23, 2023

So the reload in fluent-bit was not designed for letting new configurations/reference files change take effect?

Reload in fluent-bit is not designed to let invalid or malformed configurations take effect; such configurations should be rejected and not loaded. However, the HTTP interface no longer working after a malformed Lua file is loaded is not expected behavior.

@patrick-stephens
Contributor

We should probably document this, though, so if you can submit a docs PR, @ym11369, that would be helpful.

@cosmo0920
Contributor

cosmo0920 commented Oct 31, 2023

@patrick-stephens At first I thought this wasn't an issue, but I diagnosed it and wrote a patch to plug this case.
We already prevent reloading when an invalid parameter is provided.
However, we had no way to detect an invalid Lua script loaded during hot reloading. This should be treated as an error. I wrote a patch implementing this strategy in #8110.
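The fix described here amounts to treating a failed plugin initialization during hot reload (such as a Lua script that does not compile) as a reload error instead of letting it pass. A hypothetical Python model of that difference follows; the actual change in #8110 is C code in Fluent Bit and is not reproduced here, and every name below is illustrative.

```python
from typing import Callable, Dict, List

# Hypothetical model: each plugin has an init callable that raises on failure
# (e.g. a Lua filter whose script fails to compile).
PluginInits = Dict[str, Callable[[], None]]


def failed_plugins(inits: PluginInits) -> List[str]:
    """Run every initializer and collect the names of those that raised."""
    failed: List[str] = []
    for name, init in inits.items():
        try:
            init()
        except Exception:
            failed.append(name)
    return failed


def reload_status(inits: PluginInits, treat_init_failure_as_error: bool) -> int:
    """Return 0 if the reload is accepted, -1 if it must be rejected.

    With treat_init_failure_as_error=False, a failing plugin does not stop
    the reload, modelling the pre-fix situation described above where an
    invalid Lua script was not detected as a reload error.
    """
    failed = failed_plugins(inits)
    if failed and treat_init_failure_as_error:
        return -1  # reject the reload; the old valid context should stay active
    return 0
```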

@patrick-stephens patrick-stephens removed not-an-issue docs issue Documentation Issue labels Oct 31, 2023
@patrick-stephens
Contributor

I think this fix should be in 2.2.0 now, @cosmo0920 / @ym11369.

@cosmo0920
Contributor

cosmo0920 commented Nov 9, 2023

Yup. We plugged the case that left flb_ctx in an invalid state. Could you try Fluent Bit 2.2?

@patrick-stephens
Contributor

@ym11369 please verify and close if solved, otherwise this will be auto-closed fairly soon.

@ym11369 ym11369 closed this as completed Dec 11, 2023