Timeout issue #58

Open
mareban opened this issue Jun 2, 2023 · 2 comments

Comments

@mareban

mareban commented Jun 2, 2023

Hello,

We are using fluent-plugin-remote_syslog to forward session events from a remote access tool. Each event is routed to a dedicated syslog server chosen from an IP field (according to our IP plan). In other words, we have several sites that we reach through a remote access solution, and we want to forward session details to the syslog server of the site being accessed.

If one of the sites is down, fluentd seems to block and tries to connect to that site indefinitely, and nothing else is forwarded, even though a timeout parameter is set.

Is this a bug :-( and, if so, do you plan to fix it?

If a timeout occurs for a site, and the timeout works, what happens to the events for the unreachable site? Are they lost, or are they kept in the buffer and resent when that site's syslog server is up again?

Thanks for your help.

@daipom
Contributor

daipom commented Jun 4, 2023

Hi, thanks for your report.

> We are using fluent-plugin-remote_syslog to forward session events from a remote access tool. Each event is routed to a dedicated syslog server chosen from an IP field (according to our IP plan). In other words, we have several sites that we reach through a remote access solution, and we want to forward session details to the syslog server of the site being accessed.

Could you share the plugin's configuration?
Do you use the placeholder feature for host to send to different servers depending on the log contents?

> If one of the sites is down, fluentd seems to block and tries to connect to that site indefinitely, and nothing else is forwarded, even though a timeout parameter is set.

Do you mean that the timeout parameter doesn't work as expected?
Which parameter do you use?

> Is this a bug :-( and, if so, do you plan to fix it?

If it becomes clear that this is a bug in the plugin, I want to fix it.
On the other hand, the problem may not be a bug in this plugin but a consequence of how TCP or other specifications behave.
We need to clarify where the problem lies.

To fix the problems, I want to reduce each one to a simple case that can be reproduced in general.
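
One way to separate a network problem from a plugin problem is to check, outside Fluentd, whether a site's syslog endpoint accepts TCP connections within a bounded time. The following is only a minimal Ruby sketch under that assumption. The helper name `tcp_reachable?` and the address in the usage comment are hypothetical and are not part of the plugin.

```ruby
require 'socket'

# Hypothetical helper (not part of fluent-plugin-remote_syslog).
# Returns true if a TCP connection to host:port can be opened within
# timeout_sec seconds, and false if the connection is refused, times
# out, or the host name cannot be resolved.
def tcp_reachable?(host, port, timeout_sec)
  # Socket.tcp returns the block's value and closes the socket afterwards.
  Socket.tcp(host, port, connect_timeout: timeout_sec) { |_sock| true }
rescue SystemCallError, SocketError, IOError
  false
end

# Example usage (placeholder address and port):
#   puts tcp_reachable?('192.0.2.10', 514, 3)
```

If this check returns quickly for a site that is down while Fluentd still appears to hang on it, the problem is more likely in the plugin or its configuration than in the network path.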

> If a timeout occurs for a site, and the timeout works, what happens to the events for the unreachable site? Are they lost, or are they kept in the buffer and resent when that site's syslog server is up again?

This is a difficult problem.
We need to consider this in terms of both TCP (Do you use TCP?) and the plugin's specifications.

In terms of the plugin specification, if a send fails, the plugin will try to resend according to the buffer retry settings.
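
To illustrate what "according to the buffer retry settings" can mean in practice: Fluentd's `<buffer>` section has parameters such as `retry_wait`, `retry_exponential_backoff_base`, and `retry_max_interval`, and with the default exponential backoff the wait between attempts grows on each retry. The sketch below only computes the nominal intervals. It is not Fluentd's implementation, it ignores the random jitter Fluentd applies by default, and the function name `backoff_intervals` is made up for this example.

```ruby
# Nominal wait (in seconds) before each retry under exponential backoff,
# ignoring random jitter. Illustration only, not Fluentd's own code.
def backoff_intervals(retry_wait:, base:, count:, max_interval: nil)
  (0...count).map do |n|
    wait = retry_wait * (base**n)
    # Cap the wait if a maximum interval is given.
    max_interval ? [wait, max_interval].min : wait
  end
end

# With retry_wait 1s and base 2 (assumed here to match Fluentd's defaults):
p backoff_intervals(retry_wait: 1, base: 2, count: 5)
# => [1, 2, 4, 8, 16]
```

As far as I understand the buffer documentation, a failed chunk stays in the buffer between these attempts and is only given up once a limit such as `retry_timeout` or `retry_max_times` is reached.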

However, in terms of TCP, there are some known problems.
In TCP, each side needs to send a FIN to the other before the connection is closed, but the server side often stops unilaterally before the client sends its FIN.
(The client side should also close the socket and send back a FIN as soon as it receives the server's FIN, but I don't think this is often implemented (not even in this plugin).)

In such a situation, it is possible that the program appears to have sent the data successfully (`sender.transmit(msg.chomp!, packet_options)`), but in fact, the data was not sent.

A similar problem is reported in https://stackoverflow.com/questions/11436013/writing-to-a-closed-local-tcp-socket-not-failing.
This problem does not seem to be limited to a specific programming language.

It is also discussed in #56.
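
The behavior can be reproduced on localhost without the plugin. The following is a minimal Ruby sketch, assuming a loopback connection where the server side closes first. The method name `write_after_peer_close` is made up for this example, and the exact error class raised by the second write depends on the operating system.

```ruby
require 'socket'

# Sketch: the peer closes its end, then the client keeps writing.
def write_after_peer_close
  server = TCPServer.new('127.0.0.1', 0)   # port 0: the OS picks a free port
  client = TCPSocket.new('127.0.0.1', server.addr[1])
  conn = server.accept
  conn.close                               # the server side goes away first
  sleep 0.2                                # let the FIN reach the client

  # The local kernel accepts the data, so this write reports success
  # even though nobody will ever read it.
  first = client.syswrite("event 1\n")
  sleep 0.2                                # give the peer time to reject the data

  second_error = nil
  begin
    client.syswrite("event 2\n")
  rescue SystemCallError => e
    # Typically Errno::EPIPE or Errno::ECONNRESET, depending on the OS.
    second_error = e.class
  end

  { first_write_bytes: first, second_write_error: second_error }
ensure
  client&.close
  server&.close
end

p write_after_peer_close
```

The first write returns the full byte count, so from the sender's point of view "event 1" was sent, yet it is lost. An error only becomes visible on a later write, which is why a plugin that relies on the write result alone cannot detect this case.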

@mareban
Author

mareban commented Jun 6, 2023

Hi,

Thanks for your reply :-)

Yes, we use the placeholder feature for host and route the last 24 hours of events to a specific site based on an IP plan.

We are using the timeout parameter. Do we also need other parameters, such as TCP keep-alive? We use the TCP protocol because the message length can be greater than 1024 bytes.

Sometimes a syslog server is down, sometimes the server at a site has been decommissioned, sometimes the firewall filters the traffic, and sometimes the connection is lost for whatever reason.

If there is a communication issue, are the events kept in the buffer, or are they removed (after the timeout and retries) so that fluentd moves on to the next event for the unreachable site, or to another site if the next event is to be forwarded there?

Thanks for your help.
