Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spurious bursts of errors connecting to logz.io #4

Closed
edevil opened this issue Aug 3, 2016 · 7 comments
Closed

Spurious bursts of errors connecting to logz.io #4

edevil opened this issue Aug 3, 2016 · 7 comments
Assignees

Comments

@edevil
Copy link
Contributor

edevil commented Aug 3, 2016

Periodically I get bursts of errors from this plugin:

2016-08-03 11:54:19 +0000 [error]: Error connecting to logzio. verify the url https://listener.logz.io:8071?token=...
2016-08-03 11:56:47 +0000 [error]: Error connecting to logzio. verify the url https://listener.logz.io:8071?token=...
2016-08-03 11:58:54 +0000 [error]: Error connecting to logzio. verify the url https://listener.logz.io:8071?token=...
2016-08-03 12:01:12 +0000 [error]: Error connecting to logzio. verify the url https://listener.logz.io:8071?token=...
2016-08-03 12:03:39 +0000 [error]: Error connecting to logzio. verify the url https://listener.logz.io:8071?token=...
2016-08-03 12:15:59 +0000 [error]: Error connecting to logzio. verify the url https://listener.logz.io:8071?token=...
2016-08-03 12:18:16 +0000 [error]: Error connecting to logzio. verify the url https://listener.logz.io:8071?token=...
2016-08-03 12:20:24 +0000 [error]: Error connecting to logzio. verify the url https://listener.logz.io:8071?token=...
2016-08-03 12:22:31 +0000 [error]: Error connecting to logzio. verify the url https://listener.logz.io:8071?token=...
2016-08-03 12:24:38 +0000 [error]: Error connecting to logzio. verify the url https://listener.logz.io:8071?token=...

I see logs on logz.io, so obviously this is not a permanent problem, but it would help if more specific information can be provided on the error. Is it a problem connecting to the remote host? A bad response from the server? A persistent connection that has failed?

Thanks.

@roiravhon roiravhon self-assigned this Aug 4, 2016
@roiravhon
Copy link
Contributor

Hey @edevil!

This log is printing every time there is an exception while sending logs.
We assumed it can only happen if there is a bad hostname (resulting in getaddrinfo: Name or service not known) - but obviously, we were wrong.

I just pushed another version of the plugin (0.0.5) to RubyGems that prints the exception instead of this not so informative message.

Can you please update your plugin version and comment here again with the exception?

Thanks,
Roi.

@edevil
Copy link
Contributor Author

edevil commented Aug 4, 2016

2016-08-04 13:29:20 +0000 [debug]: plugin/out_logzio_buffered.rb:44:write: Got flush timeout, containing 409 chunks
2016-08-04 13:29:20 +0000 [debug]: plugin/out_logzio_buffered.rb:54:write: HTTP Response code 200
2016-08-04 13:29:30 +0000 [debug]: plugin/out_logzio_buffered.rb:44:write: Got flush timeout, containing 16 chunks
2016-08-04 13:31:37 +0000 [error]: plugin/out_logzio_buffered.rb:93:rescue in write: Error connecting to logzio. Got exception: too many connection resets (due to Connection timed out - connect(2) for "listener.logz.io" port 8071 - Errno::ETIMEDOUT) after 0 requests on 33641960, last used 1470317497.9291167 seconds ago
2016-08-04 13:31:37 +0000 [debug]: plugin/out_logzio_buffered.rb:44:write: Got flush timeout, containing 182 chunks
2016-08-04 13:31:39 +0000 [debug]: plugin/out_logzio_buffered.rb:54:write: HTTP Response code 200

I should add that this fluentd daemon is running as a daemonset on my Kubernetes cluster hosted on Azure, so there can be some aggressive pruning of idle persistent connections, but Net::HTTP::Persistent is supposed to have a 5s timeout on those...

@roiravhon
Copy link
Contributor

That seems to be a known issue without a specific solution.
drbrain/net-http-persistent#37
drbrain/net-http-persistent#19

Can you try and change the "flush_interval" parameter?
try to decrease it to 2 seconds, or increase to 30 seconds and see if it's related somehow.

@edevil
Copy link
Contributor Author

edevil commented Aug 4, 2016

I used tcpdump and I actually see connection attempts (SYN) not being answered from listener.logz.io:8071. Some kind of DoS protection perhaps?

1536 140.689680910    10.2.85.6 ? 52.21.71.179 TCP 76 46088?8071 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=160856420 TSecr=0 WS=128
1537 140.689680910    10.2.85.6 ? 52.21.71.179 TCP 76 [TCP Out-Of-Order] 46088?8071 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=160856420 TSecr=0 WS=128
1538 140.689713610    10.0.3.11 ? 52.21.71.179 TCP 76 46088?8071 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=160856420 TSecr=0 WS=128
1539 141.691624701    10.2.85.6 ? 52.21.71.179 TCP 76 [TCP Retransmission] 46088?8071 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=160857422 TSecr=0 WS=128
1540 141.691624701    10.2.85.6 ? 52.21.71.179 TCP 76 [TCP Retransmission] 46088?8071 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=160857422 TSecr=0 WS=128
1541 141.691672301    10.0.3.11 ? 52.21.71.179 TCP 76 [TCP Retransmission] 46088?8071 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=160857422 TSecr=0 WS=128
1542 143.693602500    10.2.85.6 ? 52.21.71.179 TCP 76 [TCP Retransmission] 46088?8071 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=160859424 TSecr=0 WS=128
1543 143.693602500    10.2.85.6 ? 52.21.71.179 TCP 76 [TCP Retransmission] 46088?8071 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=160859424 TSecr=0 WS=128
1544 143.693632599    10.0.3.11 ? 52.21.71.179 TCP 76 [TCP Retransmission] 46088?8071 [SYN] Seq=0 Win=28200 Len=0 MSS=1410 SACK_PERM=1 TSval=160859424 TSecr=0 WS=128

@roiravhon
Copy link
Contributor

@noniperi Can kick in?

@edevil
Copy link
Contributor Author

edevil commented Aug 4, 2016

This seems to be related to opening connections in a small amount of time. Using a flush_interval of 2s and a constant stream of log traffic I haven't yet encountered problems since it keeps using the same connection. This is because the default value of Net:HTTP:Persistent:idle_timeout 5, so the connection is never considered idle.

Using my previous flush_interval of 10s kept recycling the connection since it was idle for more than "idle_timeout". Is there a way to set this "idle_timeout" via configuration of the plugin?

@roiravhon
Copy link
Contributor

Version 0.0.6 is live in RubyGems.org

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants