Timeouts/connection closed errors with benchmark tests #122

Open
demuskov opened this issue May 16, 2018 · 5 comments

Comments

@demuskov

Hi!
I'm trying to set up an environment that can handle, for example, 5-7k connections. I've read #12, but I am not able to reproduce that success story.

The socket/file limits are 500000 on the Poxa machine. The Erlang and Elixir versions are as follows (install instructions from https://gist.github.com/rubencaro/6a28138a40e629b06470):

  • Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false]
  • Elixir 1.6.4 (compiled with OTP 19)

There are no errors on that side (the console log contains only correct messages). But on the other side(s), where the benchmark is started, I get errors:

Running with n = 600
** (EXIT from #PID<0.74.0>) an exception was raised:
    ** (MatchError) no match of right hand side value: {:error, :closed}
    ~/deps/websocket_client/src/websocket_client.erl:150: :websocket_client.receive_handshake/3
    ~/deps/websocket_client/src/websocket_client.erl:137: :websocket_client.websocket_handshake/2
    ~/deps/websocket_client/src/websocket_client.erl:89: :websocket_client.ws_client_init/7
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3

or:

Running with n = 400
Terminated
Terminated
Terminated
Terminated
Terminated
Terminated
Terminated
** (EXIT from #PID<0.74.0>) an exception was raised:
    ** (MatchError) no match of right hand side value: {:error, :timeout}
        connect.exs:14: Worker.handle_info/2
        (stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
        (stdlib) gen_server.erl:686: :gen_server.handle_msg/6
        (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3

I think I'm missing something, and any help is appreciated.
Thanks.

@edgurgel
Owner

Hey @demuskov, thanks for opening this issue. I will give this benchmark another try this weekend.

@demuskov
Author

Thanks, Eduardo!
Btw, I think the cowboy connection limits are somehow exhausted. And if you try to look at the Poxa web console, you eventually get an unresponsive Poxa endpoint.

PS. Now I'm trying to redesign your benchmark for the following case:

  • Three or four independent subscription source machines with 500-1000 processes
  • One publisher that emits 10 messages
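
To sanity-check a run of that setup, it may help to know how many deliveries to expect in total. A rough sketch in Python (not part of the benchmark scripts), assuming that "500-1000 processes" means per machine, that every process holds one subscription to the same channel, and that each published message should reach each subscriber exactly once. The function name is made up for illustration:

```python
def expected_deliveries(machines: int, processes_per_machine: int, messages: int = 10) -> int:
    """Total messages that should arrive across all subscribers.

    Assumes (not confirmed by the benchmark itself) that every process
    holds exactly one subscription to the same channel.
    """
    subscribers = machines * processes_per_machine
    return subscribers * messages


# Smallest and largest configurations from the plan above.
low = expected_deliveries(machines=3, processes_per_machine=500)
high = expected_deliveries(machines=4, processes_per_machine=1000)
print(low, high)  # 15000 40000
```

Comparing the count actually received against this number would show whether messages are lost or whether subscriptions were never established.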

Thanks once more.

@demuskov
Author

Hi! I've experimented with benchmark-based code on a few machines running Ubuntu 16 (Poxa server) and Mint 18 (multiple Poxa clients). The network settings on Ubuntu are tuned for a heavy-load web server. The Mint machines have the highest limits (fds: 500000).

Poxa on Ubuntu was started with several options (daemon, console); the behavior was identical in all cases.

Two scripts:

  • publisher.exs - publishes 10 msgs to the "channel" (Ubuntu)
  • connect.exs - creates N processes and connects them to Poxa (http), then tries to make N subscriptions (Mint)

Distributed env behavior:

  • connect.exs always connects to the Poxa server (20, 2000 or 20000 connections), which is good
  • websocket subscriptions are only very rarely established for more than 1000 processes in a single test run (if the subscription phase takes less than 6 secs, all processes get their subscriptions; otherwise, see below)
  • very often the test run breaks with the error "closed" just 6 secs after the start:
	** (MatchError) no match of right hand side value: {:error, :closed}
    	~/poxa-original/deps/websocket_client/src/websocket_client.erl:150: :websocket_client.receive_handshake/3
    	~/poxa-original/deps/websocket_client/src/websocket_client.erl:137: :websocket_client.websocket_handshake/2
    	~/poxa-original/deps/websocket_client/src/websocket_client.erl:89: :websocket_client.ws_client_init/7
    	(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3

Consolidated env behavior (everything runs on Ubuntu):

  • if N < 1000, everything works
  • if N > 1500, I got a fully functioning result only 2 or 3 times; in the other cases (95%), processing stalled at around _i == 1122 and a timeout was reported. The stack trace is exactly the same as in the distributed case, but the error is {:error, :timeout}
  • from that point on Poxa still accepts connections, but subscriptions are not processed for any process, even for N=1, and restarting Poxa is the only cure (it makes no difference whether you then run the test suite remotely or locally; Poxa does not function until it is restarted)
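
Because the failures are intermittent, a single run per N says little. Repeating each size several times and recording the share of successful runs could make the threshold easier to reproduce. A minimal sketch in Python, not taken from the attached scripts: `run_trial` is a hypothetical callable that would have to start n subscribers, report whether all of them got their subscription, and restart Poxa after a failed run.

```python
from typing import Callable, Dict, Iterable


def success_rates(
    run_trial: Callable[[int], bool],
    sizes: Iterable[int],
    repeats: int = 20,
) -> Dict[int, float]:
    """Run `run_trial(n)` `repeats` times per size; return the share of successes.

    `run_trial` is a placeholder supplied by the caller. It should return
    True only if all n subscriptions were established.
    """
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    rates = {}
    for n in sizes:
        successes = sum(1 for _ in range(repeats) if run_trial(n))
        rates[n] = successes / repeats
    return rates


if __name__ == "__main__":
    # Stand-in trial that only mimics the reported pattern; it does not talk to Poxa.
    fake_trial = lambda n: n < 1000
    print(success_rates(fake_trial, [500, 1000, 1500], repeats=5))
```

A table of success rates per N from such a harness would be easier to compare across machines than individual stack traces.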

Thanks.

PS. Scripts below:

connect.exs.txt
publish.exs.txt

@edgurgel
Owner

OK, so we need to find a way to replicate the issues you are seeing. It could just be network slowness caused by the kernel (some resource limit?). TBH I have never needed more than 10k connections with Poxa, so I never really tested more than that. I can try to set up a DigitalOcean machine and try again. I remember having great success running Poxa on Linux but terrible results on OS X, for example.

@demuskov
Author

I do not think we are reaching system limits on the Ubuntu server. "Consolidated env" means that all processes, Poxa and the test suite, ran on a single machine.

PS. Ubuntu server sysctl.conf:

net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_keepalive_time = 10
net.ipv4.tcp_keepalive_intvl = 1
net.ipv4.tcp_keepalive_probes = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_mem = 50576   64768   98152
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_syncookies = 0
net.ipv4.netfilter.ip_conntrack_max = 16777216
net.netfilter.nf_conntrack_max = 16777216
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_congestion_control = htcp
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.route.flush = 1
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.lo.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.lo.accept_source_route = 0
net.ipv4.conf.eth0.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_rfc1337 = 1
net.ipv4.ip_forward = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_echo_ignore_all = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 16384
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
fs.inotify.max_user_watches = 16777216
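
A config of this size is easier to review programmatically. The sketch below, in Python, parses `key = value` lines and lists the TCP handshake retry settings that are set to 0 or 1, since with such values the kernel retransmits a SYN or SYN-ACK at most once before giving up. Whether that is related to the errors in this issue is not established here; the choice of keys to inspect is an assumption made for the example.

```python
def parse_sysctl(text: str) -> dict:
    """Parse `key = value` lines, skipping blank lines and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Collapse runs of whitespace so multi-value settings compare cleanly.
        settings[key.strip()] = " ".join(value.split())
    return settings


# Keys picked by hand for this sketch (an assumption, not a complete list).
RETRY_KEYS = (
    "net.ipv4.tcp_syn_retries",
    "net.ipv4.tcp_synack_retries",
)


def low_retry_keys(settings: dict) -> list:
    """Return the retry keys whose value is 0 or 1."""
    return [k for k in RETRY_KEYS if settings.get(k) in ("0", "1")]


sample = """
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_mem = 50576   64768   98152
net.ipv4.tcp_syn_retries = 1
net.core.somaxconn = 65535
"""
settings = parse_sysctl(sample)
print(low_retry_keys(settings))
```

Running the same check with the values temporarily raised on a test machine would show whether the handshake failures depend on them.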
