
High CPU usage #116 (Open)

bbashy opened this issue Mar 23, 2024 · 27 comments

@bbashy commented Mar 23, 2024

Reverb Version

v1.0.0-beta4

Laravel Version

11.0.8

PHP Version

8.2.17

Description

After a day or so, I'm getting 100% CPU usage on the reverb:start process. If I restart reverb, it goes down to 1% CPU.

It's currently handling around 160 users online in a presence channel, and that's about it. I had the same problem with this package: beyondcode/laravel-websockets#379

[screenshot: 2024-03-23, 7:49 PM]

Steps To Reproduce

Have a reverb server running via reverb:start for a day with many presence connections (150+).

@joedixon (Collaborator)

Hey @bbashy, how many messages are you sending across the connections and what sort of server are you running?

Also, are you running Reverb alongside another application or standalone?

@bbashy (Author) commented Mar 26, 2024

Hi.

Using it as a presence channel for an online user list. It hits 180 users at certain times. It is used on most pages across the site, so people will be leaving/joining on every page change. People visit a few pages every few seconds.

Using a Hetzner cloud server with an AMD EPYC (8 cores, 16 GB RAM).

No other applications; this server is just for this site. I was using Soketi (no CPU issues) but it doesn't work on Node v20, which is why I switched to Reverb after the Laravel upgrade.

Echo.join('online-users')
    .here((users) => {
        // Seed the list with everyone already present
        this.onlineUsers = users;
    })
    .joining((user) => {
        // Add members as they join
        this.onlineUsers.push(user);
    })
    .leaving((user) => {
        // Remove members as they leave
        this.onlineUsers = this.onlineUsers.filter(u => u.id !== user.id);
    });

In channels.php it just returns the user data: username, country, and a link to the profile.
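For reference, a minimal sketch of what that presence channel authorization could look like in routes/channels.php, assuming hypothetical attribute and route names (username, country, profile.show), since the exact fields aren't shown in the thread:

use Illuminate\Support\Facades\Broadcast;

// Presence channel authorization: the returned array becomes the member data
// that is broadcast to every subscriber when this user joins.
Broadcast::channel('online-users', function ($user) {
    return [
        'id' => $user->id,
        'username' => $user->username,             // assumed attribute
        'country' => $user->country,               // assumed attribute
        'profile' => route('profile.show', $user), // assumed route name
    ];
});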

It's been fine since I deployed last - 24/03/2024 22:00 UTC.

@joedixon (Collaborator)

Thanks for the reply. Going to close this for now, but please mention me in this thread if you see the issue again and I will reopen.

@bbashy (Author) commented Mar 27, 2024

@joedixon Started happening again. If there's any debugging I can do, let me know.

[screenshot: 2024-03-27, 9:12 PM]

@joedixon (Collaborator)

@bbashy It would be good to get a better picture of your environment and use case. Would you mind answering the following:

  1. Does the CPU continually increase over time or does it suddenly spike to 100%?
  2. If you leave it, does the CPU eventually drop back down?
  3. Are you using the Pulse integration?
  4. Do you have debug mode enabled?
  5. Are you broadcasting messages (if so, roughly how many) or just using it for presence indication?
  6. Are you able to use tinker to run the code samples below to get the current number of connections and channels:
// Connections
app(\Illuminate\Broadcasting\BroadcastManager::class)
  ->getPusher()
  ->get('/connections');

// Channels
app(\Illuminate\Broadcasting\BroadcastManager::class)
  ->getPusher()
  ->get('/channels', ['info' => 'subscription_count']);

@joedixon reopened this Mar 28, 2024
@bbashy (Author) commented Mar 28, 2024

  1. I've only noticed it being 1% or 99-100%.
  2. It's still at that since my last reply (16 hours).
  3. No, but I plan to once it's out of beta. Shall I install it?
  4. No. Logs just show "INFO Stopping server on 0.0.0.0:2096. INFO Starting secure server on 0.0.0.0:2096 (example.com)." etc.
  5. We have a 'chat room' but it's hardly used.
  6. See below.
Psy Shell v0.12.2 (PHP 8.2.17 — cli) by Justin Hileman
> app(\Illuminate\Broadcasting\BroadcastManager::class)->getPusher()->get('/connections');
= {#5634
    +"connections": 117,
  }

> app(\Illuminate\Broadcasting\BroadcastManager::class)->getPusher()->get('/channels', ['info' => 'subscription_count']);
= {#5606
    +"channels": {#5612
      +"servers-booked": {#5610
        +"subscription_count": 14,
      },
      +"stats-counter": {#5605
        +"subscription_count": 14,
      },
      +"presence-public-chat": {#5607},
    },
  }

@joedixon (Collaborator)

No need to install Pulse, just wanted to rule it out as a factor.

It would be interesting to run with debug mode enabled for a little while (which in itself will increase CPU) to see if anything untoward is happening when the CPU spikes.

@bbashy (Author) commented Mar 31, 2024

Running with debug now, and I'm running a script to gather the CPU % every 2 seconds.

Will report here with data.

@bbashy (Author) commented Apr 1, 2024

It's gone up to 100% again; here's the CPU usage. The jump is pretty much instant, not gradual. It had been running in debug for 24 hours.

2024-04-01 20:33:46 :: reverb:start --debug [2282066] CPU Usage: 1%
2024-04-01 20:33:48 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:33:50 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:33:52 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:33:54 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:33:56 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:33:58 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:00 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:02 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:04 :: reverb:start --debug [2282066] CPU Usage: 4%
2024-04-01 20:34:06 :: reverb:start --debug [2282066] CPU Usage: 1%
2024-04-01 20:34:08 :: reverb:start --debug [2282066] CPU Usage: 1%
2024-04-01 20:34:10 :: reverb:start --debug [2282066] CPU Usage: 4%
2024-04-01 20:34:12 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:14 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:34:16 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:18 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:34:20 :: reverb:start --debug [2282066] CPU Usage: 52%
2024-04-01 20:34:22 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:24 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:26 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:28 :: reverb:start --debug [2282066] CPU Usage: 99%
2024-04-01 20:34:30 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:32 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:34 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:36 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:38 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:40 :: reverb:start --debug [2282066] CPU Usage: 99%
2024-04-01 20:34:42 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:44 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:46 :: reverb:start --debug [2282066] CPU Usage: 99%
2024-04-01 20:34:48 :: reverb:start --debug [2282066] CPU Usage: 100%

The Reverb logs don't show much, apart from maybe this. It appears regularly throughout the whole log, even when usage was low.

Message Handled ........................................ 285448861.320304079  
Connection Closed ...................................... 880659721.152382869  
Pruning Stale Connections ..................................................  
Pinging Inactive Connections ...............................................  
Connection Pinged ...................................... 373144331.352179112  
Connection Pinged ...................................... 770897394.824165909  
Connection Pinged ...................................... 128326687.479625382  
Connection Pinged ...................................... 238258499.280963052  
Connection Pinged ...................................... 711357617.733355114  
Connection Pinged ...................................... 913581555.461208734  
Connection Pinged ...................................... 750335547.483737098  
Connection Pinged ...................................... 824799629.748502904  
Connection Pinged ...................................... 373757589.569113713  
Connection Pinged ....................................... 88650420.659599424  
Connection Pinged ...................................... 875588527.671792201  
Connection Pinged ...................................... 645841583.993685622  
Connection Pinged ....................................... 69443263.770122343  
Connection Pinged ....................................... 969366386.39638130  
Connection Pinged ....................................... 97541188.835314754  
Connection Pinged ...................................... 336550205.253472927  
Connection Pinged ...................................... 837108525.856830003  
Connection Pinged ...................................... 843470688.187968972  
Connection Pinged ...................................... 903186882.544248154  
Connection Pinged ....................................... 70865328.772779885  

Then, after ~170 lines of that, it logs loads of:

Message Received ....................................... 913581555.461208734  

{
    "event": "pusher:ping",
    "data": []
}

Message Handled ........................................ 913581555.461208734  
Message Received ......................................... 746315846.6165177  

{
    "event": "pusher:ping",
    "data": []
}

@joedixon (Collaborator) commented Apr 4, 2024

@bbashy The logs look normal, but I'm interested to see if the issue is related to the pinging of connections, which Reverb defaults to running every 60 seconds. Can you try increasing the value of REVERB_APP_PING_INTERVAL to see whether that makes any difference? Perhaps start at 1 hour (3600) and increase from there.
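For context, this setting maps to the per-app ping_interval value (in seconds) in the published config/reverb.php, along these lines (worth checking against your own copy):

// config/reverb.php (excerpt) -- inside the app definition
'ping_interval' => env('REVERB_APP_PING_INTERVAL', 60),

So REVERB_APP_PING_INTERVAL=3600 in .env would ping each connection roughly once an hour instead of every minute.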

@bbashy (Author) commented Apr 4, 2024

I've set REVERB_APP_PING_INTERVAL=3600 and I've restarted reverb. Back to 0-1% CPU and will run the monitoring for the process.

It must be something to do with the number of pings combined with connections subscribing/unsubscribing.

I use it as an online list and it's included on most pages, so every page change means a disconnect/connect.

@joedixon (Collaborator) commented Apr 4, 2024

Right, but I would say that's pretty standard usage. Perhaps the difference is the online list, as that will cause events to be broadcast to all connected clients on every connection/disconnection. Pusher actually limits this functionality to 100 connections.

@bbashy (Author) commented Apr 4, 2024

Yep, I saw that, and Soketi limits it too. I increased the limit on Soketi to 500 and the CPU usage was still below 10%, even after a month of uptime.

The online list is only on the homepage, though; other pages only have a counter, but I guess it still broadcasts to all clients.

@bbashy (Author) commented Apr 4, 2024

It's gone up again. Shall I try a higher ping interval? It's weird that it goes straight from 0 to 100% in 4 seconds.

2024-04-04 19:28:00 :: reverb:start --debug [237336] CPU Usage: 2%
2024-04-04 19:28:02 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:04 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:06 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:08 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:10 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:12 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:14 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:16 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:18 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:20 :: reverb:start --debug [237336] CPU Usage: 87%
2024-04-04 19:28:22 :: reverb:start --debug [237336] CPU Usage: 100%
2024-04-04 19:28:24 :: reverb:start --debug [237336] CPU Usage: 99%
2024-04-04 19:28:26 :: reverb:start --debug [237336] CPU Usage: 100%
2024-04-04 19:28:28 :: reverb:start --debug [237336] CPU Usage: 100%
2024-04-04 19:28:30 :: reverb:start --debug [237336] CPU Usage: 100%

If it's not something others have had and we can't resolve it, I'll look into changing the online user list to something WebSocket-message based.

@joedixon (Collaborator) commented Apr 5, 2024

It would be good to try increasing the ping interval to see if it has any impact on when the CPU spikes. Agree it's strange that it ramps up so quickly.

Are you in a position to temporarily disable the online user list so we can categorically determine whether that is actually the issue?

@clue commented Apr 9, 2024

@bbashy Sudden CPU spikes are definitely not to be expected, and if you've encountered the exact same error in both the original Laravel Websockets package and now Reverb, this might in fact indicate a problem in the underlying Ratchet/ReactPHP components. Similar reports have been posted in the past and we have some ideas about what could cause this, but we've never been able to reproduce it locally; see ratchetphp/Ratchet#939 (comment) / ratchetphp/Ratchet#939 (comment). If this matches your issue and you can reproduce it locally, please reach out and I'm happy to take a look.

@bbashy (Author) commented Apr 9, 2024

@clue Looks very similar and relates to my use case of high connect/disconnect events. In theory, all we'd need to do is replicate a lot of users (100+) connecting and disconnecting (see the sketch below).
It started looking like that was the cause after I saw the time it shot up: 19:30-20:30 is when my site has the most users on it, and they'll be doing a lot of page changes (disconnect/connect).
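A rough local reproduction sketch along those lines, assuming ratchet/pawl is available (composer require ratchet/pawl) and using placeholder host, port, and app-key values; it only churns raw Pusher-protocol connections and does not authenticate into the presence channel:

// churn.php -- open many WebSocket connections against a local Reverb server and
// close them after a short random delay, to simulate page-change churn.
require __DIR__.'/vendor/autoload.php';

use React\EventLoop\Loop;
use function Ratchet\Client\connect;

$appKey = 'local-app-key'; // placeholder: your REVERB_APP_KEY
$url = "ws://127.0.0.1:8080/app/{$appKey}?protocol=7&client=php&version=1.0";

for ($i = 0; $i < 150; $i++) {
    connect($url)->then(function ($conn) {
        // Disconnect after 1-5 seconds, mimicking a user navigating away.
        Loop::addTimer(random_int(1, 5), fn () => $conn->close());
    }, function (Exception $e) {
        echo "Connection failed: {$e->getMessage()}\n";
    });
}

Loop::run();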

@Alex0Fury commented May 21, 2024

@bbashy
I faced a similar problem. Supervisor runs inside a Docker container, and the Reverb process is started there. The process was loaded at 100% with 500+ simultaneous connections.

The solution turned out to be installing the extension indicated in the documentation (the Event Loop section).

Dockerfile:

FROM php:8.3.3-fpm-bullseye
#....
#....
# openssl and the sockets extension required by Reverb
RUN apt-get update && apt-get install -y openssl
RUN docker-php-ext-install sockets
RUN pecl channel-update pecl.php.net
# libevent headers are needed to build the event extension
RUN apt-get update && apt-get install -y libevent-dev

# printf feeds answers to pecl's interactive build prompts
RUN printf "\n\n\n\n\nno\nyes\n\n" | pecl install event
RUN docker-php-ext-enable event

Also, on the host system (outside the container), raised the open-file limit: ulimit -n 32766.

In the Supervisor settings (supervisor.conf), set minfds=10000:

[supervisord]
nodaemon=true
chmod=0770
logfile=/var/log/supervisor/supervisord.log
pidfile=/var/run/supervisord.pid
minfds=10000

[program:laravel-websocket]
numprocs=8
process_name=%(program_name)s_%(process_num)02d
chmod=0770
command=php /var/www/html/artisan reverb:start
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/www/html/storage/logs/reverb.log

Finally: [screenshot]

@bbashy (Author) commented May 21, 2024

Might have fixed an issue for you, but I believe mine was slightly different.

I have no problems when running it under Soketi on the same machine/ulimit.

@Alex0Fury

finally

Might have fixed an issue for you, but I believe mine was slightly different.

I believe our answers can help other developers cope with this kind of problem, and the repository owners may want to take some of our practices into account.

@joedixon (Collaborator)

This is still on my radar, I just haven't had the chance to take a deep dive on it just yet.

@Alex0Fury

This is still on my radar, I just haven't had the chance to take a deep dive on it just yet.

Yesterday I experimented with installing the ev package (pecl install ev), which similarly resolves the situation. In a production environment with 600+ connections, the load on the process has decreased. There is now a single artisan reverb:start process running.

You need to specify extension=ev.so in php.ini.
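A quick sanity check (my suggestion, not from the thread) that the extension is actually loaded for the CLI binary running reverb:start, since CLI and FPM often read different ini files:

// check_ev.php -- run with the same PHP binary used for `artisan reverb:start`
var_dump(extension_loaded('ev')); // true if extension=ev.so was picked up
var_dump(class_exists('EvLoop')); // the class the ext-ev event loop wraps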


@rawezhcode


When the number of connections exceeds 1,000, the CPU spikes to 100% at a strange rate and causes it to stop and restart. This problem occurs on a VPS (8 cores, 16 GB RAM).

@sneycampos

When the number of connections exceeds 1,000, the CPU spikes to 100% at a strange rate and causes it to stop and restart. This problem occurs on a VPS (8 cores, 16 GB RAM).

Did you install the ev extension?

@rawezhcode

When the number of connections exceeds 1,000, the CPU spikes to 100% at a strange rate and causes it to stop and restart. This problem occurs on a VPS (8 cores, 16 GB RAM).

In my case, the problem was solved after installing the ev extension.
Currently, more than 2,500 users connect without problems.

@samehdoush

I faced the same problem. I installed it on Laravel Forge using a load balancer with 3 servers. All servers have 100% CPU usage. Is there any solution related to Forge?

@season886 commented Sep 20, 2024

Reverb will automatically switch to an ext-uv powered loop when available. Is ext-event still supported?
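For what it's worth, a small sketch (an assumption about the underlying ReactPHP behaviour, not something stated in this thread) to see which event-loop backend is actually selected at runtime; ReactPHP's default factory prefers the compiled extensions (uv/ev/event) and falls back to the pure-PHP stream_select loop. Run it with the same PHP binary and ini as reverb:start:

require __DIR__.'/vendor/autoload.php';

use React\EventLoop\Loop;

// Prints the selected loop class, e.g. React\EventLoop\ExtUvLoop,
// ExtEvLoop, ExtEventLoop or StreamSelectLoop.
echo get_class(Loop::get()), PHP_EOL;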
