
Redis probes list issues - Multi-instance API problems #120

Closed
jimaek opened this issue May 12, 2022 · 7 comments

@jimaek
Member

jimaek commented May 12, 2022

This is a task to track the issue where an API instance sometimes shows only a partial list of the connected probes.
It has happened both in a simple CLI script and in the production API.

Need to verify that our API is 100% stable when running multiple instances with a central Redis DB.

@jimaek

This comment was marked as outdated.

@patrykcieszkowski
Contributor

Here's the problem:

  • server A manages X probes
  • server B manages Y probes
  • the load balancer redirects the user to A or B
  • at random, server A or B wouldn't respond to the probe list query (pub/sub)

@jimaek jimaek changed the title Redis probes list issues Redis probes list issues - Multi-instance API problems Aug 18, 2022
@alexey-yarmosh
Member

Hey @patrykcieszkowski, I am trying to address that issue, and Dmitriy told me that you had a script to get the list of probes from a specific node instance/process. Could you share it, please, if you don't mind?

Also, if you have any info on how the issue can be reliably reproduced, that would be very helpful. Thanks!

@patrykcieszkowski
Contributor

I don't recall writing such a script, but it should be as simple as adding a node identifier key/value pair to the probe data.

https://github.com/jsdelivr/globalping/blob/master/src/probe/builder.ts#L90-L105
https://github.com/jsdelivr/globalping/blob/master/src/probe/route/get-probes.ts#L16-L33
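For illustration, a minimal sketch of what such tagging could look like, assuming socket.io and a simplified `Probe` type (this is not the actual `builder.ts` code):

```ts
// Hypothetical sketch, not the project's actual code: tag every probe with the
// API instance that accepted its connection, so GET /probes can show which
// node produced each entry.
import * as os from 'node:os';
import type { Socket } from 'socket.io';

// Simplified stand-in for the real Probe type from builder.ts.
type Probe = {
	client: string;
	nodeId: string; // identifier of the API instance/process handling the socket
};

// Stable per-process identifier for this API instance.
const nodeId = `${os.hostname()}-${process.pid}`;

export const buildProbe = (socket: Socket): Probe => ({
	client: socket.id,
	nodeId,
});
```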

I also never figured out how to consistently reproduce the issue. In fact, it never happened on my local network, even while running over 500 probes. One thing is certain: even when connecting to the WS pool externally and pulling the probe list while bypassing the HTTP server, the behaviour mentioned in the comment above was still present. I came to the conclusion that some nodes either never receive the pub message requesting the data, or don't respond to it in time (the redis-adapter has a timeout).
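For context, this is roughly how that round-trip works with `@socket.io/redis-adapter`; the setup below is an illustrative sketch (Redis URL and timeout value are assumptions), not the project's actual configuration:

```ts
// Illustrative sketch: fetchSockets() with the redis adapter publishes a
// request over Redis pub/sub and waits up to `requestsTimeout` for the other
// instances to reply. An instance that misses the pub message or replies late
// simply contributes no probes to the result (or the call rejects on timeout).
import { createServer } from 'node:http';
import { Server } from 'socket.io';
import { createClient } from 'redis';
import { createAdapter } from '@socket.io/redis-adapter';

const pubClient = createClient({ url: 'redis://localhost:6379' }); // assumed URL
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);

const io = new Server(createServer(), {
	adapter: createAdapter(pubClient, subClient, { requestsTimeout: 5000 }),
});

// Local sockets plus whatever the remote instances managed to report in time.
const sockets = await io.fetchSockets();
console.log(`visible probes: ${sockets.length}`);
```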

@alexey-yarmosh
Member

I was constantly requesting the /probes endpoint from both APIs (see the comparison sketch after this list), and what I am observing is:

  • Under usual load, the diff between responses may be ~1-2 probes, because some probes are constantly reconnecting (IP limit). The current fetchSockets adapter first gets the local probes, then asks for the remote ones. While awaiting the remote ones, some local probes may disconnect, but we already got the list - that is why the desync happens.
  • Under high load, redis operations (and the pub/sub that the adapter uses) take more time, and there are more probes reconnecting, so the diff may be ~1-10 probes.
  • Also, under load there is issue timeout reached while waiting for fetchSockets response #234 - in that case a 500 HTTP error is returned.
  • Also, under load sometimes both APIs simultaneously respond only with their own probes.
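For reference, a rough sketch of the kind of comparison described above; the instance URLs and the `id` field are assumptions for illustration:

```ts
// Hypothetical comparison script: poll GET /probes on both API instances and
// report how many probes appear on only one of them.
const endpoints = [
	'http://api-a.internal/v1/probes', // assumed URLs of the two instances
	'http://api-b.internal/v1/probes',
];

const fetchIds = async (url: string): Promise<Set<string>> => {
	const res = await fetch(url);
	const probes = (await res.json()) as Array<{ id: string }>; // `id` field is assumed
	return new Set(probes.map(p => p.id));
};

setInterval(async () => {
	const [a, b] = await Promise.all(endpoints.map(fetchIds));
	const onlyA = [...a].filter(id => !b.has(id));
	const onlyB = [...b].filter(id => !a.has(id));
	console.log(`only on A: ${onlyA.length}, only on B: ${onlyB.length}`);
}, 5_000);
```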

As I see it, the only thing we can do here is to try another adapter implementation and compare the behaviour. For some teams the AMQP adapter showed really good results.

@alexey-yarmosh
Member

alexey-yarmosh commented Dec 1, 2022

The AMQP adapter does not support some of the required operations (e.g. fetchSockets()).
I've also tried the NATS adapter, but fetchSockets there does not work as expected either.

@alexey-yarmosh
Member

alexey-yarmosh commented Jan 3, 2023

I think we can close this, as under usual load GET /probes works without issues. Only under high load (when redis operations start to take >30 sec) do we observe the issues as well as the 500 error. So we should focus on the root cause (redis performance) in other GH issues, which we are already doing.

@jimaek jimaek closed this as completed Jan 3, 2023