Tracker pool and dirty list #79

pzel · 2017-06-08T14:55:49Z

Tracker Pool & `dirty_list` API

Summary

This work is aimed at increasing the capacity of applications using
Phoenix Presence.

Specifically, it addresses two issues:

A single Tracker server becoming a bottleneck under high throughput
Tracker.list calls invoking GenServer.call to get a list of
presences

The first issue is resolved by starting a pool of named Tracker.Shards,
and dispatching calls to them based on the topic in question. The default
pool size is 1, and therefore the current behavior of Phoenix.Presence is
not affected.

The second issue is resolved by introducing a Tracker.dirty_list function
with the same API as Tracker.list, but with much less overhead and less
precise results.

Comments & Caveats

This likely addresses #71.

A corresponding change will need to be made to the Phoenix Framework in order
for the default Phoenix.Presence handler to expose the new
Tracker.dirty_list functionality. In the meantime, applications that need
to use dirty_list can implement it directly, as shown below:

# file: myapp/lib/myapp/presence_tracker.ex
#

defmodule MyApp.PresenceTracker do
  use Phoenix.Presence,
    otp_app: :myapp,
    pubsub_server: MyApp.PubSub,
    pool_size: 128

  def dirty_list(%Phoenix.Socket{topic: topic}), do: dirty_list(topic)
  def dirty_list(topic), do: dirty_list(__MODULE__, topic)

  def dirty_list(module, topic) do
    grouped =
      module
      |> Phoenix.Tracker.dirty_list(topic)
      |> group()
    module.fetch(topic, grouped)
  end

  defp group(presences) do
    presences
    |> Enum.reverse()
    |> Enum.reduce(%{}, fn {key, meta}, acc ->
      Map.update(acc, to_string(key), %{metas: [meta]}, fn %{metas: metas} ->
        %{metas: [meta | metas]}
      end)
    end)
  end
end

Graphs from the application running under target production load

The graphs compare the same scenario with different configurations of the proposed changes, the default setting being the one of them.

Pool Size Load Test Results:

150K channels connected via websocket;
A single channel lifetime uniformly distributed between 0 and 300 seconds;
Every channel subscribes to presence updates of 100 other channels.

At pool_size = 1, the track time tail latencies are much higher than the corresponding metric at pool_size = 36.

Pool size: 1

Pool size: 36

List Function Load Test Results

137.5K channels connected via websocket;
A single channel lifetime uniformly distributed between 0 and 300 seconds;
Every channel subscribes to presence updates of 500 other channels.

List times are dramatically lower when using 'dirty_list, due to the complete bypass of GenServer.call`s.

`Tracker.list` @ pool size = 128

`Tracker.dirty_list` @ pool size = 128

pzel · 2017-07-04T13:11:49Z

Hi @chrismccord!
We rebased this PR on top of your clouds work on the Tracker. We think it's ready to go -- please take a look when you find a moment.

Happy 4th! 🇺🇸

josevalim · 2017-09-29T08:28:19Z

Hi @pzel, let me hijack this thread briefly. I have just watched your excellent ElixirConf talk. At some point in the talk you mentioned the use of phx_requests for tracking subscription and I want to be sure I understood the design. Previously, you would join a channel for every friend you had. Then you changed it to have a single channel that subscribes to the topic of every friend. Is this correct?

josevalim · 2017-09-29T09:05:00Z

I just got to the questions part of the talk and Chris mentioned fastlaning and I would like to point out you can still fastlane if you pass the proper options on subscribe.

@chrismccord, it may be worth adding a subscribe function to channels that always pass the proper fastlane options. I feel like we have talked about it in the past. Any downsides?

OvermindDL1 · 2017-09-29T16:02:44Z

Then you changed it to have a single channel that subscribes to the topic of every friend.

That is what I do, prevents process-explosion. ^.^;

I just got to the questions part of the talk and Chris mentioned fastlaning and I would like to point out you can still fastlane if you pass the proper options on subscribe.

Ooo, I did not know about those options...

it may be worth adding a subscribe function to channels that always pass the proper fastlane options.

Please yes? :-)

chrismccord · 2017-09-30T03:28:27Z

@josevalim I implemented this but then reverted because the clients are not aware of the fastlaned topics. Broadcasts are matched on the client via the topic, so the only way to handle fastlaned messages would be to use socket.onMessage and try to parse out what special topics/events you are looking for. So including it as a generalized easy-to-use option option doesn't seem straightforward enough to me as a feature. As you noted, it's all still possible today by passing the options yourself if you have special handling needs. Are there other options for handling on the client that I'm missing?

chrismccord · 2017-09-30T03:29:52Z

I'll also note that the "subscribe to many topics via a single channel process" is definitely a pattern we promote for different usecases. Some references to this in the guides would be nice.

josevalim · 2017-09-30T04:05:46Z

@josevalim <https://github.com/josevalim> I implemented this but then

reverted because the clients are not aware of the fastlaned topics Does it mean the client discards them or does it mean the client is unable to know from where it actually came from? -- *José Valimwww.plataformatec.com.br <http://www.plataformatec.com.br/>Founder and Director of R&D*

chrismccord · 2017-09-30T16:24:25Z

Both but there is a socket.onMessage where you could try your own handling

…

Sent from my iPhone

On Sep 30, 2017, at 12:05 AM, José Valim ***@***.***> wrote: > @josevalim <https://github.com/josevalim> I implemented this but then reverted because the clients are not aware of the fastlaned topics Does it mean the client discards them or does it mean the client is unable to know from where it actually came from? -- *José Valimwww.plataformatec.com.br <http://www.plataformatec.com.br/>Founder and Director of R&D* — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

pzel · 2017-10-01T18:47:35Z

@josevalim: Chris pretty much summed up what we ran into.

We developed a fastlane module that does some sanitization, and plugged it in globally. It worked great, except for the fact that the topic was now hardcoded to the topic of the originator of the broadcast. The clients were not listening to these (they only reacted to messages coming from OurPrefix:MyId, their home-channel topic), and so we had to move back to fastlane: nil to maintain backward compatibility.

This is what our setup looked like before we had to revert:

supervisor(Phoenix.PubSub.PG2,
        [OurApp.PubSub, [fastlane: OurApp.PrivacyPreservingFastLane,
                         broadcast_strategy: Phoenix.PubSub.Strategy.Serial]]),

lucaspolonio · 2018-04-16T03:23:25Z

Hey @chrismccord, are there any plans to merge this PR? I'd love to see it released. Phoenix.Presence really suffers under high throughput right now and I believe this PR addresses most of the issues.

chrismccord · 2018-04-16T09:31:40Z

Yes, it is still on my plate!

chrismccord · 2018-07-18T21:45:21Z

Sorry this took so long to get merged in. Thanks so much!!! Note: I have decided to remove the dirty_list API for now so as not to increase the API surface area. Thank you! ❤️❤️❤️🐥🔥

pzel · 2018-07-18T22:51:45Z

Sweet! Hope people enjoy the speed benefits!

indrekj · 2019-06-21T14:25:23Z

@pzel do you have recommendations how to choose a pool size for the tracker?

Also, I understand dirty_list was removed from the tracker API. I still see it in the Shard module. Is it possible to still use it?

pzel · 2019-06-21T19:35:46Z

@indrekj I've been out of the loop regarding PubSub performance for a while now, so I can't make any recommendations about how to measure total system performance for your use case. However, I think that using a pool size equal to the number of CPUs on your target deployment machine is a reasonable choice. If you don't know this number, use 16 and see how the system behaves ;)

Regarding dirty_list, I think the only way to access it now is to keep a local fork of phoenix_pubsub and add
https://github.com/phoenixframework/phoenix_pubsub/pull/79/files#diff-fc3d1e8efd95176c339eaafe5ab24ad6R232 to the tracker source. Not elegant, I know, but I understand why the Phoenix project wouldn't want a 'broken' function in their api.

chasers · 2019-10-15T00:20:40Z

If anyone else needs this you don't need to fork it, just implement a dirty_list function in your app that looks like list, except use dirty_list. Like:

def dirty_list(tracker_name, topic) do
    pool_size = your_pool_size

    tracker_name
    |> Phoenix.Tracker.Shard.name_for_topic(topic, pool_size)
    |> Phoenix.Tracker.Shard.dirty_list(topic)
  end

indrekj · 2019-10-15T13:19:51Z

I also made a PR #127 which removes the need for dirty_list.

chasers · 2019-10-15T13:35:32Z

@indrekj I saw that! I was trying not to diverge from what's here, as I'm pretty new to all this. I hope they merge it! The whole thing is eventually consistent so why serialize the reads in the first place?

pzel force-pushed the tracker-pool-and-dirty-list branch 2 times, most recently from 10324f2 to 14e4512 Compare June 8, 2017 17:37

pzel force-pushed the tracker-pool-and-dirty-list branch from 14e4512 to 7206a18 Compare July 4, 2017 12:39

Tracker consists of shard-pool; exposes dirty_list operation

1c9ab71

pzel force-pushed the tracker-pool-and-dirty-list branch from 7206a18 to 1c9ab71 Compare July 4, 2017 12:53

chrismccord merged commit 1c9ab71 into phoenixframework:master Jul 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracker pool and dirty list #79

Tracker pool and dirty list #79

pzel commented Jun 8, 2017

pzel commented Jul 4, 2017

josevalim commented Sep 29, 2017

josevalim commented Sep 29, 2017

OvermindDL1 commented Sep 29, 2017

chrismccord commented Sep 30, 2017

chrismccord commented Sep 30, 2017

josevalim commented Sep 30, 2017 via email

chrismccord commented Sep 30, 2017 via email

pzel commented Oct 1, 2017

lucaspolonio commented Apr 16, 2018 •

edited

Loading

chrismccord commented Apr 16, 2018

chrismccord commented Jul 18, 2018

pzel commented Jul 18, 2018

indrekj commented Jun 21, 2019

pzel commented Jun 21, 2019

chasers commented Oct 15, 2019

indrekj commented Oct 15, 2019

chasers commented Oct 15, 2019

Tracker pool and dirty list #79

Tracker pool and dirty list #79

Conversation

pzel commented Jun 8, 2017

Tracker Pool & dirty_list API

Summary

Comments & Caveats

Graphs from the application running under target production load

Pool Size Load Test Results:

Pool size: 1

Pool size: 36

List Function Load Test Results

Tracker.list @ pool size = 128

Tracker.dirty_list @ pool size = 128

pzel commented Jul 4, 2017

josevalim commented Sep 29, 2017

josevalim commented Sep 29, 2017

OvermindDL1 commented Sep 29, 2017

chrismccord commented Sep 30, 2017

chrismccord commented Sep 30, 2017

josevalim commented Sep 30, 2017 via email

chrismccord commented Sep 30, 2017 via email

pzel commented Oct 1, 2017

lucaspolonio commented Apr 16, 2018 • edited Loading

chrismccord commented Apr 16, 2018

chrismccord commented Jul 18, 2018

pzel commented Jul 18, 2018

indrekj commented Jun 21, 2019

pzel commented Jun 21, 2019

chasers commented Oct 15, 2019

indrekj commented Oct 15, 2019

chasers commented Oct 15, 2019

Tracker Pool & `dirty_list` API

`Tracker.list` @ pool size = 128

`Tracker.dirty_list` @ pool size = 128

lucaspolonio commented Apr 16, 2018 •

edited

Loading