Optimize Shard.list and Shard.get_by_key #127
Open
+50
−39
Previously, list and get_by_key had to go through the GenServer to acquire the values ETS table and replica information. If the GenServer was processing an update (e.g. heartbeat, track, untrack), list and get_by_key calls were blocked until that update completed. We saw this behaviour in our cluster, where simple list/get_by_key calls sometimes took over a few hundred milliseconds.
Storing replica information in an ETS table lets us skip the GenServer entirely, so list/get_by_key can be served immediately, as sketched below.
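A sketch of the idea (names are illustrative and the layout is simplified, not the real Shard code): the GenServer still owns and writes the table, but it mirrors replica state into a protected ETS table with `read_concurrency` enabled, so any process can read it without touching the GenServer mailbox.

```elixir
defmodule Shard.ReplicaTable do
  # Owned by the shard GenServer; readers only ever do :ets.lookup on it.
  def new(name) do
    :ets.new(name, [:set, :protected, :named_table, read_concurrency: true])
  end

  # Called from inside the GenServer whenever replica state changes
  def put(table, replicas, down_replicas) do
    :ets.insert(table, [{:replicas, replicas}, {:down_replicas, down_replicas}])
    :ok
  end

  # Callable from any process without going through the GenServer
  def down_replicas(table) do
    case :ets.lookup(table, :down_replicas) do
      [{:down_replicas, down}] -> down
      [] -> []
    end
  end
end
```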
I removed the dirty_list function, which was not public/exposed and was an earlier attempt at the same problem. dirty_list was called dirty because it didn't check down_replicas. This solution does check down_replicas and doesn't change the API, as in the sketch below.
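A hedged sketch of the new read path (the row shape here is an assumption for illustration, not the actual table layout): list reads the values table and the replica table directly, then drops entries owned by down replicas, so unlike the removed dirty_list the result still respects down_replicas.

```elixir
defmodule Shard.Read do
  # values_table rows are assumed (for this sketch) to be {topic, pid, meta, replica}
  def list(values_table, replica_table, topic) do
    down = Shard.ReplicaTable.down_replicas(replica_table)

    values_table
    |> :ets.lookup(topic)
    |> Enum.reject(fn {_topic, _pid, _meta, replica} -> replica in down end)
    |> Enum.map(fn {_topic, pid, meta, _replica} -> {pid, meta} end)
  end
end
```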
Update 2019/12/06: We've fully rolled this out to production (50K+ concurrent connections). We also saw a ~30% drop in CPU usage, which I did not expect at all, but that's very good.
Update 2020/01/03: We've hit 70K+ concurrent connections. Everything still looking good.
Update 2021/06/13: Over 200K concurrent connections with this.
This should also resolve #124