Add digest scores for faster deletes in sorted sets #835
Conversation
(force-pushed from e2e1b02 to 565f1c4)
Let's think about this for a while. We already store a score when the digest is created; it has both a score and a job_id. We could simplify this by looking up the current digest's score and job_id. We also have this issue: Sidekiq only checks in via the heartbeat method every ten seconds. This means that for up to ten seconds, things can be missing.
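A minimal Ruby sketch of that lookup, assuming hypothetical key names (the gem's actual keys may differ):

```ruby
require "redis"

redis  = Redis.new
digest = "uniquejobs:abc123" # hypothetical digest

# The score was recorded when the digest was created, so a delete can
# address the member directly instead of scanning the whole set.
score  = redis.zscore("uniquejobs:digests", digest)   # assumed key name
job_id = redis.hget("uniquejobs:digest_jids", digest) # hypothetical hash name

redis.zremrangebyscore("uniquejobs:digests", score, score) if score
```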
Ah, I wasn't aware we were already storing the score. I looked around, but I guess I missed it. That makes things easier.
What does Sidekiq's heartbeat system do? I'm not familiar with it or with what "missing" means here.
(force-pushed from 565f1c4 to 60e380f)
Both of your comments can be explained here: https://github.com/mhenrixon/sidekiq-unique-jobs/pull/830/files. In that PR, I am optimizing the Ruby reaper to use the digest score. There are some linked issues (I hope) that talk about this, and somewhere there is a link to a Sidekiq issue where @mperham explains this. We could apply similar thinking to your issue.
EDIT: @ezekg this is the Sidekiq issue I was talking about: sidekiq/sidekiq#6153
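For background, a rough Ruby illustration of the heartbeat data involved: Sidekiq keeps a `processes` set and a per-process `beat` timestamp that is refreshed roughly every ten seconds, which is where the ten-second blind spot comes from. This is a simplified sketch, not gem code:

```ruby
require "redis"

redis = Redis.new

# Each running Sidekiq process registers an identity in the "processes"
# set and refreshes a "beat" timestamp roughly every ten seconds; within
# that window, Redis can lag behind what the processes are actually doing.
redis.smembers("processes").each do |identity|
  beat = redis.hget(identity, "beat").to_f
  puts "#{identity} last beat #{(Time.now.to_f - beat).round(1)}s ago"
end
```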
(force-pushed from 664c259 to c1f52e8)
@mhenrixon I added a rudimentary performance test that fails for the old algorithm. For a …
To this: …
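For a sense of the shape such a test might take, here is a hypothetical sketch (not the PR's actual spec): it assumes a `schedule_job` helper, an arbitrary time budget, and the gem's `Digests#delete_by_digest` entry point.

```ruby
require "benchmark"

RSpec.describe "deleting a unique job from a large schedule set" do
  let(:digest) { "uniquejobs:abc123" } # hypothetical digest

  it "deletes without scanning the entire sorted set" do
    # Seed a large schedule set so that a full scan would be measurably slow.
    100_000.times { |i| schedule_job(at: Time.now.to_f + i) } # assumed helper

    elapsed = Benchmark.realtime do
      SidekiqUniqueJobs::Digests.new.delete_by_digest(digest)
    end

    expect(elapsed).to be < 0.1 # arbitrary budget, for illustration only
  end
end
```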
(force-pushed from 851a8f5 to bed17eb)
I really appreciate you taking a stab at this, @ezekg! You spotted a couple of bugs!
Can you check if the linked PR fixes your problem too? I'd prefer a simpler fix over adding these extra keys. If it still isn't good enough, I'm happy to pair on sorting this out!
(force-pushed from bed17eb to 203f087)
If you have a look at https://github.com/mhenrixon/sidekiq-unique-jobs/blob/main/lib/sidekiq_unique_jobs/lua/lock.lua#L66, you'll see we already add a score for the digest. We also store the timestamp in the hash with digest + job_id (this is to be able to allow a specified number of concurrent jobs): https://github.com/mhenrixon/sidekiq-unique-jobs/blob/main/lib/sidekiq_unique_jobs/lua/lock.lua#L70
Between those two, it is a hard sell to add more timestamps. I am more inclined to reduce the number of commands than to increase them. Could you find a way to make do with what is there, or do a great job of selling the extra key to me, @ezekg? I want to help you, I really do!
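In Ruby terms, the two linked Lua lines amount to roughly the following. This is a simplified paraphrase of the description above; the key names and variable names are assumptions, not the gem's exact ones:

```ruby
require "redis"

redis        = Redis.new
digest       = "uniquejobs:abc123"  # hypothetical digest
job_id       = "f1b2c3d4"           # hypothetical Sidekiq jid
digests_key  = "uniquejobs:digests" # assumed key name
locked_key   = "#{digest}:LOCKED"   # assumed key name
current_time = Time.now.to_f

# lock.lua#L66: the digest goes into a sorted set with a score.
redis.zadd(digests_key, current_time, digest)

# lock.lua#L70: a timestamp is kept per job_id in a hash, which is what
# allows a specified number of concurrent jobs to hold the same lock.
redis.hset(locked_key, job_id, current_time)
```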
(force-pushed from 203f087 to 13d562f)
(force-pushed from 13d562f to f53f91c)
@mhenrixon the current implementation does not store the actual score of the job in the sorted set (i.e. the timestamp at which a scheduled job is due to run), but rather the current time at which the job is added to the sorted set. I updated the implementation to store the job's score when available. I needed this to match the job's score in Sidekiq's schedule set.
Let me know when you have time to review the new approach. I don't know if the previous timestamp was actually used for anything, but it didn't look like it at first glance.
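Roughly, the change described looks like this (an illustrative Ruby sketch, not the PR's exact code): use the job's own `at` timestamp when present, so the digest's score lines up with the member in Sidekiq's schedule set.

```ruby
require "redis"

redis       = Redis.new
digests_key = "uniquejobs:digests" # assumed key name
digest      = "uniquejobs:abc123"  # hypothetical digest

# Sidekiq sets "at" on scheduled jobs: the run-at unix timestamp that is
# used as the member's score in the "schedule" sorted set.
item  = { "at" => Time.now.to_f + 3600, "jid" => "f1b2c3d4" }
score = item["at"] || Time.now.to_f # immediate jobs have no "at"

redis.zadd(digests_key, score, digest)
```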
(force-pushed from 2548e5c to 0fb19b6)
@@ -62,8 +63,16 @@ if lock_type == "until_expired" and pttl and pttl > 0 then
log_debug("ZADD", expiring_digests, current_time + pttl, digest)
redis.call("ZADD", expiring_digests, current_time + pttl, digest)
I left this alone because I don't fully understand what the expiring digests set is used for, or whether this optimization would be applicable to jobs utilizing an `until_expired` lock strategy.
This is exactly what I had in mind. Sorry that I haven't been able to get to it.
Your assumption about the job score in the hash is correct. I just wanted something in the hash and wasn't clear on how to use it. It was one of those "I believe I will need this" moments.
As soon as the German IRS is off my back (they want money I don't have), I'll see about greatly optimizing the gem.
I never got around to looking at the performance. Perhaps this would be better solved in Ruby (like the reaper). I'm definitely not opposed to using some batching from the Ruby layer if there are more than n entries in a sorted set (see the sketch below).
I believe there is plenty to optimize, and for the performance tests I need to remember that my machine is about as fast as a laptop comes. It's not fair to compare locally; I should probably write a bunch of these performance tests and have them run on GitHub Actions.
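Something like this is what batching from the Ruby layer could look like (a hypothetical sketch, not gem code; the threshold and digest are illustrative):

```ruby
require "redis"

BATCH_SIZE = 1_000 # arbitrary threshold for illustration

# Page through the sorted set from Ruby instead of doing all the work in
# a single Lua call, removing matching members batch by batch.
def batched_delete(redis, key, digest)
  cursor = "0"
  loop do
    cursor, pairs = redis.zscan(key, cursor, match: "*#{digest}*", count: BATCH_SIZE)
    redis.zrem(key, pairs.map(&:first)) unless pairs.empty?
    break if cursor == "0"
  end
end

batched_delete(Redis.new, "schedule", "uniquejobs:abc123") # hypothetical digest
```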
Do you want me to loop in the changes from #837? That had improvements to the queue handling, too.
That would be lovely; I forgot about the queue! I'm too exhausted to cut a release and test this tonight, but I will do it first thing tomorrow morning. Much appreciated!!
(force-pushed from 11f17a8 to 8131bed)
Done. Let me know what you find whenever you have a chance to test.
(force-pushed from 8131bed to 5da7d08)
Looks like it will do the job just fine! Can't wait to optimize everything!
Closes #668. This is a first pass on my idea of using scores to "skip ahead" so that we don't have to iterate the entire sorted set when a unique job is deleted; a rough sketch of the idea follows. Let me know what you think. I'm sure I'm missing some edge cases, since I don't have the domain knowledge you do here. This should be backwards compatible, falling back to the previous behavior if a score doesn't exist.
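The sketch below shows the idea in Ruby for readability (the actual change lives in the Lua scripts; the digests key name is an assumption):

```ruby
require "redis"

# "Skip ahead": when the digest's score is known, fetch only the members
# at that exact score; otherwise fall back to walking the whole set,
# which is the previous behavior.
def delete_scheduled(redis, digest)
  score = redis.zscore("uniquejobs:digests", digest) # assumed key name

  candidates =
    if score
      redis.zrangebyscore("schedule", score, score) # jump straight to the score
    else
      redis.zrange("schedule", 0, -1)               # old behavior: full scan
    end

  payload = candidates.find { |job| job.include?(digest) }
  redis.zrem("schedule", payload) if payload
end

delete_scheduled(Redis.new, "uniquejobs:abc123") # hypothetical digest
```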
I still want to add a performance test with a large schedule queue to make sure this actually works.