[Q&A] Performance & Dead Locks #657
-
Hey guys, first up: great work on this gem! We are very interested in deploying it on our production setup, but have two concerns / questions we would like to clarify:
1. We are processing around 10 million jobs per day and use push_bulk heavily to keep the load on Redis as low as possible. Do you have any performance benchmarks on how many resources / how much extra time the gem needs to process each job? We have queues with over 3 million scheduled items at a time and need to make sure everything runs smoothly. Also, how much additional storage is used on average? We are mainly concerned about Redis performance.
2. As far as we understand, `until_executed` requires jobs to actually be performed in order for the lock to be lifted. What happens when queues are manually cleared out / deleted? Will these jobs be stuck forever, or is there a way of synchronizing with the enqueued jobs on a regular basis? Or, even better, removing the lock the moment a job is deleted from the queue? I guess the only other workaround would be to set an additional lock timeout, but setting this to 7 days would still result in quite some daily data loss. Please let us know your thoughts! All the best,
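For context, a minimal sketch of the setup described above, assuming sidekiq-unique-jobs v7; the worker class, queue name, and the 7-day `lock_ttl` are illustrative assumptions, not values taken from this thread:

```ruby
require "sidekiq"
# sidekiq-unique-jobs is assumed to be loaded via Bundler/Rails.

# Hypothetical worker, only to illustrate the setup described in the question.
class ImportWorker
  include Sidekiq::Worker

  sidekiq_options queue: :imports,
                  lock: :until_executed,      # lock is released once the job has actually run
                  lock_ttl: 7 * 24 * 60 * 60  # fallback expiry in seconds (the 7-day trade-off mentioned above)

  def perform(record_id)
    # ... actual work ...
  end
end

# Bulk enqueueing, as described in the question, to keep Redis round trips low.
args = (1..1_000).map { |id| [id] }
Sidekiq::Client.push_bulk("class" => ImportWorker, "args" => args)
```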
-
Because the gem uses Lua scripts, I have no idea how it will perform in such a setup, but I believe there be dragons. Knapsack PRO uses this gem under the hood, but I don't know how hard their workers are utilized every day. Perhaps @ArturT can offer better guidance than me on the performance? @OskarEichler, I'll see if I can get around to gathering some metrics.
-
Amazing, thank you @mhenrixon. We need to be cautious about deploying this on a high-load system, as we want to prevent it from completely blowing up our server. Doing so many extra calls to Redis for every scheduled job could definitely cause some issues, so it would be great if you could share your metrics / experience here. For push_bulk, is there some sort of bulk-check method you are using to check, let's say, 1,000 jobs at once and clear them out with single operations, or are you looping through them one by one? Please let me know! All the best,
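For what it's worth, the kind of bulk check asked about here could be sketched at the Redis level with a pipeline. Nothing below is an API the gem is confirmed to provide in this thread: the `locked_digests` helper and the `uniquejobs:<digest>` key pattern are assumptions for illustration only, and the pipelined block form assumes redis-rb 4.6+.

```ruby
require "sidekiq"

# Hypothetical helper: check many lock digests in one Redis round trip instead
# of issuing one EXISTS call per job. The "uniquejobs:<digest>" key layout is
# an assumption made for this sketch, not a documented contract of the gem.
def locked_digests(digests)
  replies = Sidekiq.redis do |conn|
    conn.pipelined do |pipeline|
      digests.each { |digest| pipeline.exists("uniquejobs:#{digest}") }
    end
  end

  # EXISTS replies are 1/0 (or true/false on older redis-rb); keep the locked ones.
  digests.zip(replies)
         .select { |_digest, reply| reply == 1 || reply == true }
         .map(&:first)
end
```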
-
Hi all,

I can share some info about the sidekiq-unique-jobs gem, which we use at https://knapsackpro.com/. We process a few million jobs per day with no problem.

From my experience, the impact of sidekiq-unique-jobs on Redis performance is negligible. If you want to worry about something, it would be the number of keys (locks) stored in Redis and memory consumption. The gem already cleans up its locks and Redis memory after itself, so there is nothing to worry about there. Once, after a sidekiq-unique-jobs upgrade, Redis memory was not being cleaned up and kept growing and growing, but that bug has since been fixed in the gem.

Regarding Ruby vs Lua: we use `config.reaper = :ruby`. Here is my config:

```ruby
SidekiqUniqueJobs.configure do |config|
  config.debug_lua = false # true for debugging
  config.lock_info = false # true for debugging
  config.max_history = 1000 # keeps n number of changelog entries

  # WARNING: never use :lua because this will lead to errors and failures of our API:
  # Redis::CommandError: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
  config.reaper = :ruby # also :lua, but that will lock Redis while cleaning

  config.reaper_count = 1000 # Reap at most this many orphaned locks per run

  # Do not use too low a reaper_interval value.
  # The reaper should have enough time to scan all Redis keys, which is slow for a large data set.
  # https://github.com/mhenrixon/sidekiq-unique-jobs/issues/571#issuecomment-777053417
  config.reaper_interval = 600 # Reap every X seconds

  # reaper_timeout should be close to the reaper_interval value to avoid leaking threads.
  # https://github.com/mhenrixon/sidekiq-unique-jobs/issues/571#issuecomment-777003013
  config.reaper_timeout = 595 # Give the reaper X seconds to finish
end
```

The biggest risk I see in using the sidekiq-unique-jobs gem is upgrades: be very careful when upgrading. More than once we had a situation where jobs were skipped because a new sidekiq-unique-jobs version introduced a regression. Other than that, the gem is super helpful: we often use it to skip a large number of jobs and reduce the Sidekiq/ActiveRecord impact on the DB and on worker CPU consumption.

I hope this helps. :)
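One simple way to act on that upgrade advice is to pin an exact gem version in the Gemfile and only bump it deliberately after reviewing the changelog; the version number below is just an example, not a recommendation from this thread:

```ruby
# Gemfile
gem "sidekiq"
gem "sidekiq-unique-jobs", "7.1.8" # exact pin (example version); bump intentionally after reading the changelog
```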
-
Thanks @ArturT! I'm making a note of the upgrade advice ❤️