Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

until_and_while_executing with separate server and client lock_args broken since 8.0.8 #846

Open
Roguelazer opened this issue Apr 15, 2024 · 7 comments

Comments

@Roguelazer
Copy link

Roguelazer commented Apr 15, 2024

Describe the bug

After upgrading from 8.0.7 to 8.0.10, jobs with until_and_while_executing that pass a custom lock_args_method which behaves differently on the server and the client (as shown in the README) no longer work. Based on bisection, this appears to have been broken by 63e9431 (the parent commit, 7d5e40a, still works).

Expected behavior
Given a worker as shown below, run the following client code:

1.upto(2)
  UniqueBugWorker.perform_async(id)
  UniqueBugWorker.perform_async(id)
  UniqueBugWorker.perform_async(id)
  UniqueBugWorker.perform_async(id, "foo")
  UniqueBugWorker.perform_async(id, "foo", "bar")
end

We expect to see exactly six invocations.

Current behavior
Since 63e9431, we end up seeing at most two invocations (and sometimes only 1, which is even more worrying). It appears that the lock digest is being reused between the until and while executing phases.

For some reason, the conflicted jobs also aren't getting rescheduled; they just get dropped on the floor.

worse yet, we end up creating and leaking uniquejobs: locks with the correct digest, so after downgrading to 8.0.7, the jobs are locked out until the lock TTL expires.

Worker class

class UniqueBugWorker
  include Sidekiq::Worker
  sidekiq_options queue: "repro",
    lock: :until_and_while_executing,
    on_conflict: {server: :reschedule, client: :log},
    lock_args_method: ->(args) do
      if Sidekiq.server?
        [args.first]
      else
        args
      end
    end

  def perform(id, foo = nil, bar = nil)
    Rails.logger.info { "UniqueBugWorker: performing #{id} with #{foo} #{bar}" }
    sleep 1
  end
end

Additional context
Add any other context about the problem here.

@reneklacan
Copy link

We seem to have been running into same issues since upgrade but with until_executed (possibly with other locks as well but I only have confirmation about issues with until_executed)

@mhenrixon
Copy link
Owner

@reneklacan do you also use custom lock args?

@reneklacan
Copy link

@mhenrixon no

@gbs4ever
Copy link

gbs4ever commented Aug 12, 2024

we are getting a similar issue sidekiq_options lock: :until_executed, on_conflict: :reschedule error => SystemStackError: stack level too deep from <internal:kernel>:185:in `loop

@zacheryph
Copy link

As mentioned above; This appears to be an issue related to until_executing and reschedule. This also affects until_and_while_executing as it contains the until. I think this is actually easier to produce with until-and-while. You can produce the error with the following;

class BrokenJob
  include Sidekiq::Job
  sidekiq_options lock: :until_and_while_executing, on_conflict: :reschedule
  def perform = sleep(10)
end

BrokenJob.perform_async
#=> "job_id"
BrokenJob.perform_async
# SystemStackError: stack level too deep

For :until_executing its easier to reproduce if you just don't run the sidekiq worker process at all (or while the queue is backed up.)

From what I can tell; If the job is enqueued it fails to get the lock (until). While rescheduling, it checks this lock again and... since its locked, tries to reschedule... What I think should happen here is if the job is being rescheduled (or.. scheduled in the future in general?), it should ignore checking the lock?

As an asside; Interestingly, sometimes I can get this to actually run two jobs at the same time, and some times the second invocation returns nil and the 3rd will produce a Stack error. So the locking in general is a bit racey to begin with?

@gbs4ever
Copy link

gbs4ever commented Aug 29, 2024

That is not great if the lock is racey as the whole point of the lock is to prevent race condition

@zacheryph
Copy link

zacheryph commented Aug 29, 2024

To extend...

I tried updating this to remove the lock while OnConflict::Reschedule is doing perform_in but... it really doesn't matter. The gem prioritizes the classes options, over the jobs options. So even if you .set(lock: nil) it purposefully ignore the lock setting, it still uses the lock setting from the class.

brucebolt added a commit to alphagov/asset-manager that referenced this issue Sep 23, 2024
A [bug in version
8.0.8](mhenrixon/sidekiq-unique-jobs#846)
means jobs never get processed when using `until_and_while_executing`
lock type.

Therefore pinning to use an earlier version that does not have this
issue.
brucebolt added a commit to alphagov/asset-manager that referenced this issue Sep 23, 2024
A [bug in version
8.0.8](mhenrixon/sidekiq-unique-jobs#846)
means jobs never get processed when using `until_and_while_executing`
lock type.

Therefore pinning to use an earlier version that does not have this
issue.
brucebolt added a commit to alphagov/asset-manager that referenced this issue Sep 23, 2024
A [bug in version
8.0.8](mhenrixon/sidekiq-unique-jobs#846)
means jobs never get processed when using `until_and_while_executing`
lock type.

Therefore pinning to use an earlier version that does not have this
issue.
brucebolt added a commit to alphagov/asset-manager that referenced this issue Sep 23, 2024
A [bug in version
8.0.8](mhenrixon/sidekiq-unique-jobs#846)
means jobs never get processed when using `until_and_while_executing`
lock type.

Therefore pinning to use an earlier version that does not have this
issue.
brucebolt added a commit to alphagov/asset-manager that referenced this issue Sep 24, 2024
A [bug in version
8.0.8](mhenrixon/sidekiq-unique-jobs#846)
means jobs never get processed when using `until_and_while_executing`
lock type.

Therefore pinning to use an earlier version that does not have this
issue.
brucebolt added a commit to alphagov/asset-manager that referenced this issue Sep 24, 2024
A [bug in version
8.0.8](mhenrixon/sidekiq-unique-jobs#846)
means jobs never get processed when using `until_and_while_executing`
lock type.

Therefore pinning to use an earlier version that does not have this
issue.
brucebolt added a commit to alphagov/publishing-api that referenced this issue Sep 25, 2024
A [bug in version
8.0.8](mhenrixon/sidekiq-unique-jobs#846)
means jobs never get processed when using `until_and_while_executing`
lock type.

Therefore pinning to use an earlier version that does not have this
issue.
brucebolt added a commit to alphagov/publishing-api that referenced this issue Sep 25, 2024
A [bug in version
8.0.8](mhenrixon/sidekiq-unique-jobs#846)
means jobs never get processed when using `until_and_while_executing`
lock type.

Therefore pinning to use an earlier version that does not have this
issue.
JonathanHallam added a commit to alphagov/link-checker-api that referenced this issue Sep 26, 2024
A [bug in version
8.0.8](mhenrixon/sidekiq-unique-jobs#846)
means jobs never get processed when using `until_and_while_executing`
lock type.

Therefore pinning to use an earlier version that does not have this
issue.
JonathanHallam added a commit to alphagov/link-checker-api that referenced this issue Sep 26, 2024
A [bug in version
8.0.8](mhenrixon/sidekiq-unique-jobs#846)
means jobs never get processed when using `until_and_while_executing`
lock type.

Therefore pinning to use an earlier version that does not have this
issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants