Lock the rows when taking the advisory lock with ActiveRecordWithLock adapter #110

Merged
merged 1 commit into from
Sep 4, 2024

Conversation

ankithads
Contributor

@ankithads ankithads commented Aug 27, 2024

For the ActiveRecordWithLock adapter, we want to lock the job rows while acquiring the advisory lock, so that the workers do not all try to lock the same job.

Added metrics to observe the latency of finding a job and the job hit rate.

@ankithads force-pushed the Activerecordwithlock/lock-job-rows-on-selection branch from 8bfbac0 to 8628157 on August 29, 2024 08:26
@ankithads marked this pull request as ready for review on August 29, 2024 08:28
@ankithads force-pushed the Activerecordwithlock/lock-job-rows-on-selection branch from 8628157 to c93e650 on August 29, 2024 08:41
lib/que/sql.rb (outdated, resolved)
- result = Que.execute(:find_job_to_lock, [queue, cursor])
+ result = Que.transaction do
+   observe(nil, FindJobSecondsTotal, queue: queue) do
+     result = Que.execute(:find_job_to_lock, [queue, cursor])

I don't think it's obvious why we do this, given that there is already an advisory lock. Let's explain in a comment why this improves performance when there are a large number of workers contending for work, and why this doesn't ensure race safety (hence we still need the advisory lock).
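
As a rough illustration of the contention in question, here is a plain-Ruby toy model (not the adapter's code; `pick_candidates` and the job IDs are hypothetical). Without row locks, every worker that queries at the same moment sees the same first candidate, and only one of them can win the advisory lock. With `FOR UPDATE SKIP LOCKED` semantics, a worker skips rows that another open transaction has already locked, so it is handed a different candidate.

```ruby
# Toy model: each worker in turn opens a transaction and selects one candidate.
# Rows locked by transactions that are still open are either still visible
# (no row lock) or skipped (modelling FOR UPDATE SKIP LOCKED).
def pick_candidates(job_ids, workers:, skip_locked:)
  row_locked = [] # rows held by still-open transactions
  Array.new(workers) do
    visible = skip_locked ? job_ids - row_locked : job_ids
    candidate = visible.first
    row_locked << candidate if skip_locked && candidate
    candidate
  end
end

jobs = [101, 102, 103, 104, 105]

without_row_locks = pick_candidates(jobs, workers: 3, skip_locked: false)
with_row_locks    = pick_candidates(jobs, workers: 3, skip_locked: true)

p without_row_locks # => [101, 101, 101]
p with_row_locks    # => [101, 102, 103]
```

The row lock only lasts until the selecting transaction ends, while the job is still being worked afterwards. That is why the row lock reduces contention but cannot replace the advisory lock, which is held for the lifetime of the job.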

lib/que/adapters/active_record_with_lock.rb (outdated, resolved)
lib/que/adapters/active_record_with_lock.rb (outdated, resolved)
spec/active_record_with_lock_spec_helper.rb (outdated, resolved)
else
config.filter_run_excluding :active_record_with_lock
end
end

I think it can be unintuitive that only some specs run. Given that this is to test adapter spec functionality, could we just add an RSpec for that class?

observe(FindJobHitTotal, nil, { queue: queue, job_hit: job_locked })
return result if job_locked
end
break if result

Nit: a combination of returns and breaks can be a bit hard to follow. Is it possible to avoid this break such that all result cases are handled by early returns?

@@ -4,7 +4,7 @@ class LockDatabaseRecord < ActiveRecord::Base
establish_connection(
adapter: "postgresql",
host: ENV.fetch("LOCK_PGHOST", "localhost"),
- user: ENV.fetch("LOCK_PGUSER", "postgres"),
+ user: ENV.fetch("LOCK_PGUSER", "ubuntu"),

Why is this ubuntu? Is this something specific to CI specs?

- cursor = result.first["job_id"]
- break if pg_try_advisory_lock?(cursor)
+ cursor = result.first["job_id"]
+ job_locked = pg_try_advisory_lock?(cursor)

Do we want to include the SQL query here in the metric for the time it takes to find a job?

Nit, and non-blocking for this PR: if you unify the metric labels, you might also be able to use Hesiod.register_duration_counters and Hesiod.observe as convenience wrappers.

Contributor Author

@ankithads ankithads Sep 2, 2024

The time to find a job is measured on line 51. Locking shouldn't take much time, so it doesn't need to be tracked for now.
There is already a metric within Locker that tracks this.

@@ -28,3 +28,11 @@ def active_record_with_lock_adapter_connection
lock_connection_pool: LockDatabaseRecord.connection_pool,
)
end

RSpec.configure do |config|
if ENV["ADAPTER"] == "ActiveRecordWithLock"

What's the reason for gating this test? Also, if we're going to gate it, do we need to document it or specify it in CI?

Contributor

@benk-gc benk-gc left a comment

A few comments on the control flow. I have a meta-point on the overall Que philosophy, since we're now breaking one of the assertions in the README:

> Workers don't block each other when trying to lock jobs, as often occurs with "SELECT FOR UPDATE"-style locking.

If we're now breaking this assertion for these jobs we should make this clear and justify why we're trading this off for the performance hit.

lib/que/sql.rb Outdated
@@ -184,7 +184,7 @@ module Que
AND retryable = true
AND job_id >= $2
ORDER BY priority, run_at, job_id
- LIMIT 1
+ for update skip locked LIMIT 1

Suggested change
- for update skip locked LIMIT 1
+ FOR UPDATE SKIP LOCKED LIMIT 1

Comment on lines 47 to 66
  def lock_job_with_lock_database(queue, cursor)
    result = []
    loop do
-     result = Que.execute(:find_job_to_lock, [queue, cursor])
+     result = Que.transaction do
+       observe(nil, FindJobSecondsTotal, queue: queue) do
+         result = Que.execute(:find_job_to_lock, [queue, cursor])
+       end

-     break if result.empty?
+       return result if result.empty?

-     cursor = result.first["job_id"]
-     break if pg_try_advisory_lock?(cursor)
+       cursor = result.first["job_id"]
+       job_locked = pg_try_advisory_lock?(cursor)
+
+       observe(FindJobHitTotal, nil, { queue: queue, job_hit: job_locked })
+       return result if job_locked
+     end
+     break if result
    end

    result
  end
Contributor

@benk-gc benk-gc Aug 29, 2024

I found this control flow a bit confusing with the returns from within the transaction. Since you've got an interaction between the proc, the loop, and the method, it doesn't feel very Rubyish: a return from within a proc could be read as if it were intended to be a next. A possibly more Rubyish way of writing this would be:

def lock_job_with_lock_database(queue, cursor)
  loop do
    locked_job = Que.transaction do
      job_to_lock = observe(nil, FindJobSecondsTotal, queue: queue) do
        Que.execute(:find_job_to_lock, [queue, cursor])
      end

      # NoLockableJobs is assumed to be a custom error class defined elsewhere.
      raise NoLockableJobs if job_to_lock.empty?

      cursor = job_to_lock.first["job_id"]
      job_locked = pg_try_advisory_lock?(cursor)

      observe(FindJobHitTotal, nil, { queue: queue, job_hit: job_locked })

      # Yield the job rows rather than the boolean, so the caller gets the job.
      job_to_lock if job_locked
    end

    return locked_job if locked_job
  end
rescue NoLockableJobs
  []
end

It's then very clear that the case where there are no lockable jobs is an early exit, and the rest of the flow loops to find a lockable job in the queue while jobs exist.
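
The shape of that suggestion can be tried out without a database. The following is a minimal sketch under stated assumptions: `lock_job`, `find_job` and `try_lock` are hypothetical stand-ins for the adapter method, the SQL query and `pg_try_advisory_lock?`, and the transaction and metrics are left out.

```ruby
# Hypothetical error class; the suggestion assumes something like it exists.
class NoLockableJobs < StandardError; end

# find_job and try_lock are lambdas standing in for the SQL query and
# the advisory lock attempt.
def lock_job(find_job:, try_lock:, cursor: 0)
  loop do
    candidate = find_job.call(cursor)
    raise NoLockableJobs if candidate.empty? # early exit: nothing left to lock

    cursor = candidate.first["job_id"]
    return candidate if try_lock.call(cursor) # success: hand back the job rows
  end
rescue NoLockableJobs
  []
end

# Job 1 is already locked elsewhere, job 2 is free.
candidates = [[{ "job_id" => 1 }], [{ "job_id" => 2 }]]
find_job = ->(_cursor) { candidates.shift || [] }
try_lock = ->(job_id) { job_id != 1 }

locked = lock_job(find_job: find_job, try_lock: try_lock)
empty  = lock_job(find_job: ->(_cursor) { [] }, try_lock: try_lock)

p locked.first["job_id"] # => 2
p empty                  # => []
```

Raising a dedicated error keeps the "no jobs" exit separate from the "lock failed, try the next candidate" path, which is the distinction the comment is asking for.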

@@ -119,6 +119,20 @@ def wait_for_jobs_to_be_worked(timeout: 10)
expect(User.count).to eq(3)
expect(User.all.map(&:name).sort).to eq(%w[alice bob charlie])
end

it "increments the metrics", :active_record_with_lock do

Rather than mocking the metrics, a nicer way to check this would be to run the workers and then check that the metrics are in the expected state after the fact.
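
A minimal sketch of that style of check, assuming a hypothetical in-memory `FakeCounter` in place of the real metrics client and a `work_jobs` stand-in for running the workers (neither is part of Que):

```ruby
# Hypothetical in-memory counter standing in for the real metrics client.
class FakeCounter
  def initialize
    @values = Hash.new(0)
  end

  def increment(labels)
    @values[labels] += 1
  end

  def get(labels)
    @values[labels]
  end
end

FIND_JOB_HIT_TOTAL = FakeCounter.new

# Stand-in for a worker run: records one hit or miss per lock attempt.
def work_jobs(lock_results, queue:)
  lock_results.each do |job_locked|
    FIND_JOB_HIT_TOTAL.increment({ queue: queue, job_hit: job_locked })
  end
end

# Run the work first, then inspect the counter state afterwards.
work_jobs([true, false, true], queue: "default")

hits   = FIND_JOB_HIT_TOTAL.get({ queue: "default", job_hit: true })
misses = FIND_JOB_HIT_TOTAL.get({ queue: "default", job_hit: false })

p hits   # => 2
p misses # => 1
```

The assertions read the final counter values, so the test does not depend on how or in what order the code under test calls the metrics client.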

Contributor Author

updated

expect(QueJob.count).to eq(3)

with_workers(5) { wait_for_jobs_to_be_worked }
end
end

Given we've introduced a more complex control flow, it would be good to have some non-metric tests for the locking that cover the early exit and the loop functionality, and that assert we can't enter an infinite loop when there are no jobs.
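
One possible shape for such tests, sketched in plain Ruby with the standard-library `Timeout` module as the guard against an infinite loop. `lock_next_job` is a simplified, hypothetical stand-in for the adapter's locking loop, written with early returns only, and `check` is a hypothetical helper rather than an RSpec API.

```ruby
require "timeout"

# Simplified stand-in for the locking loop: find_job and try_lock are lambdas
# replacing the SQL query and the advisory lock attempt.
def lock_next_job(find_job, try_lock)
  loop do
    candidate = find_job.call
    return [] if candidate.empty?
    return candidate if try_lock.call(candidate.first["job_id"])
  end
end

# Runs the block under a one-second timeout and fails loudly if it is falsy.
def check(description)
  ok = Timeout.timeout(1) { yield }
  raise "failed: #{description}" unless ok

  puts "ok: #{description}"
end

check("returns [] after a single query when there are no jobs") do
  calls = 0
  result = lock_next_job(-> { calls += 1; [] }, ->(_id) { true })
  result == [] && calls == 1
end

check("keeps looking until a job can be locked") do
  queue = [[{ "job_id" => 1 }], [{ "job_id" => 2 }]]
  result = lock_next_job(-> { queue.shift || [] }, ->(id) { id == 2 })
  result.first["job_id"] == 2
end

check("gives up once the candidates run out") do
  queue = [[{ "job_id" => 1 }]]
  lock_next_job(-> { queue.shift || [] }, ->(_id) { false }) == []
end
```

If the loop ever failed to terminate, `Timeout.timeout` would raise `Timeout::Error` instead of letting the test hang.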

@ankithads force-pushed the Activerecordwithlock/lock-job-rows-on-selection branch 6 times, most recently from 1b0e1d9 to 7d136d8 on September 3, 2024 09:38
@ankithads force-pushed the Activerecordwithlock/lock-job-rows-on-selection branch 10 times, most recently from 5c49449 to 4398099 on September 4, 2024 08:36
end

let(:lock_connection_pool) do
return LockDatabaseRecord.connection_pool if ENV["ADAPTER"] == "ActiveRecordWithLock"

Now that we have separate tests for the two adapters, I think we should use shared specs to define "Que behaviour" for all adapters, so that we only run those per adapter rather than all the specs. This feels a bit clearer than controlling it through env vars.

Non-blocking, can be a clean-up later

Contributor Author

Since this is required only for this adapter, I think adding a filter would be better. It gets a bit tricky because of the two databases here.
All the remaining specs are run against a specific adapter.

Contributor Author

@ankithads ankithads Sep 4, 2024

I have updated this to exclude all the files in the adapter spec directory except for the spec of the particular adapter.

@ankithads force-pushed the Activerecordwithlock/lock-job-rows-on-selection branch 2 times, most recently from 370bc7a to d9f4322 on September 4, 2024 11:09
@ankithads force-pushed the Activerecordwithlock/lock-job-rows-on-selection branch from d9f4322 to ee42d61 on September 4, 2024 14:46
@ankithads changed the title from "Lock the rows when taking the advisory lock" to "Lock the rows when taking the advisory lock with ActiveRecordWithLock adapter" on Sep 4, 2024
@ankithads ankithads merged commit 5b3d374 into master Sep 4, 2024
14 checks passed
@ankithads ankithads deleted the Activerecordwithlock/lock-job-rows-on-selection branch September 4, 2024 14:52