Lock the rows when taking the advisory lock with ActiveRecordWithLock adapter #110

ankithads · 2024-08-27T13:09:50Z

For the ActiveRecordWithLock adapter, we want to lock the rows while acquiring the advisory lock on the table, so that workers don't all try to lock the same job.

Also added metrics to observe the latency of finding a job and the job hit rate.

ameykusurkar · 2024-08-29T09:18:37Z

lib/que/adapters/active_record_with_lock.rb

-          result = Que.execute(:find_job_to_lock, [queue, cursor])
+          result = Que.transaction do
+            observe(nil, FindJobSecondsTotal, queue: queue) do
+              result = Que.execute(:find_job_to_lock, [queue, cursor])


I don't think it's obvious why we do this, given that there is already an advisory lock. Let's explain in a comment why this improves performance when there are a large number of workers contending for work, and why this doesn't ensure race safety (hence we still need the advisory lock).
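
One way to picture the contention this comment describes is a toy, in-memory model of a single polling round. This is only a sketch under stated assumptions: `contention_round` and its round model are hypothetical, and nothing here talks to Postgres or Que. Without row locks, every worker selects the same first row and all but one fail the advisory lock; with `FOR UPDATE SKIP LOCKED`-style selection, each worker is handed a different row.

```ruby
# Toy model only: no database involved. All workers run the "find job" query
# in the same round, then each tries to take the advisory lock on its candidate.
def contention_round(job_ids, worker_count, skip_locked:)
  row_locked = [] # rows held by a row lock during this round

  candidates = worker_count.times.map do
    if skip_locked
      # SKIP LOCKED behaviour: pass over rows another worker already holds.
      job = job_ids.find { |id| !row_locked.include?(id) }
      row_locked << job if job
      job
    else
      # Plain LIMIT 1: every worker sees the same first row.
      job_ids.first
    end
  end

  advisory_locked = []
  failed_attempts = 0
  candidates.compact.each do |job|
    if advisory_locked.include?(job)
      failed_attempts += 1 # a try-lock on an already-held key fails
    else
      advisory_locked << job
    end
  end

  { locked: advisory_locked, failed_attempts: failed_attempts }
end

without_row_locks = contention_round([1, 2, 3, 4, 5], 5, skip_locked: false)
with_row_locks    = contention_round([1, 2, 3, 4, 5], 5, skip_locked: true)
```

In this model the row lock only spreads workers across different rows. A row lock taken with `FOR UPDATE` is released when the transaction ends, so the advisory lock is presumably still what protects a job while it is being worked.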

ameykusurkar · 2024-08-29T09:24:53Z

spec/active_record_with_lock_spec_helper.rb

+  else
+    config.filter_run_excluding :active_record_with_lock
+  end
+end


I think it can be unintuitive that only some specs run. Given that this is to test adapter spec functionality, could we just add an RSpec for that class?

ameykusurkar · 2024-08-29T09:30:17Z

lib/que/adapters/active_record_with_lock.rb

+            observe(FindJobHitTotal, nil, { queue: queue, job_hit: job_locked })
+            return result if job_locked
+          end
+          break if result


Nit: a combination of returns and breaks can be a bit hard to follow. Is it possible to avoid this break such that all result cases are handled by early returns?

mfmmq · 2024-08-29T09:21:06Z

spec/active_record_with_lock_spec_helper.rb

@@ -4,7 +4,7 @@ class LockDatabaseRecord < ActiveRecord::Base
  establish_connection(
    adapter: "postgresql",
    host: ENV.fetch("LOCK_PGHOST", "localhost"),
-    user: ENV.fetch("LOCK_PGUSER", "postgres"),
+    user: ENV.fetch("LOCK_PGUSER", "ubuntu"),


Why is this ubuntu? Is this something specific to CI specs?

mfmmq · 2024-08-29T09:27:15Z

lib/que/adapters/active_record_with_lock.rb

-          cursor = result.first["job_id"]
-          break if pg_try_advisory_lock?(cursor)
+            cursor = result.first["job_id"]
+            job_locked = pg_try_advisory_lock?(cursor)


Do we want to include the SQL query here in the metric for the time it takes to find a job?

Nit + non-blocking for this PR: also, if you unify the metric labels you might be able to use Hesiod.register_duration_counters and Hesiod.observe as convenience wrappers.

The time to find a job is measured on line 51. Locking shouldn't take much time, so it's not required to track for now.
There is already a metric within Locker that tracks this.

mfmmq · 2024-08-29T09:36:47Z

spec/active_record_with_lock_spec_helper.rb

@@ -28,3 +28,11 @@ def active_record_with_lock_adapter_connection
    lock_connection_pool: LockDatabaseRecord.connection_pool,
  )
 end
+
+RSpec.configure do |config|
+  if ENV["ADAPTER"] == "ActiveRecordWithLock"


What's the reason for gating this test? Also, if we're going to gate it, do we need to document it or specify it in CI?

benk-gc

A few comments on the control flow. I have a meta-point on the overall Que philosophy, since we're now breaking one of the assertions in the README:

Workers don't block each other when trying to lock jobs, as often occurs with "SELECT FOR UPDATE"-style locking.

If we're now breaking this assertion for these jobs we should make this clear and justify why we're trading this off for the performance hit.

benk-gc · 2024-08-29T09:53:10Z

lib/que/sql.rb

@@ -184,7 +184,7 @@ module Que
            AND retryable = true
            AND job_id >= $2
            ORDER BY priority, run_at, job_id
-            LIMIT 1
+            for update skip locked LIMIT 1


Suggested change

-            for update skip locked LIMIT 1
+            FOR UPDATE SKIP LOCKED LIMIT 1

benk-gc · 2024-08-29T09:57:08Z

lib/que/adapters/active_record_with_lock.rb

      def lock_job_with_lock_database(queue, cursor)
        result = []
        loop do
-          result = Que.execute(:find_job_to_lock, [queue, cursor])
+          result = Que.transaction do
+            observe(nil, FindJobSecondsTotal, queue: queue) do
+              result = Que.execute(:find_job_to_lock, [queue, cursor])
+            end

-          break if result.empty?
+            return result if result.empty?

-          cursor = result.first["job_id"]
-          break if pg_try_advisory_lock?(cursor)
+            cursor = result.first["job_id"]
+            job_locked = pg_try_advisory_lock?(cursor)
+
+            observe(FindJobHitTotal, nil, { queue: queue, job_hit: job_locked })
+            return result if job_locked
+          end
+          break if result
        end
+
        result
      end


I found this control flow a bit confusing with the returns from within the transaction. Since there's an interaction between the proc, the loop, and the method, it doesn't feel very Rubyish: a return from within a proc could be read as if it were intended to be a next. A possibly more Rubyish way of writing this would be:

      def lock_job_with_lock_database(queue, cursor)
        loop do
          locked_job = Que.transaction do
            job_to_lock = observe(nil, FindJobSecondsTotal, queue: queue) do
              Que.execute(:find_job_to_lock, [queue, cursor])
            end

            # Early exit: nothing left to lock in this queue.
            raise NoLockableJobs if job_to_lock.empty?

            cursor = job_to_lock.first["job_id"]
            job_locked = pg_try_advisory_lock?(cursor)

            observe(FindJobHitTotal, nil, { queue: queue, job_hit: job_locked })
            # Yield the job rows (not the boolean) so the caller gets the job back.
            job_to_lock if job_locked
          end

          return locked_job if locked_job
        end
      rescue NoLockableJobs
        []
      end

It's then very clear that the case where there are no lockable jobs is an early exit, and the rest of the flow loops to find a lockable job in the queue while jobs exist.
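
To try that shape without a database, here is a minimal, self-contained sketch. The names are assumptions for illustration: `lock_next_job` is hypothetical, `find_job` and `try_lock` are injected callables standing in for `Que.execute(:find_job_to_lock, ...)` and `pg_try_advisory_lock?`, and `NoLockableJobs` is defined locally. The `cursor += 1` step is also an assumption of this sketch, added so that the in-memory finder moves past a contended job.

```ruby
# Hypothetical stand-in for the NoLockableJobs error used in the suggestion.
class NoLockableJobs < StandardError; end

# find_job: callable(cursor) returning an array with at most one job hash.
# try_lock: callable(job_id) returning true when the lock was acquired.
def lock_next_job(cursor, find_job:, try_lock:)
  loop do
    job = find_job.call(cursor)
    raise NoLockableJobs if job.empty? # early exit: queue exhausted

    cursor = job.first["job_id"]
    return job if try_lock.call(cursor)

    cursor += 1 # sketch-only assumption: step past the contended job
  end
rescue NoLockableJobs
  []
end

jobs = [{ "job_id" => 1 }, { "job_id" => 2 }, { "job_id" => 3 }]
finder = ->(cursor) { jobs.select { |j| j["job_id"] >= cursor }.first(1) }

# Pretend another worker already holds jobs 1 and 3.
locked = lock_next_job(0, find_job: finder, try_lock: ->(id) { id == 2 })
```

The two exits stay separate: an exhausted queue raises and is rescued into an empty result, while a successful lock returns the job rows directly.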

benk-gc · 2024-08-29T09:58:30Z

spec/integration/integration_spec.rb

@@ -119,6 +119,20 @@ def wait_for_jobs_to_be_worked(timeout: 10)
      expect(User.count).to eq(3)
      expect(User.all.map(&:name).sort).to eq(%w[alice bob charlie])
    end
+
+    it "increments the metrics", :active_record_with_lock do


Rather than mocking the metrics, a nicer way to check this would be to run the workers and then check that the metrics are in the expected state after the fact.
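
As a rough illustration of the "check the state afterwards" style, the sketch below uses a hypothetical in-memory `LabelledCounter`; it is not the metrics client used by Que, and `record_lock_attempt` is a made-up stand-in for the worker path. The code under test runs for real, and the counter values are read back once it has finished.

```ruby
# Hypothetical in-memory counter keyed by a labels hash.
class LabelledCounter
  def initialize
    @values = Hash.new(0)
  end

  def increment(labels)
    @values[labels] += 1
  end

  def value(labels)
    @values[labels]
  end
end

FIND_JOB_HIT_TOTAL = LabelledCounter.new

# Stand-in for the code that records a hit or miss for each lock attempt.
def record_lock_attempt(counter, queue:, job_hit:)
  counter.increment({ queue: queue, job_hit: job_hit })
end

# "Run the workers": three hits and one miss on the default queue.
3.times { record_lock_attempt(FIND_JOB_HIT_TOTAL, queue: "default", job_hit: true) }
record_lock_attempt(FIND_JOB_HIT_TOTAL, queue: "default", job_hit: false)

# Afterwards, read the recorded state instead of asserting on mocked calls.
hits   = FIND_JOB_HIT_TOTAL.value({ queue: "default", job_hit: true })
misses = FIND_JOB_HIT_TOTAL.value({ queue: "default", job_hit: false })
```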

benk-gc · 2024-08-29T09:59:34Z

spec/integration/integration_spec.rb

+      expect(QueJob.count).to eq(3)
+
+      with_workers(5) { wait_for_jobs_to_be_worked }
+    end
  end


Given we've introduced a more complex control flow, it would be good to have some non-metric tests for the locking that cover the early exit and loop behaviour, and assert that we can't enter an infinite loop when there are no jobs.

ameykusurkar · 2024-09-04T08:50:47Z

spec/lib/que/adapters/active_record_with_lock_spec.rb

+  end
+
+  let(:lock_connection_pool) do
+    return LockDatabaseRecord.connection_pool if ENV["ADAPTER"] == "ActiveRecordWithLock"


Now that we have separate tests for the two adapters, I think we should use shared specs to define "Que behaviour" for all adapters, so that we only run those per adapter rather than all the specs. This feels a bit clearer than controlling it through env vars.

Non-blocking, can be a clean-up later.

Since this is required only for this adapter, I think adding a filter would be better. It gets a bit tricky because of the two databases here.
All the other specs are run against a specific adapter.

I have updated this to exclude all the files in the adapter spec directory except for the particular adapter's spec.

ankithads force-pushed the Activerecordwithlock/lock-job-rows-on-selection branch from 8bfbac0 to 8628157 Compare August 29, 2024 08:26

ankithads marked this pull request as ready for review August 29, 2024 08:28

ankithads force-pushed the Activerecordwithlock/lock-job-rows-on-selection branch from 8628157 to c93e650 Compare August 29, 2024 08:41

ameykusurkar reviewed Aug 29, 2024

mfmmq reviewed Aug 29, 2024

benk-gc reviewed Aug 29, 2024

ankithads force-pushed the Activerecordwithlock/lock-job-rows-on-selection branch 6 times, most recently from 1b0e1d9 to 7d136d8 Compare September 3, 2024 09:38

ameykusurkar reviewed Sep 3, 2024

ankithads force-pushed the Activerecordwithlock/lock-job-rows-on-selection branch 10 times, most recently from 5c49449 to 4398099 Compare September 4, 2024 08:36

ameykusurkar approved these changes Sep 4, 2024

ankithads force-pushed the Activerecordwithlock/lock-job-rows-on-selection branch 2 times, most recently from 370bc7a to d9f4322 Compare September 4, 2024 11:09

Lock the rows when taking the advisory lock

ee42d61

ankithads force-pushed the Activerecordwithlock/lock-job-rows-on-selection branch from d9f4322 to ee42d61 Compare September 4, 2024 14:46

ankithads changed the title ~~Lock the rows when taking the advisory lock~~ Lock the rows when taking the advisory lock with ActiveRecordWithLock adapter Sep 4, 2024

ankithads merged commit 5b3d374 into master Sep 4, 2024
14 checks passed

ankithads deleted the Activerecordwithlock/lock-job-rows-on-selection branch September 4, 2024 14:52
