@elee1766 This one's going to be tough to fix if you keep going down that path, I'm afraid. You're fundamentally executing many expensive, conflicting operations, and unsurprisingly, that's going to be slow and far from optimal. I suspect it's the "conflicting" part that's giving you the most trouble here. We actually did a blog post recently on how River's uniqueness is implemented [1], and it's not all that complex:

- Build a unique key from the job's kind and its configured unique properties.
- Hash the key down to a 64-bit integer so it fits Postgres' advisory lock space.
- Take a transaction-level advisory lock (`pg_advisory_xact_lock`) on that hash inside the insert's transaction.
- Check whether a job with the same unique properties already exists; if so, return it instead of inserting.
- Otherwise insert the new job. The lock releases automatically when the transaction commits.
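Sketched out with pgx, the flow looks roughly like this (a simplified illustration, not River's actual code; the lock key, kind, and predicates are made up for the example):

```go
package main

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// insertUnique sketches the flow above. lockKey stands in for a hash of the
// job's unique properties; the kind and predicates are illustrative.
func insertUnique(ctx context.Context, pool *pgxpool.Pool, lockKey int64) error {
	tx, err := pool.Begin(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback(ctx)

	// Conflicting inserts of the same unique job queue up on this lock until
	// the current holder's transaction commits.
	if _, err := tx.Exec(ctx, `SELECT pg_advisory_xact_lock($1)`, lockKey); err != nil {
		return err
	}

	// Look for an existing job with the same unique properties.
	var exists bool
	if err := tx.QueryRow(ctx, `
		SELECT EXISTS (
			SELECT 1 FROM river_job
			WHERE kind = 'entity_update'
			  AND args->>'entity_id' = '42'
			  AND state NOT IN ('cancelled', 'completed', 'discarded')
		)`).Scan(&exists); err != nil {
		return err
	}

	// Insert only if nothing matched.
	if !exists {
		if _, err := tx.Exec(ctx, `
			INSERT INTO river_job (args, kind, max_attempts, priority, queue, state)
			VALUES ('{"entity_id": 42}', 'entity_update', 25, 1, 'default', 'available')`); err != nil {
			return err
		}
	}

	return tx.Commit(ctx) // the advisory lock is released here
}
```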
None of those pieces takes very long, but if you have lots of conflicting unique inserts, then they're all waiting on each other's lock.

The only possible remediation I can think of is if we provided an "optimistic unique check", wherein if it noticed that someone else already held the advisory lock (e.g. via `pg_try_advisory_xact_lock`), it'd fall through immediately instead of waiting to acquire it. The downside of this, though, is that if the other lock taker didn't end up inserting successfully, you could lose a whole row.

In your case, an alternative: drop the uniqueness checks, and implement your job so that on startup it checks when its data was last updated. If the update was very recent, it falls through as a no-op. You'd still be inserting lots of jobs, but most of them wouldn't be doing any work, and you wouldn't suffer the unique performance penalty.

[1] https://riverqueue.com/blog/uniqueness-with-advisory-locks
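A rough sketch of what that kind of worker could look like (the entities table, updated_at column, and 15 minute staleness window here are assumptions to illustrate the idea, not anything from River itself):

```go
package main

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/riverqueue/river"
)

// EntityUpdateArgs is an illustrative args type for this sketch.
type EntityUpdateArgs struct {
	EntityID int64 `json:"entity_id"`
}

func (EntityUpdateArgs) Kind() string { return "entity_update" }

type EntityUpdateWorker struct {
	river.WorkerDefaults[EntityUpdateArgs]
	dbPool *pgxpool.Pool
}

func (w *EntityUpdateWorker) Work(ctx context.Context, job *river.Job[EntityUpdateArgs]) error {
	var updatedAt time.Time
	if err := w.dbPool.QueryRow(ctx,
		`SELECT updated_at FROM entities WHERE id = $1`, job.Args.EntityID,
	).Scan(&updatedAt); err != nil {
		return err
	}

	// Another job already refreshed this entity recently: fall through as a no-op.
	if time.Since(updatedAt) < 15*time.Minute {
		return nil
	}

	return w.refreshEntity(ctx, job.Args.EntityID)
}

func (w *EntityUpdateWorker) refreshEntity(ctx context.Context, id int64) error {
	// ... do the actual update ...
	return nil
}
```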
-
We noticed incredibly slow performance (sub-100 inserts/sec on a 4 vCPU / 6 GB Postgres instance; we can get exact numbers if needed) when attempting to insert thousands of jobs with unique job constraints at the same time. The bottleneck appears to be the advisory lock, though maybe I'm wrong. What we do know is that when inserting non-unique jobs, whether with Insert or InsertMany, we don't have throughput issues.
The problem: we have a task that updates entities in the background when users request their data and it's stale enough. We would like this job to be unique by entity id, but thousands of these are often being inserted at a time. On top of that, a scheduler job enqueues updates for all ~50,000 entities every 15 minutes.
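For reference, the insert path looks roughly like this (an illustrative sketch; the args type and options are simplified from our real code):

```go
package main

import (
	"context"

	"github.com/jackc/pgx/v5"
	"github.com/riverqueue/river"
)

// EntityUpdateArgs is a simplified stand-in for our real job args.
type EntityUpdateArgs struct {
	EntityID int64 `json:"entity_id"`
}

func (EntityUpdateArgs) Kind() string { return "entity_update" }

func enqueueUpdate(ctx context.Context, client *river.Client[pgx.Tx], entityID int64) error {
	_, err := client.Insert(ctx, EntityUpdateArgs{EntityID: entityID}, &river.InsertOpts{
		UniqueOpts: river.UniqueOpts{
			// EntityID is the only arg, so this makes the job unique per entity.
			ByArgs: true,
		},
	})
	return err
}
```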
With unique insert on, insertion is too slow and we don't schedule all the tasks as fast as we'd like. But with unique off, we run into the problem that if network conditions keep us from completing jobs in time, we end up scheduling an unbounded number of duplicate jobs. However, we don't want to stop scheduling jobs altogether, since there may be new entity ids that need jobs even while entity updates from the previous batch are still incomplete.
For now we are chunking the batch job into large groups with a timeout, but that costs us observability, reduces our parallelism, and heavily complicates the retry logic, since you need to reschedule a new job carrying your errors. (We simply aren't retrying for now.) Roughly, the workaround looks like the sketch below.
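(A simplified sketch of the grouped job; the timeout, batch args, and refreshEntity helper are illustrative, not our actual code.)

```go
package main

import (
	"context"
	"time"

	"github.com/riverqueue/river"
)

// EntityBatchArgs carries one chunk of entity ids per job.
type EntityBatchArgs struct {
	EntityIDs []int64 `json:"entity_ids"`
}

func (EntityBatchArgs) Kind() string { return "entity_batch_update" }

type EntityBatchWorker struct {
	river.WorkerDefaults[EntityBatchArgs]
}

func (w *EntityBatchWorker) refreshEntity(ctx context.Context, id int64) error {
	// ... fetch and update the entity ...
	return nil
}

func (w *EntityBatchWorker) Work(ctx context.Context, job *river.Job[EntityBatchArgs]) error {
	// One timeout bounds the whole group.
	ctx, cancel := context.WithTimeout(ctx, 5*time.Minute)
	defer cancel()

	var failed []int64
	for _, id := range job.Args.EntityIDs {
		if err := w.refreshEntity(ctx, id); err != nil {
			// Returning the error would retry the entire group, so we collect
			// failures instead. Retrying them would mean enqueueing a brand new
			// job with just these ids, which is the complication mentioned above.
			failed = append(failed, id)
		}
	}
	_ = failed // we simply aren't retrying for now
	return nil
}
```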
So far our current ideas are:
but neither of these felt like a very good idea.