[WJ-1176] Job queue #1671

Merged: 33 commits from WJ-1176-job-queue into develop on Oct 31, 2023
Conversation

@emmiegit (Member) commented on Oct 30, 2023

Ground work for a persistent job queue was set up in #1668.

In this PR, I use the newly added rsmq_async crate to persist jobs to the queue. Jobs are received from the queue by job workers (configurable in quantity) and deleted only after they complete successfully, which enables retries on failure. The previous job queue implementation, an in-memory channel, was removed, which simplifies ServerState setup a bit.
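To make the receive/process/delete flow concrete, here is a minimal sketch of one worker's loop built on rsmq_async's RsmqConnection trait. The queue name, payload type, process_job, and the tokio sleep are illustrative assumptions rather than deepwell's actual code, and the exact receive/send signatures differ slightly between rsmq_async versions:

```rust
use std::time::Duration;

use rsmq_async::{Rsmq, RsmqConnection, RsmqError};

// Illustrative placeholder; not the actual deepwell queue name.
const JOB_QUEUE_NAME: &str = "job";

async fn run_worker(mut rsmq: Rsmq) -> Result<(), RsmqError> {
    loop {
        // receive_message returns None when the queue is empty, so back off
        // briefly instead of spinning.
        let Some(message) = rsmq
            .receive_message::<String>(JOB_QUEUE_NAME, None)
            .await?
        else {
            tokio::time::sleep(Duration::from_secs(1)).await;
            continue;
        };

        match process_job(&message.message).await {
            Ok(()) => {
                // Delete only after success; an unfinished job becomes visible
                // again once its timeout expires, which is what enables retries.
                rsmq.delete_message(JOB_QUEUE_NAME, &message.id).await?;
            }
            Err(error) => {
                eprintln!("job failed, leaving it on the queue for retry: {error}");
            }
        }
    }
}

// Placeholder: deserialize the payload and dispatch to the matching handler.
async fn process_job(payload: &str) -> Result<(), Box<dyn std::error::Error>> {
    let _ = payload;
    Ok(())
}
```

The important property is that delete_message runs only after the job succeeds, so a job that fails (or a worker that dies mid-job) reappears after its visibility timeout and is retried.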

Additionally, to implement recurring jobs such as the periodic pruning work, this system permits jobs to submit a "follow-up job", which is added to the queue as part of the job's work. Combined with a delay, this allows things such as "run this operation every six hours".
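As an illustration of that pattern, here is a sketch of a recurring job re-arming itself by submitting its own follow-up with a delay. The queue name, payload string, and the six-hour constant are made-up placeholders, and the type of the delay argument (seconds versus Duration) depends on the rsmq_async version:

```rust
use std::time::Duration;

use rsmq_async::{Rsmq, RsmqConnection, RsmqError};

const JOB_QUEUE_NAME: &str = "job"; // placeholder
const PRUNE_INTERVAL: Duration = Duration::from_secs(6 * 60 * 60); // "every six hours"

async fn prune_job(rsmq: &mut Rsmq) -> Result<(), RsmqError> {
    // ... perform the periodic pruning work itself ...

    // Submit the follow-up: the same job again, delayed by six hours. Because
    // this only happens once the job reaches this point successfully, a
    // failing job cannot keep adding copies of itself to the queue.
    rsmq.send_message(JOB_QUEUE_NAME, "prune", Some(PRUNE_INTERVAL))
        .await?;

    Ok(())
}
```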

I've added configuration fields for all these various durations and values.

I also change "directly fetching pages" so that it no longer requires a site ID, and add an explicit option controlling whether deleted pages are fetched. This also updates the OutdateService, which had a bug where it used an incorrect site_id on pages.
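For illustration only, the shape of that change might look like the sketch below: a lookup keyed by page ID alone, with an explicit choice about deleted pages. None of these names are taken from deepwell; they are stand-ins for the idea.

```rust
/// Whether a direct fetch by page ID may return a deleted page.
enum DeletedFilter {
    ExtantOnly,
    IncludeDeleted,
}

struct PageRecord {
    page_id: i64,
    deleted_at: Option<i64>, // set once the page has been deleted
}

/// Apply the deleted-page policy to a row fetched directly by ID.
fn accept_direct_fetch(page: PageRecord, filter: DeletedFilter) -> Option<PageRecord> {
    match (page.deleted_at, filter) {
        // A deleted page is only surfaced when the caller asks for it.
        (Some(_), DeletedFilter::ExtantOnly) => None,
        _ => Some(page),
    }
}
```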

NOTE: There is a known issue where the value for "one day" (86400 seconds) gets multiplied by 1000 and overflows rsmq's limit. I will investigate further and file a bug upstream, but for now this PR is ready.

@emmiegit self-assigned this on Oct 30, 2023
codecov bot commented on Oct 30, 2023

Codecov Report

Merging #1671 (d6a2f08) into develop (c487a33) will decrease coverage by 0.20%.
The diff coverage is 0.00%.


@@             Coverage Diff             @@
##           develop    #1671      +/-   ##
===========================================
- Coverage    40.45%   40.25%   -0.20%     
===========================================
  Files          341      342       +1     
  Lines        10744    10798      +54     
===========================================
+ Hits          4346     4347       +1     
- Misses        6398     6451      +53     
Flag       Coverage Δ                *Carryforward flag
deepwell   2.10% <0.00%> (-0.01%) ⬇️
ftml       76.83% <ø> (ø)            Carriedforward from 6d801ee

*This pull request uses carry forward flags.

Files                                             Coverage Δ
deepwell/src/config/object.rs                     0.00% <ø> (ø)
deepwell/src/services/context.rs                  0.00% <ø> (ø)
deepwell/src/services/page/structs.rs             0.00% <ø> (ø)
deepwell/src/api.rs                               0.00% <0.00%> (ø)
deepwell/src/services/error.rs                    0.00% <0.00%> (ø)
deepwell/src/services/page_revision/service.rs    0.00% <0.00%> (ø)
deepwell/src/endpoints/page.rs                    0.00% <0.00%> (ø)
deepwell/src/services/page/service.rs             0.00% <0.00%> (ø)
deepwell/src/services/file/service.rs             0.00% <0.00%> (ø)
deepwell/src/services/job/service.rs              0.00% <0.00%> (ø)
... and 4 more

... and 1 file with indirect coverage changes

Commit notes from the 33 merged commits include:

- This is necessary because a worker may reboot, or multiple workers may start and then all but the first fail to create the queue.
- Now using the actual queue!
- In some circumstances we really do not have the site ID and should not require it as a filter there (that's what the regular get() methods are for); instead we should pass in only the ID to be fetched, with a check for deleted entities in case we only want extant ones.
- This has an option for getting deleted pages, and separates out the notion of fetching live pages by ID for internal processes.
- With the change to PageService::get_direct(), we do not need the site_id to do outdating, and relying on it wasn't correct anyway, since in some cases it assumes any page connections are on the same site, which is wrong.
- This way we don't get into loops where a job fails, but not before it adds another job to the queue, leading to a build-up.
- rsmq enforces a maximum which we're apparently surpassing; we should catch this at Config parsing/creation time (see the sketch after this list).
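A minimal sketch of such a check at Config creation time, assuming the limit is enforced in seconds: RSMQ_MAX_DELAY_SECONDS is a placeholder that should be taken from rsmq's documentation, and validate_queue_delay is not a real deepwell function.

```rust
use std::time::Duration;

// Placeholder bound; the actual maximum comes from rsmq's documentation.
const RSMQ_MAX_DELAY_SECONDS: u64 = 9_999_999;

/// Panic at startup if a configured queue delay exceeds what rsmq accepts,
/// instead of failing later when the job is enqueued.
fn validate_queue_delay(name: &str, delay: Duration) {
    assert!(
        delay.as_secs() <= RSMQ_MAX_DELAY_SECONDS,
        "configured delay '{name}' is {} seconds, which exceeds rsmq's maximum",
        delay.as_secs(),
    );
}
```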
@emmiegit marked this pull request as ready for review on October 31, 2023 06:06
@emmiegit (Member, Author) commented:

thanks @Zokhoi @Yossipossi1

@emmiegit merged commit e8b7547 into develop on Oct 31, 2023
9 checks passed
@emmiegit deleted the WJ-1176-job-queue branch on October 31, 2023 22:20