Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for hash scheduling #1260

Merged
merged 7 commits into from
Sep 16, 2024

Conversation

fopina
Copy link
Contributor

@fopina fopina commented Jan 25, 2023

This PR adds support for Jenkins-style hash schedule spec such as

0 0 H * * *

H is replaced by something derived from job name and moduled to its position within the spec (such as H mod 60 for minutes but H mod 24 for hour).

This allows for easily scheduling many daily jobs with same cron spec while still mildly balancing them throughout the day.

I tried to do it directly in the dkron Scheduler but it gets tricky due to the need of a second parameter that is not part of cron.Scheduler interface.

Also, the hash itself is actually just summing up the character values for simplification. It could be improved with an actual hashing method with less collisions and better distribution but in the end it would still use modullus 7, 12, 24, 60 causing many collisions.
As this is not for security but only mildly load balancing, I left as is.

Let me have feedback!

@yvanoers
Copy link
Collaborator

I'll try and have a closer look this weekend, but at the very least I'd like to see some accompanying unit tests.
Also, please have a look at the discussion in #866 in case you haven't yet.

@fopina
Copy link
Contributor Author

fopina commented Jan 26, 2023

Hi @yvanoers, I had not seen that PR no, but (as mentioned in the original description) Jenkins was my motivation indeed.

This is actually a PR I've had in my ToDo list since I started using dkron 2 years ago, as I used H all the time in Jenkins and missed it.

Was there any highlights/caution you wanted me to notice there? My understanding, now that I read it, is that everyone agreed with the approach.

Definitely missing unit tests, I'll take care of those

Note: I'm already using this in my fork, so far so good!

@yvanoers
Copy link
Collaborator

Hi @fopina !
The noteworthy bit of the feature request is the intent to have this functionality added to the cron library - that's where it should belong. There ought to come a PR having the same behavior as this PR for that project, so we can seemlessly move to a new version of cron when (and if) it'll get merged.
I can't make you do that, of course, but it would be much appreciated.

@fopina
Copy link
Contributor Author

fopina commented Jan 28, 2023

ah got it @yvanoers !

I think it would make sense to make it available everywhere though… cron library cares about schedule nothing else.

There’s not enough entropy in the schedule string to use it as the hash meaning a new non-sense parameter would be added to the library method just so it would do the “strings.replace” anyone can easily do before calling it.

So even though I think it would make sense, I don’t think it does in the end 😄

but happy to agree to disagree and closing this PR!

@yvanoers
Copy link
Collaborator

... and closing this PR!

There are no high expectations of this getting included in the cron scheduler.
And we do want to support it in Dkron.
So no need to close this PR prematurely.

Will finish code review hereafter.

Copy link
Collaborator

@yvanoers yvanoers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that doing this implementation correctly while preventing any undesired/unexpected behavior may need to be done by parsing the schedule in greater detail.
Which is why the cron scheduler itself is arguably the best place for this functionality. Otherwise it may be necessary to parse the schedule twice, which feels wasteful to me.

I'm happy to be proven wrong though :)

Also: We're not against including a modified version of the entire cron scheduler in Dkron. It's been so before. Ideally we wouldn't have to, though, but that is probably not feasible in the short term.

dkron/job.go Outdated
// such as "0 0 H * * *"
func (j *Job) scheduleHash() string {
spec := j.Schedule
if strings.Contains(spec, "H") && strings.Count(strings.TrimSpace(spec), " ") == 5 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cron format allows for a timezone setter as a prefix, e.g. 'TZ=Europe/Madrid 0 0 H * * *'.
This case isn't accounted for.
Keep in mind that there are timezones that have an 'H' in them (especially relevant in line 334).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored a bit to handle TZ CRON_TZ and @..., thanks for the heads up!

dkron/job.go Outdated
func (j *Job) scheduleHash() string {
spec := j.Schedule
if strings.Contains(spec, "H") && strings.Count(strings.TrimSpace(spec), " ") == 5 {
h := 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you write out the actual variable name, hash I'm guessing it is supposed to be represent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, hope those look clearer now

dkron/job.go Outdated
}
parts := strings.Split(spec, " ")
for index, part := range parts {
if strings.Contains(part, "H") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has the side effect of allowing an expression such as */H be accepted and processed.
Is that intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is intended yes, not because of */H but for H/5 (every 5min starting at H).
Not sure if */H would be of any use, but I don't think it's worth actively excluding it as */55 (every 55minutes) is still valid cron expression 🤷
What do you think?

dkron/job.go Outdated
ph := h
switch index {
case 2:
ph %= 24
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all days are 24 hours long. Not sure how the Jenkins scheduler handled that, but it is worth thinking about and documenting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understood? I'm probably missing when are days not 24h long.
Also, that doesn't seem to be an "hash" issue since most people setting up jobs to run at specific hour every day, will choose an hour out of 24...
The reference for these mods are the official dkron documentation for cron expressions, hour part is 0-23

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the day not being 24h long is a bit besides the point - sorry.

My point was daylight savings time:
When daylight savings time kicks in, for timezones that have that, 2 AM is skipped over: 1:59AM -> 3:00AM. If something is scheduled to jitter around 3AM, it may decide to pick 2AM on a day that does not have 2AM.
The other way around, when coming back from daylight savings time, you have 2AM twice: 2:59AM -> 2:00AM.

This is why it is generally a good idea to run scheduling servers on UTC time. Running a scheduler with a timezone that can jump back and forth in time is risky. Hence, worth thinking about and documenting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice point on DST! Good stuff to keep in mind for many things!
Personally I have every server in UTC; not just cronjobs, because I never know which apps write logs with UTC or localtime and then depending on the visualization tool, again the same issue, so I'm totally onboard with the UTC recommendation.

That being said, I don't think I agree that it should be worth thinking and documenting "within this PR" but rather at the scheduler level (maybe not even dkron, but the cron lib).
This PR only adds a special character that will resolve to number. And it will resolve for valid values of the field where it's used.
If there are values that might result in bad behavior, it's still the same as if the user would choose that specific value himself (instead of using H/~).
I agree that we should avoid cases that make sense (like you mentioned for months, hope you agree with using 28 as limit), but for dealing with DST and the issues you mention, "once a day" schedule will be subject to them, regardless of using 0 0 H * * * or 0 0 3 * * *.
So I think it's a valid caution/recommendation to note somewhere, but not in the scheduleHash specifically, as the risk is not introduced by using it.
What do you think?

dkron/job.go Outdated
case 2:
ph %= 24
case 3:
ph = (ph % 31) + 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all months are 31 days long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While for hours I didn't understand your point, I do in this one 😄
Updated max to 28 🙇
Curious cron library would handle choosing day 31 for a month of 30, as anyone can specify that in a spec, will it skip a month?

@fopina
Copy link
Contributor Author

fopina commented Jan 30, 2023

Thank for taking the time for explaining @yvanoers , I do understand your point much better now.

Still I'm not sure how to introduce the hashable parameter without making it look nonsense. But I'll definitely take a look first then!

@fopina
Copy link
Contributor Author

fopina commented Jan 9, 2024

I was looking into updating my fork 3.2.3 to latest 3.2.7 and couldn't remember why I had a fork, only to go through history and find this feature was the reason 🤦

I didn't look into opening PRs to cron scheduler but re-reading this thread I believe it's agreed it would be unlikely to happen anyway.

Should I try to re-implement this with something other than H as the hash token? Maybe something more unique that we wouldn't have to parse schedule in greater detail?

Or simply going with a regexp \bH\b?

@yvanoers
Copy link
Collaborator

Maybe something more unique that we wouldn't have to parse schedule in greater detail?

That may be worth exploring.
'H' is an unofficial addition anyway, there's no absolute need to adhere to it.
'~' perhaps?

@cobolbaby
Copy link
Contributor

cobolbaby commented Jan 16, 2024

To support hash scheduling strategies, should we consider implementing a new nodeSelector? Then introduce a new configuration item to switch the scheduling strategy? The default nodeSelector strategy is random dispatch.

targetNodes = a.getTargetNodes(job.Tags, defaultSelector)

dkron/dkron/agent.go

Lines 757 to 760 in f114610

// The default selector function for getTargetNodes/selectNodes
func defaultSelector(nodes []Node) int {
return rand.Intn(len(nodes))
}

@fopina fopina changed the base branch from main to 4.x February 11, 2024 00:58
@fopina fopina force-pushed the feature/hashscheduling branch from dd2fb04 to af8a649 Compare February 11, 2024 00:59
@fopina fopina force-pushed the feature/hashscheduling branch from 2f6704b to d3b24ea Compare February 11, 2024 17:37
@fopina
Copy link
Contributor Author

fopina commented Feb 11, 2024

'~' perhaps?

@yvanoers thank you for all the feedback. I've also updated H to ~ while going through the rest of the review comments.

Hope it looks better now!

Sadly, I couldn't reply to review comments within a single review (sorry for the spam), because of the force push I had to do when switching the PR to v4, but it felt like a required step as well ⚡

Also added a small unit test

@fopina fopina requested a review from yvanoers February 11, 2024 17:48
@fopina
Copy link
Contributor Author

fopina commented Jun 8, 2024

@yvanoers any chance for review? :)

@fopina
Copy link
Contributor Author

fopina commented Sep 12, 2024

is it worth the effort to rebase this PR with v4?

@yvanoers
Copy link
Collaborator

is it worth the effort to rebase this PR with v4?

Isn't it already targeted at v4 ?

@fopina
Copy link
Contributor Author

fopina commented Sep 12, 2024

No, I don’t think there was a v4 when this branch was created

@yvanoers
Copy link
Collaborator

Well, you changed the base from main to 4.x 7 months ago and did a couple force pushes. So if you didn't rebase to the 4.x branch back then already, I would think is does make sense to rebase it.
The changeset is still 3 files, though, so I'm not worried it will not properly merge if it isn't rebased.

@vcastellm
Copy link
Member

vcastellm commented Sep 14, 2024

Hey this somehow keeps escaping my radar sorry, I'll take a look and get back to you, it looks like it'll merge right away.

@fopina are you already using v4?

@fopina
Copy link
Contributor Author

fopina commented Sep 14, 2024

@yvanoers you're right, i probably rebased and can’t remember

@vcastellm looks like it indeed, thanks!

Copy link
Member

@vcastellm vcastellm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fopina
Copy link
Contributor Author

fopina commented Sep 15, 2024

right, sorry for missing that @vcastellm, done now 👍

@@ -51,6 +51,10 @@ Question mark ( ? )
Question mark may be used instead of '*' for leaving either day-of-month or
day-of-week blank.

Tilde ( ~ )

Tilde will be replaced by a numeric value valid for the range where it is used. It allows periodically scheduled tasks to produce even load on the system. For example, scheduling multiple hourly jobs to "0 H * * * *" rather than "0 0 * * * *" will run the jobs at different minutes of every hour. It can be thought of as a random value over a range, but it actually is a hash of the job name, not a random function, so that the value remains stable for any given job.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "0 ~ * * * *"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 inspired myself in jenkins docs and forgot to replace!

@vcastellm vcastellm merged commit ebcb366 into distribworks:4.x Sep 16, 2024
1 check passed
fopina added a commit to fopina/dkron that referenced this pull request Sep 16, 2024
vcastellm added a commit that referenced this pull request Oct 6, 2024
* Convert shell plugin to internal plugin (#975)

The purpose of this PR is to embed the shell plugin in the main dkron binary, that will facilitate creating a single binary with the most important executor, for easy deployment using a single binary.

* Remove old UI (#984)
* Release a light image tag (#988)

Omit all plugins except the shell plugin that will be included in the main binary.

* Upgrade react admin to v4 (#1436)

* Use exponential backoff for retries (#1433)

* Handle ip changes (#1446)
* consul like approach: add server_lookup to make dkron independent from server node IP on raft layer
* handle memberupdate event
* query other servers before start & add test
* don't remove itself if node is a raft leader
* don't remove dkron server node if id matches

* Move config init to agent command (#1465)

Config init was being done on the root level command but only the agent command was using config values.

Now config init is done as pre-run of agent command only, getting rid of extra messages in other commands when the config was missing.

* Fix disabled (#1467)
* Embed http plugin (#1471)

http plugin is one of the most used plugins together with the shell plugin, embedding it in the main binary allow for more lean deployment.

* Show all Job fields
* Refactor buttons for admin v4

* Docs v4 (#1473)

* Reuse http clients with the same configuration (#1474)
* Update OpenAPI spec to v3.1.0 and add ACLs spec
* Refactor location of types as they not only belong to plugins, but used in general. (#1485)

Also add Pro protobuf and gen code as this should be public.

* Generate client
* Extend protobuf defs for ACLs
* Token fields
* Minor improvement in RabbitMQ executor (#1500)
* nice and forkable goreleaser (#1584)
* add support for hash scheduling (#1260)
* feat: Login form and admin update (#1589)

Upgrade react-admin to v5
Implements a login form to use in Pro version

---------

Co-authored-by: Victor Castell <[email protected]>
Co-authored-by: Ivan Kripakov <[email protected]>
Co-authored-by: Filipe Pina <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants