-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add accelerator value to job cfg and extend PR comment if accelerator arg is used #280
Conversation
# however, matching CPU architectures works differently to handling | ||
# accelerators; multiple CPU architectures defined in arch_target_map | ||
# can match the (CPU) architecture component of a filter; in | ||
# contrast, the value of the accelerator filter is just passed down |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this rather limiting?
How could we then send GPU builds to a GPU node/partition, when desired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a separate argument for that (e.g., node:
or nodetype:
) or rework the handling of arguments, e.g., have arguments that are passed down to the build script and arguments that are used by the bot to allocate the right node type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to not add this capability in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, agreed
tasks/build.py
Outdated
@@ -514,10 +530,12 @@ def prepare_jobs(pr, cfg, event_info, action_filter): | |||
cpu_target = '/'.join(arch.split('/')[1:]) | |||
os_type = arch.split('/')[0] | |||
log(f"{fn}(): arch = '{arch}' => cpu_target = '{cpu_target}' , os_type = '{os_type}'") | |||
prepare_job_cfg(job_dir, build_env_cfg, repocfg, repo_id, cpu_target, os_type) | |||
|
|||
log(f"{fn}(): accelerator = '{accelerator}'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not log this via log statement on line 516?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. Changed in 8f44ba9
tasks/build.py
Outdated
# obtain accelerator from job.accelerator | ||
accelerator = job.accelerator | ||
accelerator_spec_str = '' | ||
if accelerator != 'none': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we intend to keep 'none'
as a magic special value, we should make it a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need some value that represents that no accelerator build is requested. Maybe just 'none' is not good enough. Could be NO_ACCELERATOR
or similar.
Agree, a constant would be best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather used None
than a constant. It may require different processing in the bot/*
scripts, but that's ok as those haven't been implemented yet.
Done in 8f44ba9
tasks/build.py
Outdated
@@ -474,6 +475,12 @@ def prepare_jobs(pr, cfg, event_info, action_filter): | |||
# call to just before download_pr | |||
year_month, pr_id, run_dir = create_pr_dir(pr, cfg, event_info) | |||
|
|||
accelerator = "none" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use "none"
instead of None
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use None
. Just need to change the processing where it is used as a string and we want a meaningful output, e.g., in a log or in a PR comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in 8f44ba9
# determine accelerator from action_filter argument | ||
accelerators = action_filter.get_filter_by_component(tools_filter.FILTER_COMPONENT_ACCEL) | ||
if len(accelerators) > 0: | ||
accelerator = accelerators[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we warn or something when the list has more than 1 element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Logging is maybe sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 8f44ba9
Also logs in case there is no element.
Co-authored-by: Kenneth Hoste <[email protected]>
Looks like some tests are failing now. Will look into these and address the suggestions. |
All suggestions addressed. Pytests have been fixed. Bot build command has been tested both with and without additional argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Tested myself, works as designed. Diff for $ diff -u 18855/cfg/job.cfg 18856/cfg/job.cfg
--- 18855/cfg/job.cfg 2024-09-17 12:42:14.738160934 +0000
+++ 18856/cfg/job.cfg 2024-09-17 12:42:55.348634381 +0000
@@ -5,7 +5,7 @@
shared_fs_path = /home/boegel/bot-shared
[repository]
-repos_cfg_dir = /home/boegel/eessi-bot-software-layer/jobs/2024.09/pr_32/event_405625f0-74f2-11ef-8f66-1dae4afc0d3b/run_000/linux_x86_64_amd_zen2/eessi.io-2023.06-software/cfg
+repos_cfg_dir = /home/boegel/eessi-bot-software-layer/jobs/2024.09/pr_32/event_591068d0-74f2-11ef-9bca-f787718ea34f/run_000/linux_x86_64_amd_zen2/eessi.io-2023.06-software/cfg
repo_id = eessi.io-2023.06-software
container = docker://ghcr.io/eessi/build-node:debian11
repo_name = software.eessi.io
@@ -14,5 +14,5 @@
[architecture]
software_subdir = x86_64/amd/zen2
os_type = linux
-accelerator =
+accelerator = nvidia/cc80 Relevant log entries:
|
Passes down value of
accelerator
tojob.cfg
file and uses value in comment about a new job.