add accelerator value to job cfg and extend PR comment if accelerator arg is used #280

trz42 · 2024-09-10T09:54:05Z

Passes the value of the accelerator argument down to the job.cfg file and uses that value in the PR comment about a new job.

README.md

boegel · 2024-09-10T11:57:43Z

tasks/build.py

+            #   however, matching CPU architectures works differently to handling
+            #   accelerators; multiple CPU architectures defined in arch_target_map
+            #   can match the (CPU) architecture component of a filter; in
+            #   contrast, the value of the accelerator filter is just passed down


Isn't this rather limiting?

How could we then send GPU builds to a GPU node/partition, when desired?

We need a separate argument for that (e.g., node: or nodetype:), or we need to rework the handling of arguments, e.g., by distinguishing arguments that are passed down to the build script from arguments that the bot uses to allocate the right node type.

I'd suggest not adding this capability in this PR.
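
One way to picture the split suggested above, as a minimal sketch only: the command is parsed into key:value arguments, and a set of keys decides which ones are forwarded to the build script and which ones the bot would use for node allocation. The `node` key is the hypothetical argument floated in the comment; `SCHEDULING_KEYS` and `split_build_args` are illustrative names, not part of the bot.

```python
# Hypothetical sketch: separate arguments meant for the build script from
# arguments the bot itself would use to pick a node type.
# SCHEDULING_KEYS and split_build_args are illustrative names only.
SCHEDULING_KEYS = {"node"}  # assumption: 'node:' as floated in the review


def split_build_args(command):
    """Split 'build key:value ...' into (build_args, scheduling_args)."""
    build_args, scheduling_args = {}, {}
    for token in command.split()[1:]:  # skip the leading 'build' keyword
        key, sep, value = token.partition(":")
        if not sep:
            continue  # ignore tokens that are not key:value pairs
        target = scheduling_args if key in SCHEDULING_KEYS else build_args
        target[key] = value
    return build_args, scheduling_args


build_args, scheduling_args = split_build_args(
    "build arch:x86_64/amd/zen2 accel:nvidia/cc80 node:gpu"
)
print(build_args)       # arguments passed down to the build script
print(scheduling_args)  # arguments used only for node allocation
```

With such a split, the accelerator value could still be passed down unchanged while a separate key steers where the job runs.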

boegel · 2024-09-10T11:58:08Z

tasks/build.py

@@ -514,10 +530,12 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
            cpu_target = '/'.join(arch.split('/')[1:])
            os_type = arch.split('/')[0]
            log(f"{fn}(): arch = '{arch}' => cpu_target = '{cpu_target}' , os_type = '{os_type}'")
-            prepare_job_cfg(job_dir, build_env_cfg, repocfg, repo_id, cpu_target, os_type)
+
+            log(f"{fn}(): accelerator = '{accelerator}'")


Why not log this via the log statement on line 516?

Ack. Changed in 8f44ba9
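
For illustration, a minimal sketch of a single combined log message, assuming `arch` has the form os_type/cpu_target as in the hunk above. The helper name `describe_target` is made up for this example and is not a function in tasks/build.py.

```python
def describe_target(fn, arch, accelerator=None):
    """Build one log message covering arch, cpu_target, os_type and accelerator."""
    os_type = arch.split('/')[0]
    cpu_target = '/'.join(arch.split('/')[1:])
    return (f"{fn}(): arch = '{arch}' => cpu_target = '{cpu_target}' , "
            f"os_type = '{os_type}', accelerator = '{accelerator}'")


print(describe_target("prepare_jobs", "linux/x86_64/amd/zen2", "nvidia/cc80"))
```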

boegel · 2024-09-10T11:59:49Z

tasks/build.py

+    # obtain accelerator from job.accelerator
+    accelerator = job.accelerator
+    accelerator_spec_str = ''
+    if accelerator != 'none':


If we intend to keep 'none' as a magic special value, shouldn't we make it a constant?

We need some value that represents that no accelerator build is requested. Maybe just 'none' is not good enough. Could be NO_ACCELERATOR or similar.

Agree, a constant would be best.

I used None rather than a constant. It may require different processing in the bot/* scripts, but that's ok as those haven't been implemented yet.

Done in 8f44ba9
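
A minimal sketch of what using None could look like when building the accelerator part of the PR comment. The template text and the helper name `accelerator_spec` are assumptions for illustration; only `accelerator_spec_str` and the None convention come from the discussion above.

```python
# Assumed template; the real text would come from the bot's configuration.
WITH_ACCELERATOR = " and accelerator {accelerator}"


def accelerator_spec(accelerator, template=WITH_ACCELERATOR):
    """Return the comment fragment for an accelerator, or '' if none was requested."""
    accelerator_spec_str = ''
    if accelerator is not None:  # identity check instead of comparing to 'none'
        accelerator_spec_str = template.format(accelerator=accelerator)
    return accelerator_spec_str


print(f"New job for x86_64/amd/zen2{accelerator_spec('nvidia/cc80')}")
print(f"New job for x86_64/amd/zen2{accelerator_spec(None)}")
```

Checking `is not None` avoids a magic string, at the cost of converting explicitly wherever a printable value is needed.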

boegel · 2024-09-10T12:00:17Z

tasks/build.py

@@ -474,6 +475,12 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
    #      call to just before download_pr
    year_month, pr_id, run_dir = create_pr_dir(pr, cfg, event_info)

+    accelerator = "none"


Why use "none" instead of None here?

We could use None. Just need to change the processing where it is used as a string and we want a meaningful output, e.g., in a log or in a PR comment.

Implemented in 8f44ba9

boegel · 2024-09-10T12:02:55Z

tasks/build.py

+    # determine accelerator from action_filter argument
+    accelerators = action_filter.get_filter_by_component(tools_filter.FILTER_COMPONENT_ACCEL)
+    if len(accelerators) > 0:
+        accelerator = accelerators[0]


Should we warn or something when the list has more than 1 element?

Yep. Logging is maybe sufficient.

Done in 8f44ba9

It also logs the case where the list has no element.
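
Roughly, the agreed behaviour could look like the sketch below: take the first accelerator, log when more than one was given, and log when none was given. `pick_accelerator` and the use of the standard `logging` module are assumptions for this example; the bot has its own `log()` function and obtains the list via `action_filter.get_filter_by_component(...)`.

```python
import logging

logger = logging.getLogger("accelerator-sketch")


def pick_accelerator(accelerators):
    """Return the first accelerator in the list, or None if the list is empty."""
    if len(accelerators) == 0:
        logger.info("no accelerator requested")
        return None
    if len(accelerators) > 1:
        logger.warning("%d accelerators requested, using only the first: %s",
                       len(accelerators), accelerators[0])
    return accelerators[0]
```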

Co-authored-by: Kenneth Hoste <[email protected]>

trz42 · 2024-09-10T15:04:17Z

Looks like some tests are failing now. Will look into these and address the suggestions.

trz42 · 2024-09-10T20:00:29Z

All suggestions addressed. Pytests have been fixed. The bot build command has been tested both with and without the additional accelerator argument.

boegel

lgtm

boegel · 2024-09-17T12:53:50Z

Tested myself, works as designed. Diff for job.cfg created by a "build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2" request vs "build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80":

$ diff -u 18855/cfg/job.cfg 18856/cfg/job.cfg
--- 18855/cfg/job.cfg	2024-09-17 12:42:14.738160934 +0000
+++ 18856/cfg/job.cfg	2024-09-17 12:42:55.348634381 +0000
@@ -5,7 +5,7 @@
 shared_fs_path = /home/boegel/bot-shared

 [repository]
-repos_cfg_dir = /home/boegel/eessi-bot-software-layer/jobs/2024.09/pr_32/event_405625f0-74f2-11ef-8f66-1dae4afc0d3b/run_000/linux_x86_64_amd_zen2/eessi.io-2023.06-software/cfg
+repos_cfg_dir = /home/boegel/eessi-bot-software-layer/jobs/2024.09/pr_32/event_591068d0-74f2-11ef-9bca-f787718ea34f/run_000/linux_x86_64_amd_zen2/eessi.io-2023.06-software/cfg
 repo_id = eessi.io-2023.06-software
 container = docker://ghcr.io/eessi/build-node:debian11
 repo_name = software.eessi.io
@@ -14,5 +14,5 @@
 [architecture]
 software_subdir = x86_64/amd/zen2
 os_type = linux
-accelerator =
+accelerator = nvidia/cc80

Relevant log entries:

$ grep 'accelerator =' pyghee.log
[20240917-T12:42:14] prepare_jobs(): arch = 'linux/x86_64/amd/zen2' => cpu_target = 'x86_64/amd/zen2' , os_type = 'linux', accelerator = 'None'
accelerator =
[20240917-T12:42:55] prepare_jobs(): arch = 'linux/x86_64/amd/zen2' => cpu_target = 'x86_64/amd/zen2' , os_type = 'linux', accelerator = 'nvidia/cc80'
accelerator = nvidia/cc80
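
The empty `accelerator =` line ties in with the commit "cfg values cannot be None": assuming job.cfg is written with Python's `configparser`, option values have to be strings, so None must be mapped to an empty string before writing. A minimal sketch under that assumption (`architecture_section` is an illustrative helper, not the bot's code):

```python
import configparser
import io


def architecture_section(cpu_target, os_type, accelerator=None):
    """Render an [architecture] section; None becomes an empty value."""
    cfg = configparser.ConfigParser()
    cfg['architecture'] = {
        'software_subdir': cpu_target,
        'os_type': os_type,
        # configparser rejects None here, so fall back to an empty string
        'accelerator': accelerator if accelerator is not None else '',
    }
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


print(architecture_section('x86_64/amd/zen2', 'linux'))
print(architecture_section('x86_64/amd/zen2', 'linux', 'nvidia/cc80'))
```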

truib added 4 commits September 10, 2024 10:34

add function to return filter pattern for a component

cef70a7

updated and new settings to support accelerator build arg

96b067d

fix comparison syntax

815f8fe

improve settings' value for with_accelerator

9e4cbfd

boegel requested changes Sep 10, 2024

fix typo in README.md

34534bb

Co-authored-by: Kenneth Hoste <[email protected]>

truib added 4 commits September 10, 2024 21:11

using None for undefined accelerator, improved logging

8f44ba9

fix failing pytests for extended Job tuple

216e2db

adjust test data (app.cfg)

35dac28

cfg values cannot be None

687c256

boegel approved these changes Sep 17, 2024

boegel merged commit 7733f23 into EESSI:develop Sep 17, 2024
7 checks passed

This was referenced Sep 17, 2024

{2023.06}[2023a] beagle-lib v4.0.1 w/ CUDA 12.1.1 boegel/software-layer#32

Closed

mark with_accelerator setting in [submitted_job_comments] section as required #282

Merged

release notes for v0.6.0 #284

Merged
