Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add accelerator value to job cfg and extend PR comment if accelerator arg is used #280

Merged
merged 9 commits into from
Sep 17, 2024
17 changes: 12 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -645,17 +645,24 @@ scontrol_command = /usr/bin/scontrol
#### `[submitted_job_comments]` section

The `[submitted_job_comments]` section specifies templates for messages about newly submitted jobs.
```
initial_comment = New job on instance `{app_name}` for architecture `{arch_name}` for repository `{repo_id}` in job dir `{symlink}`
```
`initial_comment` is used to create a comment to a PR when a new job has been created.

```
awaits_release = job id `{job_id}` awaits release by job manager
```
`awaits_release` is used to provide a status update of a job (shown as a row in the job's status
table).

```
initial_comment = New job on instance `{app_name}` for architecture `{arch_name}`{accelerator_spec} for repository `{repo_id}` in job dir `{symlink}`
```
`initial_comment` is used to create a comment to a PR when a new job has been
created. Note, the part '{accelerator_spec}' is only filled-in by the bot if the
argument 'accelerator' to the `bot: build` command have been used.
trz42 marked this conversation as resolved.
Show resolved Hide resolved
```
with_accelerator =  and accelerator `{accelerator}`
```
`with_accelerator` is used to provide information about the accelerator the job
should build for if and only if the argument `accelerator:X/Y` has been provided.

#### `[new_job_comments]` section

The `[new_job_comments]` section sets templates for messages about jobs whose `hold` flag was released.
Expand Down
3 changes: 2 additions & 1 deletion app.cfg.example
Original file line number Diff line number Diff line change
Expand Up @@ -243,8 +243,9 @@ scontrol_command = /usr/bin/scontrol
# are removed, the output (in PR comments) will lack important
# information.
[submitted_job_comments]
initial_comment = New job on instance `{app_name}` for architecture `{arch_name}` for repository `{repo_id}` in job dir `{symlink}`
awaits_release = job id `{job_id}` awaits release by job manager
initial_comment = New job on instance `{app_name}` for CPU micro-architecture `{arch_name}`{accelerator_spec} for repository `{repo_id}` in job dir `{symlink}`
with_accelerator =  and accelerator `{accelerator}`


[new_job_comments]
Expand Down
42 changes: 36 additions & 6 deletions tasks/build.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
# Local application imports (anything from EESSI/eessi-bot-software-layer)
from connections import github
from tools import config, cvmfs_repository, job_metadata, pr_comments, run_cmd
import tools.filter as tools_filter


# defaults (used if not specified via, eg, 'app.cfg')
Expand All @@ -46,7 +47,7 @@
_ERROR_NONE = "none"


Job = namedtuple('Job', ('working_dir', 'arch_target', 'repo_id', 'slurm_opts', 'year_month', 'pr_id'))
Job = namedtuple('Job', ('working_dir', 'arch_target', 'repo_id', 'slurm_opts', 'year_month', 'pr_id', 'accelerator'))

# global repo_cfg
repo_cfg = {}
Expand Down Expand Up @@ -474,6 +475,12 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
# call to just before download_pr
year_month, pr_id, run_dir = create_pr_dir(pr, cfg, event_info)

accelerator = "none"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use "none" instead of None here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use None. Just need to change the processing where it is used as a string and we want a meaningful output, e.g., in a log or in a PR comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in 8f44ba9

# determine accelerator from action_filter argument
accelerators = action_filter.get_filter_by_component(tools_filter.FILTER_COMPONENT_ACCEL)
if len(accelerators) > 0:
accelerator = accelerators[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we warn or something when the list has more than 1 element?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Logging is maybe sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8f44ba9

Also logs in case there is no element.


jobs = []
for arch, slurm_opt in arch_map.items():
arch_dir = arch.replace('/', '_')
Expand Down Expand Up @@ -501,6 +508,15 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
continue
else:
log(f"{fn}(): context DOES satisfy filter(s), going on with job")
# we reached this point when the filter matched (otherwise we
# 'continue' with the next repository)
# for each match of the filter we create a specific job directory
# however, matching CPU architectures works differently to handling
# accelerators; multiple CPU architectures defined in arch_target_map
# can match the (CPU) architecture component of a filter; in
# contrast, the value of the accelerator filter is just passed down
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this rather limiting?

How could we then send GPU builds to a GPU node/partition, when desired?

Copy link
Contributor Author

@trz42 trz42 Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a separate argument for that (e.g., node: or nodetype:) or rework the handling of arguments, e.g., have arguments that are passed down to the build script and arguments that are used by the bot to allocate the right node type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to not add this capability in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, agreed

# to scripts in bot/ directory of the pull request (see function
# prepare_job_cfg and creation of Job tuple below)
job_dir = os.path.join(run_dir, arch_dir, repo_id)
os.makedirs(job_dir, exist_ok=True)
log(f"{fn}(): job_dir '{job_dir}'")
Expand All @@ -514,10 +530,12 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
cpu_target = '/'.join(arch.split('/')[1:])
os_type = arch.split('/')[0]
log(f"{fn}(): arch = '{arch}' => cpu_target = '{cpu_target}' , os_type = '{os_type}'")
prepare_job_cfg(job_dir, build_env_cfg, repocfg, repo_id, cpu_target, os_type)

log(f"{fn}(): accelerator = '{accelerator}'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not log this via log statement on line 516?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Changed in 8f44ba9

prepare_job_cfg(job_dir, build_env_cfg, repocfg, repo_id, cpu_target, os_type, accelerator)

# enlist jobs to proceed
job = Job(job_dir, arch, repo_id, slurm_opt, year_month, pr_id)
job = Job(job_dir, arch, repo_id, slurm_opt, year_month, pr_id, accelerator)
jobs.append(job)

log(f"{fn}(): {len(jobs)} jobs to proceed after applying white list")
Expand All @@ -527,7 +545,7 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
return jobs


def prepare_job_cfg(job_dir, build_env_cfg, repos_cfg, repo_id, software_subdir, os_type):
def prepare_job_cfg(job_dir, build_env_cfg, repos_cfg, repo_id, software_subdir, os_type, accelerator):
"""
Set up job configuration file 'job.cfg' in directory <job_dir>/cfg

Expand All @@ -538,6 +556,7 @@ def prepare_job_cfg(job_dir, build_env_cfg, repos_cfg, repo_id, software_subdir,
repo_id (string): identifier of the repository to build for
software_subdir (string): software subdirectory to build for (e.g., 'x86_64/generic')
os_type (string): type of the os (e.g., 'linux')
accelerator (string): defines accelerator to build for (e.g., 'nvidia/cc80')

Returns:
None (implicitly)
Expand All @@ -563,6 +582,7 @@ def prepare_job_cfg(job_dir, build_env_cfg, repos_cfg, repo_id, software_subdir,
# [architecture]
# software_subdir = software_subdir
# os_type = os_type
# accelerator = accelerator
job_cfg = configparser.ConfigParser()
job_cfg[job_metadata.JOB_CFG_SITE_CONFIG_SECTION] = {}
build_env_to_job_cfg_keys = {
Expand Down Expand Up @@ -607,6 +627,7 @@ def prepare_job_cfg(job_dir, build_env_cfg, repos_cfg, repo_id, software_subdir,
job_cfg[job_cfg_arch_section] = {}
job_cfg[job_cfg_arch_section][job_metadata.JOB_CFG_ARCHITECTURE_SOFTWARE_SUBDIR] = software_subdir
job_cfg[job_cfg_arch_section][job_metadata.JOB_CFG_ARCHITECTURE_OS_TYPE] = os_type
job_cfg[job_cfg_arch_section][job_metadata.JOB_CFG_ARCHITECTURE_ACCELERATOR] = accelerator

# copy contents of directory containing repository configuration to directory
# containing job configuration/metadata
Expand Down Expand Up @@ -726,11 +747,19 @@ def create_pr_comment(job, job_id, app_name, pr, gh, symlink):
# obtain arch from job.arch_target which has the format OS/ARCH
arch_name = '-'.join(job.arch_target.split('/')[1:])

submitted_job_comments_cfg = config.read_config()[config.SECTION_SUBMITTED_JOB_COMMENTS]

# obtain accelerator from job.accelerator
accelerator = job.accelerator
accelerator_spec_str = ''
if accelerator != 'none':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we intend to keep 'none' as a magic special value, we should make it a constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need some value that represents that no accelerator build is requested. Maybe just 'none' is not good enough. Could be NO_ACCELERATOR or similar.

Agree, a constant would be best.

Copy link
Contributor Author

@trz42 trz42 Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather used None than a constant. It may require different processing in the bot/* scripts, but that's ok as those haven't been implemented yet.

Done in 8f44ba9

accelerator_spec = f"{submitted_job_comments_cfg[config.SUBMITTED_JOB_COMMENTS_SETTING_WITH_ACCELERATOR]}"
accelerator_spec_str = accelerator_spec.format(accelerator=accelerator)

# get current date and time
dt = datetime.now(timezone.utc)

# construct initial job comment
submitted_job_comments_cfg = config.read_config()[config.SECTION_SUBMITTED_JOB_COMMENTS]
job_comment = (f"{submitted_job_comments_cfg[config.SUBMITTED_JOB_COMMENTS_SETTING_INITIAL_COMMENT]}"
f"\n|date|job status|comment|\n"
f"|----------|----------|------------------------|\n"
Expand All @@ -741,7 +770,8 @@ def create_pr_comment(job, job_id, app_name, pr, gh, symlink):
arch_name=arch_name,
symlink=symlink,
repo_id=job.repo_id,
job_id=job_id)
job_id=job_id,
accelerator_spec=accelerator_spec_str)

# create comment to pull request
repo_name = pr.base.repo.full_name
Expand Down
3 changes: 2 additions & 1 deletion tools/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -105,8 +105,9 @@
RUNNING_JOB_COMMENTS_SETTING_RUNNING_JOB = 'running_job'

SECTION_SUBMITTED_JOB_COMMENTS = 'submitted_job_comments'
SUBMITTED_JOB_COMMENTS_SETTING_INITIAL_COMMENT = 'initial_comment'
SUBMITTED_JOB_COMMENTS_SETTING_AWAITS_RELEASE = 'awaits_release'
SUBMITTED_JOB_COMMENTS_SETTING_INITIAL_COMMENT = 'initial_comment'
SUBMITTED_JOB_COMMENTS_SETTING_WITH_ACCELERATOR = 'with_accelerator'

SECTION_CLEAN_UP = 'clean_up'
CLEAN_UP_SETTING_TRASH_BIN_ROOT_DIR = 'trash_bin_dir'
Expand Down
21 changes: 21 additions & 0 deletions tools/filter.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@
COMPONENT_UNKNOWN = "unknown component={component} in {component}:{pattern}"
FILTER_EMPTY_PATTERN = "pattern in filter string '{filter_string}' is empty"
FILTER_FORMAT_ERROR = "filter string '{filter_string}' does not conform to format 'component:pattern'"
UNKNOWN_COMPONENT_CONST = "unknown component constant {component}"

Filter = namedtuple('Filter', ('component', 'pattern'))

Expand Down Expand Up @@ -175,6 +176,26 @@ def add_filter_from_string(self, filter_string):
raise EESSIBotActionFilterError(msg)
self.add_filter(_filter_split[0], _filter_split[1])

def get_filter_by_component(self, component):
"""
Returns filter pattern for component.

Args:
component (string): one of FILTER_COMPONENTS

Returns:
(list): list of pattern for filters whose component matches argument
"""
if component not in FILTER_COMPONENTS:
msg = UNKNOWN_COMPONENT_CONST.format(component=component)
raise EESSIBotActionFilterError(msg)

pattern = []
for _filter in self.action_filters:
if component == _filter.component:
pattern.append(_filter.pattern)
return pattern

def remove_filter(self, component, pattern):
"""
Removes all elements matching the filter given by (component, pattern)
Expand Down
1 change: 1 addition & 0 deletions tools/job_metadata.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
JOB_CFG_ARCHITECTURE_SECTION = "architecture"
JOB_CFG_ARCHITECTURE_OS_TYPE = "os_type"
JOB_CFG_ARCHITECTURE_SOFTWARE_SUBDIR = "software_subdir"
JOB_CFG_ARCHITECTURE_ACCELERATOR = "accelerator"

JOB_CFG_REPOSITORY_SECTION = "repository"
JOB_CFG_REPOSITORY_CONTAINER = "container"
Expand Down
Loading