Use hipercow_resources when running tasks #63

Merged
merged 16 commits into from
Jan 16, 2024

Conversation

weshinsley
Contributor

@weshinsley weshinsley commented Jan 12, 2024

A hipercow_resource can now be passed into all the task_create functions, and ultimately task_submit - OR - it can be left NULL and a working default will be picked up.

task_retry is a bit interesting - I haven't done anything special.

Also, I have one mockery mock I couldn't get working in test-interface. I wanted to check that special_time gets called once within task_submit, and from manual browser()-ing I'm sure it does, but I've done something wrong with the mock and it doesn't count the call...

@weshinsley weshinsley marked this pull request as draft January 12, 2024 14:01
@weshinsley weshinsley changed the title from "Mrc 4929" to "Use hipercow_resources when running tasks" Jan 12, 2024
@weshinsley weshinsley marked this pull request as ready for review January 12, 2024 22:04
Member

@richfitz richfitz left a comment

Quick review to give you something to get on with in case I'm off Monday.

R/resource.R (resolved)
R/resource.R (resolved)
R/task-create.R (outdated, resolved)
dat <- hipercow_driver_prepare(driver, root, environment())

if (!is.null(resources$hold_until$computed)) {
Member

I feel this could be done at validation, unless someone sits on their resources a long time.

Contributor Author

I am in two minds about this. I wonder if people might create a couple of resource objects for their "test_job" and their "real_job" and their "massive_job_that_always_waits_til_midnight", and then have those as kind of ready-made resources to launch their jobs in the same way as last time, without having to think about it.

I am not sure whether this is terribly likely, and it might not be totally advisable - once in a while, available resources might change (but then they will need a new package version and will have to rerun their resource things anyway).

But also, I'm not sure doing the translation during submission causes any real problem, does it?

Member

The only "problem" caused by doing it at submission is that there are two places to do the validation. In particular, the validation here must never fail or it will be quite annoying for the user. We can see how this goes in practice if you want - I expect this flow will need some adjustment when it comes in contact with users, no matter what we pick.

We might sketch out a flow of validation etc as we complicate this too (environment variables and parallel configuration will all interact here too)
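
For illustration only, a minimal sketch of what translating a symbolic hold time at submission could look like. The helper `resolve_hold_until()` and the `"midnight"` keyword are hypothetical and are not the code in this PR; `now` is passed as an argument so the translation can happen late and still be tested without mocking the clock.

```r
# Hypothetical helper (not part of hipercow): turn a symbolic hold time
# into a concrete POSIXct, relative to the moment of submission.
resolve_hold_until <- function(hold_until, now = Sys.time()) {
  if (is.null(hold_until)) {
    return(NULL)  # nothing requested, nothing to translate
  }
  if (inherits(hold_until, "POSIXct")) {
    return(hold_until)  # already a concrete time
  }
  if (identical(hold_until, "midnight")) {
    # Use the timezone carried by `now`, falling back to the local one
    tz <- attr(now, "tzone")
    tz <- if (is.null(tz)) "" else tz[[1]]
    tomorrow <- as.Date(now, tz = tz) + 1
    return(as.POSIXct(paste(format(tomorrow), "00:00:00"), tz = tz))
  }
  stop(sprintf("Unknown hold_until value '%s'", hold_until))
}
```

Because the same resource object would resolve to a different time on each call, a stored "waits until midnight" resource stays reusable across days, which is the case for translating late.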

R/task-retry.R (resolved)
R/util.R (outdated, resolved)
drivers/windows/R/driver.R (resolved)
@@ -6,6 +6,7 @@ windows_cluster_info <- function(config, path_root) {
max_cores = 32,
max_ram = 512,
queues = c("AllNodes", "Training"),
+ default_queue = "AllNodes",
Member

Or have the first element of queues be the default?

Contributor Author

Hmm - I think I prefer stating default_queue explicitly rather than relying on queues[1], but not strongly so...
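
To make the two options concrete, here is a small sketch. The list is modelled on the fields visible in the diff above; the two accessor functions are hypothetical names used only for comparison and do not exist in the package.

```r
# Cluster description modelled on the fields in the diff (illustrative)
cluster_info <- list(
  max_cores = 32,
  max_ram = 512,
  queues = c("AllNodes", "Training"),
  default_queue = "AllNodes")

# Option A (hypothetical): explicit field, checked against the known queues
default_queue_explicit <- function(info) {
  stopifnot(info$default_queue %in% info$queues)
  info$default_queue
}

# Option B (hypothetical): convention, the first element of `queues` wins
default_queue_first <- function(info) {
  info$queues[[1]]
}
```

The two agree for the configuration shown. They diverge if someone reorders `queues`: option B silently changes the default, while option A keeps it and can be validated.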

R/resource.R (resolved)
@@ -281,7 +294,7 @@ hipercow_resources_validate <- function(resources,
validate_cluster_requested_nodes(
resources$requested_nodes$computed, cluster_info$nodes)

- TRUE
+ resources
Member

Hmm. By doing this (with the assignment of resources$queue$computed) we do open the door to the users sneaking a different queue through here if they are sufficiently motivated. Perhaps that's ok?

Contributor Author

So the only thing that changes in the return value is that the queue gets default-ised if it was NULL, which would be good to do somewhere. I prefer that to making the user specify the queue explicitly.

I wonder if, along with the discussion on when to turn "tonight" into a real time etc., perhaps we really do have two stages of validation here: the first for all the syntactic stuff and theoretical checks, and the second at the last minute, when we translate the timings, fill in the queue with a default if it was left NULL, and flag the sort of failures we'd only see when we consider the actual running of the job on an actual cluster...

For conversation later!
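
A rough sketch of the two-stage idea, to have something concrete to discuss. Both function names and the particular checks are hypothetical; they are not the existing hipercow_resources_validate, and the field names (`cores`, `queue`) are simplified assumptions.

```r
# Stage 1 (hypothetical): syntactic checks that need no cluster and no clock.
# Could run when the resources object is created.
validate_static <- function(resources) {
  cores <- resources$cores
  if (!is.numeric(cores) || length(cores) != 1 || cores < 1) {
    stop("'cores' must be a single positive number")
  }
  queue <- resources$queue
  if (!is.null(queue) && !(is.character(queue) && length(queue) == 1)) {
    stop("'queue' must be NULL or a single string")
  }
  resources
}

# Stage 2 (hypothetical): checks against an actual cluster, run at submission.
# Fills in the default queue and returns the completed resources.
validate_for_cluster <- function(resources, cluster_info) {
  if (is.null(resources$queue)) {
    resources$queue <- cluster_info$default_queue
  }
  if (!(resources$queue %in% cluster_info$queues)) {
    stop(sprintf("Queue '%s' is not available", resources$queue))
  }
  if (resources$cores > cluster_info$max_cores) {
    stop(sprintf("Requested %d cores but the cluster maximum is %d",
                 as.integer(resources$cores),
                 as.integer(cluster_info$max_cores)))
  }
  resources
}
```

Returning the completed object from stage 2, rather than TRUE, is what lets the default queue be filled in late; it is also the point where a user could pass in a modified object, as noted above.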

resources = res)),
"Submitted task '[[:xdigit:]]{32}' using 'elsewhere'")

# TODO HERE - Something is wrong with my mockery - mock_special_time
Member

I'll talk you through this in person, it's an unfortunate consequence of how mockery works.

Usually some contortions are required here to make good tests

@richfitz richfitz merged commit 8cf97c0 into main Jan 16, 2024
13 checks passed
2 participants