Link built pool to image in ACR #59

Closed
zsusswein opened this issue Oct 3, 2024 · 9 comments · Fixed by #89

@zsusswein
Collaborator

zsusswein commented Oct 3, 2024

#43 and #54 create the infrastructure to build the pipeline into an image and push it to ACR. They also create a pool with the same tag. But as far as we can tell, these images and pools aren't actually linked. We need some additional configuration to link the pools to the image.

As part of this issue, it would also be good to bring over any missing settings from the existing approach.

cc: @gvegayon please add any missing color here!

Possible solution steps (initial draft by @gvegayon)

  • Refactor how the az batch pool is instantiated using --template, so configuration is passed via a JSON file.
  • Besides the current parameters (which are passed via environment variables), we need to pass containerConfiguration (nested under deploymentConfiguration) through the --parameters argument.
  • To ensure this is running as expected, we need to submit a task that prints the information about the operating system and the R package (which should be available in the image).
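
A minimal Python sketch of the second and third steps, under stated assumptions. The key names (containerConfiguration, containerImageNames, containerRegistries, commandLine, containerSettings) follow the Azure Batch pool and task schemas as I understand them; the registry, image, IDs, and R package name are hypothetical placeholders, not values from this repo.

```python
import json

# Hypothetical placeholders; none of these values come from this repo.
REGISTRY = "exampleregistry.azurecr.io"
IMAGE = f"{REGISTRY}/example-pipeline:latest"
R_PACKAGE = "examplepkg"


def container_configuration(image: str, registry: str) -> dict:
    """Container settings naming the image the pool's nodes should use."""
    return {
        "type": "dockerCompatible",
        "containerImageNames": [image],
        # Credentials or a managed identity would also belong in this entry.
        "containerRegistries": [{"registryServer": registry}],
    }


def pool_parameters(pool_id: str, image: str, registry: str) -> dict:
    """Nest containerConfiguration under deploymentConfiguration."""
    return {
        "id": pool_id,
        "deploymentConfiguration": {
            "virtualMachineConfiguration": {
                "containerConfiguration": container_configuration(image, registry)
            }
        },
    }


def smoke_test_task(task_id: str, image: str, r_package: str) -> dict:
    """Task that prints OS details and loads the R package inside the image."""
    script = f"cat /etc/os-release && Rscript -e 'library({r_package}); sessionInfo()'"
    return {
        "id": task_id,
        # Batch does not run commandLine under a shell, so wrap it in bash -c.
        "commandLine": f'/bin/bash -c "{script}"',
        "containerSettings": {"imageName": image},
    }


if __name__ == "__main__":
    print(json.dumps(pool_parameters("example-pool", IMAGE, REGISTRY), indent=2))
    print(json.dumps(smoke_test_task("smoke-test", IMAGE, R_PACKAGE), indent=2))
```

If the smoke-test task succeeds and its output lists the package, that would suggest the pool is pulling the intended image rather than running on the bare VM.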
@jkislin
Contributor

jkislin commented Oct 10, 2024

@zsusswein what exactly does 'linking' entail? In other words, what functionality do you want? Is tagging the pools and images with the same hash or id enough, or am I missing something else?

Sorry if this is a really facile question!

@zsusswein
Collaborator Author

This is a great q.

We need the nodes in the Batch pool to run our code. George's and your prior work takes the code here, builds the package in a Docker image, and stores the image in ACR.

But I don't think the Batch pools created here specify running that particular Docker image in ACR. They specify:

VM_IMAGE_TAG: "canonical:0001-com-ubuntu-server-focal:20_04-lts"

We add some additional keys in our current config to specify the container configuration:

https://github.com/cdcent/cfa-nnh-pipelines/blob/33b4a55daba3479cddc85fe10dac4732b2f2c91b/NHSN/Rt/run_azure_batch/create_expt_pool.py#L68-L78

I believe we need to do something similar here.
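
To make the gap concrete, here is a small Python sketch (an illustration, not code from either repo). VM_IMAGE_TAG only identifies the VM's OS image, so splitting it yields an image reference with no mention of the ACR image. The publisher:offer:sku layout is inferred from the value above; the "latest" version default, the helper names, and the container_image argument are assumptions, and other required pool fields are left out.

```python
from typing import Optional


def parse_vm_image_tag(tag: str) -> dict:
    """Split a 'publisher:offer:sku' tag into image reference fields."""
    parts = tag.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 'publisher:offer:sku', got {tag!r}")
    publisher, offer, sku = parts
    # "latest" is an assumed default; the tag itself carries no version.
    return {"publisher": publisher, "offer": offer, "sku": sku, "version": "latest"}


def vm_configuration(tag: str, container_image: Optional[str] = None) -> dict:
    """Build a VM configuration fragment, optionally naming a container image."""
    config = {"imageReference": parse_vm_image_tag(tag)}
    if container_image is not None:
        # Without this key the fragment never mentions the ACR image.
        config["containerConfiguration"] = {
            "type": "dockerCompatible",
            "containerImageNames": [container_image],
        }
    return config


if __name__ == "__main__":
    tag = "canonical:0001-com-ubuntu-server-focal:20_04-lts"
    print(vm_configuration(tag))
    print(vm_configuration(tag, "exampleregistry.azurecr.io/example-pipeline:latest"))
```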

@jkislin
Contributor

jkislin commented Oct 10, 2024

Got it!

@gvegayon, if you have the cycles (let me know if not), what we essentially need to do here is create a new job within the deployment workflow to submit Azure Batch Jobs and Tasks. The current process lives in cfa-nnh-pipelines, and the syntax is deeply nested, convoluted, and in Python. What we need instead is a series of shell-based az batch <> commands to replace these leviathans, no Python necessary.

  1. Take a look at the SOP Patrick and Kingsley use - this gives you a sense of the current order of operations for both setting up pools (something we've already done here, but worth looking to see if we missed any juicy config deets) and submitting jobs:

  2. Perform some 'code archaeology' in the old cfa-nnh-pipelines repo. The shell scripts that do what we need to replicate are here:

Some other notes:

  • Anything about logging into Azure is already handled up front by our SP, so we don't need to replicate anything with app ids, principal ids, client ids, etc. You can safely ignore those details as they're artifacts of logging in a different way via the VAP Desktop.
    • There may be some places that require managed identities. Basically, once a Service Principal triggers a batch job or pool, that pool might need its own access to use Azure Blob, etc., because at that point it's already been handed off and the SP is no longer performing the operation, but rather the Batch Account.
    • This should be internally managed, and I don't think we need to code this out. If you get any errors or confusion there, this is probably a job for Amit and me on the CFA Tools side to smooth out, rather than something for us to code here. We can chat.
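
As a rough illustration of the order of operations described above, this Python sketch only assembles and prints the az batch commands, so they can be pasted into a shell step; it does not run them. It assumes the Batch account is already selected (for example through az batch account login) and that --json-file accepts a REST-style JSON body, which is worth confirming with az batch pool create --help. The file names and IDs are hypothetical.

```python
import shlex
from typing import List


def az_batch_commands(pool_json: str, job_id: str, pool_id: str, task_json: str) -> List[List[str]]:
    """Return the pool, job, and task creation commands in submission order."""
    return [
        # 1. Create the pool from a JSON spec (the pool ID lives in the file).
        ["az", "batch", "pool", "create", "--json-file", pool_json],
        # 2. Create a job bound to that pool.
        ["az", "batch", "job", "create", "--id", job_id, "--pool-id", pool_id],
        # 3. Add a task to the job from a JSON spec.
        ["az", "batch", "task", "create", "--job-id", job_id, "--json-file", task_json],
    ]


if __name__ == "__main__":
    commands = az_batch_commands(
        pool_json="pool.json",   # hypothetical file name
        job_id="example-job",    # hypothetical ID
        pool_id="example-pool",  # hypothetical ID
        task_json="task.json",   # hypothetical file name
    )
    for cmd in commands:
        # shlex.join quotes each argument so the line is safe to paste.
        print(shlex.join(cmd))
```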

zsusswein assigned gvegayon and unassigned jkislin Oct 10, 2024
@zsusswein
Collaborator Author

@gvegayon ping me if helpful to talk through scope here. If this issue seems like it's getting unwieldy, let's split it into chunks of work.

@zsusswein
Collaborator Author

Finally found some docs explaining what we want. I don't have time to finish them now, but dumping here to come back for another read.

@gvegayon
Member

> Finally found some docs explaining what we want. I don't have time to finish them now, but dumping here to come back for another read.

A couple of other references:

@gvegayon
Member

So, we have reached a roadblock: creating Batch pools with the --template argument was retired this year in September (here). More precisely, the az CLI extension that provided that capability was retired. Looking around, I believe the best option would be to use the Python script that exists for this project (here). We should probably have a chat about this, @zsusswein, @jkislin, @natemcintosh, and @dylanhmorris.

@natemcintosh
Collaborator

To clarify, the issue here is in attaching an ACR image to a pool at some arbitrary time, that is not necessarily pool build time?

@gvegayon
Member

> To clarify, the issue here is in attaching an ACR image to a pool at some arbitrary time, that is not necessarily pool build time?

I was thinking during build time. But that's the image itself. I am unsure when the image is actually downloaded.
