Custom notebooks #150
It's not clear here that the YAML config file must be in the image. That is the case, right?
Also, the file is expected to be at a particular location: https://github.com/bird-house/birdhouse-deploy/pull/150/files#diff-17e9b2e274b97a022f797d1f221f2b50144c0ce3b70537a9faa2b7412b2a2cafR159
---
Well, in our case we run this script on an image, so the config must be on the same image, yes. But I suppose the script could also be used directly on the host, without using any image. Since our use case here is to run it on a Docker image, I should add a clarification that the config file must be on the image too. :P
Also, when building the Docker image, the script has to be copied to the same path that is written in the job command (as in the link you posted).
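To illustrate that path contract (a sketch only; the image name and paths here are hypothetical placeholders, not necessarily the ones from this PR):

```sh
# Build time: the specific image's Dockerfile copies the script and its config
# to fixed paths. Run time: the cronjob command must use those exact same paths.
docker run --rm crim-jupyter-eo:latest \
    /usr/local/bin/deploy-data-specific-image /notebook_config.yml
```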
---
Hmm, OK, I agree that this script doesn't care where the config file is located, since it is passed as an argument.
In fact, what is needed is a better place to document the location of the config file in the images. All images will require this config file at a specific location so that the job (my link) can find it.
---
Download from GitHub using an `svn` client? I didn't know that could be done! But why not just use a `git` client?

The bigger question is that this script seems to do very similar work to the existing `deploy-data` script. It's not clear what feature is missing, so why can't we re-use the existing `deploy-data` script? If a feature is missing, can we add it instead of forking another script?

---
They do have a lot of similarities. I could give it a try and use the `deploy-data` script directly, but I am afraid of eventually hitting a blocking point if we want to add different features to one of the use cases.
They are almost identical for now, though, if not the same.
Also, I see that the `deploy-data` script requires Docker, which means we would have to install Docker on `pavics-jupyter-base` (this would replace the yq/jq installation, actually). I don't know what is best here if we want to keep those images to the bare minimum.
@dbyrns I would be curious about your input on this :)
---
I don't have enough information on what @tlvu would like you to reuse to make that call. If he could show us how it could easily be done, I'm not against it, as long as we don't have to invest another couple of days in it. The same applies to the other reuse request; maybe you could talk directly to each other so that it can be done fast.
---
I think the main difference between our scripts is that the `deploy-data` script is meant to run on a generic image that includes Docker and Git, while our new script `deploy-data-specific-image` is meant to be run directly on one of our own specific images, which includes our config.yaml file.

A benefit of the new script is that it lets us keep the YAML file directly in the specific image's repo, close to its related context. For example, if a developer wants to include a new folder for the crim-eo environment, they just go to the crim-eo repo and change the config there.

I am not sure we could do this easily with the `deploy-data` script? I think the YAML files are stored directly in the birdhouse-deploy repo, such as `deploy_raven_testdata_to_thredds.env` and `deploy-data-raven-testdata-to-thredds.yml`. They are then added as a job in the `env.local` file:

birdhouse-deploy/birdhouse/env.local.example, line 177 in 57a320c
---
Now that I have more time to digest this change, I would suggest moving this `deploy-data-specific-image` script into the jupyter-pavics-base image, so that the script and the config YAML (which is in the child image) are part of the same final image.

Why? So that eventually Jenkins can also check out all the notebooks and run tests on them, as it currently does with the default Jupyter image. The e2e repo should not be responsible for checking out the notebooks, as it currently is, and this will let that same e2e repo test against different notebooks depending on the Jupyter image. That requires implementing Ouranosinc/PAVICS-e2e-workflow-tests#57, but we can start to lay the groundwork in that direction.

For the scheduler cronjob, this also means you don't even have to volume-mount the script from outside, since it's already inside the image!

As for re-using the existing generic `deploy-data` script, I think it simply boils down to:

- `yq`: run via `docker run` (currently) or as installed (your case); will have to add a new switch for that.
- `git pull` (currently) or a tar/zip archive download: will have to add a new switch for that as well. Note that going the tar/zip archive route nullifies the caching provided by `git pull`. However, the `git pull` way wastes more space, because there is basically a double checkout: one used for caching and one in `/data/jupyter_user_data/tutorial-notebooks/...`. Each side has its pros and cons, so for genericity and re-usability's sake, we can implement both.

In a different PR, once `deploy-data` has these additional modes of operation, jupyter-pavics-base will wget the script as part of the build and make it available inside the image, just as `deploy-data-specific-image` is now, without it being directly committed as `deploy-data-specific-image`.

---
Could we just create a script that loops over the DOCKER_NOTEBOOK_IMAGES env var (https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/env.local.example#L214) and performs this operation on each of them?
So, if I want to offer the eo image, I add it to the available-images env var and voilà! I also get all its notebooks.
---
Do you mean automatically creating a job for each image found in DOCKER_NOTEBOOK_IMAGES?
I suppose we could; most of the parameters stay the same, with only the name (such as eo, nlp, etc.) changing.
But do we always want to create a job for all of those images? For example, since I don't want notebooks specific to pavics/workflow-tests, I don't want a job for that image. But I guess we could organize those images differently, like having another variable that contains the list of the images for which we actually want to run the script.
Or did you mean a single job that runs the script for each image?
---
I mean a single job running a script that updates all images: basically what you already have, but inside a loop (see the sketch below). Apart from the image name, everything else is the same.
And yes, for all images found in DOCKER_NOTEBOOK_IMAGES. Right now we have workflow-tests, but the name is misleading, because it's the primary image used by almost everyone! In fact, all images in DOCKER_NOTEBOOK_IMAGES are available to users, so yes, I think we want to keep them all updated.
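Roughly something like this (a sketch only; the script's exact arguments and path are made up here):

```sh
# One job, one loop: refresh the notebooks of every image listed in the
# DOCKER_NOTEBOOK_IMAGES env var (a space-separated list of image references).
for image in $DOCKER_NOTEBOOK_IMAGES; do
    # Derive the short name ("eo", "nlp", ...) from the image reference,
    # stripping the tag and the registry/organization prefix.
    name="$(basename "${image%%:*}")"
    docker run --rm "$image" \
        /usr/local/bin/deploy-data-specific-image /notebook_config.yml "$name"
done
```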
---
Following my previous comment regarding this config: given that this file, `/notebook_config.yml.template`, is in the image, why is it a `.template`?

---
This "cronjob" is again pretty much similar to the existing https://github.com/bird-house/birdhouse-deploy/blob/57a320c3583f804838e8c18aeb064266527a0bc9/birdhouse/components/scheduler/deploy_data_job.env. Same question, if a feature is missing, can we add to it than forking it?
This one you might not be aware since it's under the "scheduler" component because it has to be used with that component.
Using that "generic" cronjob allows a much simplified cronjob definition like this https://github.com/bird-house/birdhouse-deploy/blob/57a320c3583f804838e8c18aeb064266527a0bc9/birdhouse/components/scheduler/deploy_raven_testdata_to_thredds.env
---
So, I used the word "template" because this version of the config contains the variable ${JUPYTERHUB_USER_DATA_DIR}, which gets replaced by its value when running the script. We make a copy of the config file using `envsubst < $TEMPLATE_CONFIG_YML > $CONFIG_YML`, where the copy has the real value instead of the variable (see the sketch below). I agree 'template' is not the best word here, though; I don't know if you have a better idea.
Regarding your comment about the dest_dir variable found in the YAML config: I could remove it, and we could always output the notebooks to the same directory, ${JUPYTERHUB_USER_DATA_DIR}/tutorial-notebooks/${IMAGE_NAME}.
We could remove that customization option, which would also remove my need for a copy of the config with variables to replace.
If we decide to remove the dest_dir option, I will remove all mentions of .template and remove the envsubst command from the script.
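For reference, the substitution step works roughly like this (a sketch; the `CONFIG_YML` output path and the example value of `JUPYTERHUB_USER_DATA_DIR` are illustrative):

```sh
# The template baked into the image still contains the literal string
# "${JUPYTERHUB_USER_DATA_DIR}"; envsubst writes a copy with the real value.
TEMPLATE_CONFIG_YML=/notebook_config.yml.template
CONFIG_YML=/notebook_config.yml
export JUPYTERHUB_USER_DATA_DIR=/data/jupyter_user_data
envsubst < "$TEMPLATE_CONFIG_YML" > "$CONFIG_YML"
```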
---
OK, it solves two issues and makes the whole thing simpler, so I would remove the dest_dir option from the config.
But I would not use the directory ${JUPYTERHUB_USER_DATA_DIR}/tutorial-notebooks/${IMAGE_NAME}. This is where the notebooks need to be on the host, but inside the image they could be anywhere, like /tmp, as long as the calling command mounts the host volume at that point. My suggestion still holds: in the docker command, include the volume mount and set an environment variable telling the script where to put the files (the dest_dir, but as an env var rather than a config entry); see the sketch below.
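Something along these lines (a sketch; the `DEST_DIR` variable name and the container-side path are placeholders):

```sh
# Mount the host target directory into the container at an arbitrary path,
# then tell the script where to write through an environment variable.
docker run --rm \
    -v "${JUPYTERHUB_USER_DATA_DIR}/tutorial-notebooks/${IMAGE_NAME}:/tmp/notebooks" \
    -e DEST_DIR=/tmp/notebooks \
    "$IMAGE" /usr/local/bin/deploy-data-specific-image /notebook_config.yml
```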
---
Again, now that I have more time to digest this PR...

I would suggest moving this sample big blob of code that generates the cronjob into the jupyter-pavics-base repo as well, so it sits together with the `deploy-data-specific-image` script it wraps.

- This keeps `env.local.example` shorter and less intimidating.
- If this later switches to the generic `deploy-data` script, the matching generic cronjob `components/scheduler/deploy_data_job.env` will also have to be adapted for whatever newer options `deploy-data` will support, and we still keep the same consistency: the "script" and the "cronjob wrapper" are together.

Try to take inspiration from the existing generic cronjob wrapper, especially the part at birdhouse-deploy/birdhouse/components/scheduler/deploy_data_job.env, lines 64 to 67 in 57a320c (`./pavics-compose.sh` invoked manually on the console vs. invoked inside the scheduler component).

---
Forgot to say: once you move the "cronjob generation code" out to jupyter-pavics-base, you can reference it back in `env.local.example` to "advertise" it, like birdhouse-deploy/birdhouse/env.local.example, lines 171 to 181 in 57a320c.