TEST: use Github based GPU instance for CI #183

Closed
wants to merge 22 commits into from
Conversation

Contributor

kp992 commented May 22, 2024

Built on top of #181

Use a GitHub Actions GPU instance for CI.

netlify bot commented May 22, 2024

Deploy Preview for incomparable-parfait-2417f8 ready!

Name Link
🔨 Latest commit 4d75f11
🔍 Latest deploy log https://app.netlify.com/sites/incomparable-parfait-2417f8/deploys/665ca9bf5294f200088e2aa3
😎 Deploy Preview https://deploy-preview-183--incomparable-parfait-2417f8.netlify.app
github-actions bot commented May 22, 2024

@github-actions github-actions bot temporarily deployed to pull request May 22, 2024 17:50 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 22, 2024 18:00 Inactive
@kp992 kp992 requested a review from mmcky May 22, 2024 18:01
Contributor Author

kp992 commented May 22, 2024

@mmcky Does everything look good now? I can remove the commented-out cache code.

Contributor

mmcky commented May 24, 2024

@kp992 I still don't have a good understanding of why we can't just set up Anaconda directly on the instance (rather than using a Docker container). Any insights into why the kernel keeps dying in that context?

Contributor Author

kp992 commented May 24, 2024

Thanks @mmcky, I will try to set it up and check.

@github-actions github-actions bot temporarily deployed to pull request May 24, 2024 14:30 Inactive
Contributor Author

kp992 commented May 24, 2024

@mmcky I waited for about 18 minutes, but it seems all the machines are busy. This could be a tricky problem to solve if we depend on GitHub Actions jobs. I cancelled the job for now: I am only experimenting, and I didn't want to keep it running in case the runtime gets high.

@github-actions github-actions bot temporarily deployed to pull request May 25, 2024 14:40 Inactive
Contributor

mmcky commented May 27, 2024

@kp992 thanks for trying some things out. However, 2ff9298 still uses a Docker container, which is not required. If you look at our GitHub Actions workflows, you can see that most of them don't use a Docker container. That is what I am proposing, as the Docker download takes a long time.

Contributor Author

kp992 commented May 27, 2024

Thanks @mmcky, I'm also trying to find an approach like that. I was verifying that we need to install CUDA somehow. One approach is through Docker, as seen here. I will try with conda.

@github-actions github-actions bot temporarily deployed to pull request May 27, 2024 14:04 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 27, 2024 14:26 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 27, 2024 15:06 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 27, 2024 15:20 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 27, 2024 16:36 Inactive
Contributor Author

kp992 commented May 27, 2024

Hmm, it's not working. I think the best way is to use the Docker image that you have made (the existing main branch CI) and cache it. Since we are not updating the image frequently, that should be easier to manage. How does that sound?
Ref: https://github.com/marketplace/actions/build-docker-images-using-cache

@github-actions github-actions bot temporarily deployed to pull request May 29, 2024 13:40 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 29, 2024 13:55 Inactive
@github-actions github-actions bot temporarily deployed to pull request May 29, 2024 13:57 Inactive
Contributor

mmcky commented May 30, 2024

> Hmm, it's not working. I think the best way is to use the Docker image that you have made (the existing main branch CI) and cache it. Since we are not updating the image frequently, that should be easier to manage. How does that sound? Ref: https://github.com/marketplace/actions/build-docker-images-using-cache

@kp992 can we build the Docker image on GitHub Actions and then cache it from another repo?

I build the image at https://github.com/quantecon/lecture-python.docker and then sync it to Docker Hub. But if we can save it as an asset on QuantEcon's GitHub, then we can pull it locally, which should be quicker.

Contributor Author

kp992 commented Jun 1, 2024

I am trying to do the same, but for a single repo. For example, check the GitHub Actions runs for the last two commits of this PR: https://github.com/QuantEcon/lecture-jax/actions/runs/9258083875/job/25555094247 (first run) and https://github.com/QuantEcon/lecture-jax/actions/runs/9287258102/job/25555816581 (second run). The first run took about 13 minutes and the second about 11 minutes, so we are saving about 2 minutes. I think we can also optimize the Anaconda setup by including it in the Docker image itself, as you have done in https://github.com/quantecon/lecture-python.docker. I will try it and see the results.

@github-actions github-actions bot temporarily deployed to pull request June 2, 2024 16:40 Inactive
Contributor Author

kp992 commented Jun 2, 2024

https://github.com/QuantEcon/lecture-jax/actions/runs/9339691234/job/25704378502 took about 34 minutes to load our Docker image, which is very high compared to pulling the NVIDIA Docker image and setting up conda separately, which took about 13 minutes.

@github-actions github-actions bot temporarily deployed to pull request June 2, 2024 17:31 Inactive
Contributor Author

kp992 commented Jun 5, 2024

@mmcky How does commit 9ee6f21 look? Should we go ahead with the NVIDIA CUDA Docker image (with caching enabled) and a separate Anaconda setup?

Contributor

mmcky commented Jun 6, 2024

> https://github.com/QuantEcon/lecture-jax/actions/runs/9339691234/job/25704378502 took about 34 minutes to load our Docker image, which is very high compared to pulling the NVIDIA Docker image and setting up conda separately, which took about 13 minutes.

@kp992 I suspect the NVIDIA Docker image is cached, as I think the two images are similar in size (in GB). I wanted to see if we can cache our Docker container as a GitHub asset to improve on this. Do you know if this is possible? We build that container on GitHub and then send it to Docker Hub, but a local asset might be a better way to go if there is a local registry.

Contributor Author

kp992 commented Jun 8, 2024

Hmm, I am not aware of anything like that. I guess GitHub provides about 10 GB of cache storage, so we could save our pulled Docker container to a .zip file and restore it on the next run. Since the Docker image won't be updated very frequently, this is maintainable too.
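
A minimal sketch of the save-and-restore idea described above, written in Python for illustration only. The function name `image_cache_commands`, the image name, and the archive path are hypothetical and do not come from this PR; the sketch also assumes the cached archive is restored to a known path before the job runs. Note that `docker save` writes a tar archive rather than a .zip file.

```python
from pathlib import Path


def image_cache_commands(image: str, archive: Path) -> list[list[str]]:
    """Return the docker commands a CI job would run for a cache hit or miss.

    `image` and `archive` are illustrative inputs, not values from this PR.
    """
    if archive.exists():
        # Cache hit: load the image from the previously saved archive.
        return [["docker", "load", "--input", str(archive)]]
    # Cache miss: pull from the registry, then save an archive for next time.
    return [
        ["docker", "pull", image],
        ["docker", "save", "--output", str(archive), image],
    ]


if __name__ == "__main__":
    # Hypothetical image name and path, used only to show the call.
    for cmd in image_cache_commands("example/cuda-image:latest", Path("image.tar")):
        print(" ".join(cmd))
```

The function only builds the command lists, so the decision logic can be checked without Docker installed; a real job would pass each list to something like `subprocess.run`.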

Contributor

mmcky commented Jun 10, 2024

@kp992 rather than managing cache objects ourselves (as they usually expire after 1 month of inactivity), I was thinking of using the GitHub container registry for the Docker container. Is this feasible?

https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry
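
For context, images in the GitHub container registry are addressed as `ghcr.io/NAMESPACE/IMAGE_NAME:TAG`. A minimal sketch of composing such a reference, assuming a hypothetical helper `ghcr_image_ref` and an illustrative image name and tag (neither is confirmed by this thread):

```python
def ghcr_image_ref(namespace: str, image_name: str, tag: str = "latest") -> str:
    """Compose a container registry reference of the form ghcr.io/NAMESPACE/IMAGE_NAME:TAG.

    Docker repository names must be lowercase, so the namespace and image
    name are lowercased here. The tag is left unchanged.
    """
    return f"ghcr.io/{namespace.lower()}/{image_name.lower()}:{tag}"


if __name__ == "__main__":
    # Hypothetical image name and tag, used only to show the call.
    print(ghcr_image_ref("QuantEcon", "lecture-python.docker"))
```

A workflow could then pull this reference instead of the Docker Hub copy, which is the change being asked about here.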

@mmcky mmcky changed the title from "MAINT: use Github based GPU instance for CI" to "TEST: use Github based GPU instance for CI" Jun 10, 2024
@mmcky mmcky added the in-work label Jun 11, 2024
Contributor

mmcky commented Jun 13, 2024

@kp992 I think this can now be closed in favour of #181

@mmcky mmcky closed this Jun 13, 2024
@mmcky mmcky deleted the gpu_trial_kpl branch June 13, 2024 00:57
2 participants