Experiment with kbatch #4

Open
1 task
SammyAgrawal opened this issue Aug 6, 2024 · 2 comments

SammyAgrawal commented Aug 6, 2024

kbatch notes

kbatch docs

What we tested

  • Simple example
  • Reading from public cloud storage and running a computation
    • Check back on my-job-read-write-osn-gfzbm-4vftc

What we are missing

  • Reading/Writing to 'our' buckets (I assume this would work out of the box if set up on the LEAP hub?)
  • Using different machine types
  • Can we chain jobs? E.g. run preprocessing.py and then train_model.py
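
One possible workaround for the chaining question, sketched under assumptions: rather than relying on any kbatch-level chaining (none has been confirmed in this thread), submit a single job whose entry point runs the scripts in order. `run_chain` is a hypothetical helper name, and the sketch assumes both scripts are shipped together with the driver.

```python
import subprocess
import sys


def run_chain(scripts):
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in scripts:
        print(f"running {script}", flush=True)
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a later step never starts if an earlier one fails.
        subprocess.run([sys.executable, script], check=True)


# In the job's entry script (hypothetical):
# run_chain(["preprocessing.py", "train_model.py"])
```

This keeps both steps on the same machine type and in the same container, so it does not help if preprocessing and training need different resources.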

SammyAgrawal commented Aug 8, 2024

  1. Verified that the basic hello.py script Yuri sent runs.
  2. Verified that externally hosted quay images can be run:
    kbatch job submit --name=my-image-job --image=$MY_CUSTOM_IMAGE --command='["python", "my_image.py"]' --code="my_image.py" -o name
  3. In addition to writing to cloud buckets, how can a job write local files (e.g. to the user directory)? This seems important for things like model checkpoints, logs, or output Jupyter notebooks from papermill.

my_image.py:

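# Smoke test: these imports only succeed if the custom image provides diffusers and torch.
import diffusers
import torch
import torch.nn as nn

print(diffusers)  # prints the module repr, showing which installation was imported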

Next step is running in conjunction with papermill (#5).

edit: Succeeded in running with papermill; the issue now is how to save results. In general, if a script writes to a file, it is unclear how that file can be retrieved. Is writing to cloud buckets the only way, or can we somehow access user directories?
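
Until the user-directory question is answered, one option is to have the script copy its own outputs to a bucket before it exits. A minimal sketch, assuming the job image has `fsspec` plus a storage backend (e.g. `gcsfs` or `s3fs`) installed and that credentials are available inside the job; `copy_to_store` is a hypothetical helper and the bucket URL in the usage note is a placeholder.

```python
import shutil


def copy_to_store(local_path, dest_url, opener=open):
    """Copy one local file to dest_url using opener(url, mode).

    opener defaults to the builtin open, which only handles local paths.
    Passing fsspec.open instead (an assumption about the job image) lets
    dest_url be a bucket URL.
    """
    with open(local_path, "rb") as src, opener(dest_url, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dest_url
```

At the end of the script this could be called as `copy_to_store("output.ipynb", "gs://<your-bucket>/output.ipynb", opener=fsspec.open)`, after `import fsspec`. Whether the job has write access to that bucket is exactly the open question from the first comment.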


SammyAgrawal commented Aug 8, 2024

Unclear how to pass in multiple code files (a script as well as a Jupyter notebook).

kbatch job submit --name=my-papermill-test --image=$MY_CUSTOM_IMAGE --command='["python", "papermill_test.py"]' --code="papermill_test.py" -o name
^^ Base command: the submission itself works, but the job fails because papermill_test.py requires access to another file, notebook_test.ipynb.

Tried: --code="papermill_test.py" --code "notebook_test.ipynb"
Failed with: python: can't open file '/code/papermill_test.py': [Errno 2] No such file or directory
Failed because: passing --code twice simply overwrites the earlier value, so it is equivalent to sending only notebook_test.ipynb

Tried: --code='["papermill_test.py", "notebook_test.ipynb"]'
Failed with: FileNotFoundError: [Errno 2] No such file or directory: '["papermill_test.py", "notebook_test.ipynb"]'
Failed because: the job never runs, because the whole string is interpreted as one filename

Tried: --code="papermill_test.py", "notebook_test.ipynb" and --code=["papermill_test.py", "notebook_test.ipynb"]
Failed with: Error: Got unexpected extra argument (notebook_test.ipynb])
Failed because: --code only takes one argument, even though the docs say a list can be passed

edit: Solved! To pass in multiple code files, put everything you want to send into one directory and pass that directory instead:
--code="test_file_dir"
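
The directory workaround can be scripted. A rough sketch: `stage_code` and `submit_args` are hypothetical helpers, and the argument list uses only the flags that already appear in the commands above. Where kbatch unpacks the directory inside the container has not been checked here, so the path used in the command is an assumption.

```python
import json
import shutil
from pathlib import Path


def stage_code(files, staging_dir):
    """Copy every file in `files` into staging_dir and return the directory path."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy2(f, staging / Path(f).name)
    return staging


def submit_args(name, image, command, code_dir):
    """Build the argv for `kbatch job submit` from flags seen in this thread."""
    return [
        "kbatch", "job", "submit",
        f"--name={name}",
        f"--image={image}",
        # --command takes a JSON list, as in the working examples above
        f"--command={json.dumps(command)}",
        f"--code={code_dir}",
        "-o", "name",
    ]
```

The returned list can be handed to `subprocess.run(args, check=True)` on a machine where kbatch is installed and configured; building it as a list avoids the shell-quoting problems hit in the failed attempts.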
