SD training on vertex #3

Open: wants to merge 4 commits into base: main
Binary file added .DS_Store
Binary file not shown.
3 changes: 0 additions & 3 deletions Archive/.gcloudignore

This file was deleted.

84 changes: 0 additions & 84 deletions Archive/README.md

This file was deleted.

18 changes: 0 additions & 18 deletions Archive/cloudbuild.yaml

This file was deleted.

18 changes: 0 additions & 18 deletions Archive/clouddeploy.yaml

This file was deleted.

24 changes: 0 additions & 24 deletions Archive/dcgm_loadtest.yml

This file was deleted.

35 changes: 0 additions & 35 deletions Archive/dcgm_loadtest_deployment.yaml

This file was deleted.

35 changes: 0 additions & 35 deletions Archive/deployment.yaml

This file was deleted.

7 changes: 0 additions & 7 deletions Archive/kustomization.yaml

This file was deleted.

11 changes: 0 additions & 11 deletions Archive/pvc.yaml

This file was deleted.

23 changes: 0 additions & 23 deletions Archive/skaffold.yaml

This file was deleted.

Binary file added PEFTonVertex/.DS_Store
Binary file not shown.
27 changes: 27 additions & 0 deletions PEFTonVertex/CustomTraining/Dockerfile
@@ -0,0 +1,27 @@
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

RUN apt-get update && \
    apt-get install -y wget git python3 python3-venv python3-pip

RUN pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117

WORKDIR /root

RUN git clone https://github.com/huggingface/peft.git
RUN pip install /root/peft
RUN git clone https://huggingface.co/spaces/smangrul/peft-lora-sd-dreambooth
RUN pip install -r /root/peft-lora-sd-dreambooth/requirements.txt

ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
RUN ln -s /usr/local/cuda/lib64/libcudart.so.11.0 /usr/local/cuda/lib64/libcudart.so
RUN pip install -U bitsandbytes --prefer-binary

# Install additional packages as needed.
RUN pip install -U google-cloud-aiplatform
RUN pip install -U google-cloud-storage

# Copies the trainer code to the docker image.
COPY train.py /root/train.py

# Sets up the entry point to invoke the trainer.
ENTRYPOINT ["python3", "-m", "train"]
8 changes: 8 additions & 0 deletions PEFTonVertex/CustomTraining/cloud_build_config.yaml
@@ -0,0 +1,8 @@
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: [ 'build', '-t', 'us-central1-docker.pkg.dev/argolis-lsj-test/sd-lsj/sd-peft:v1', '.' ]
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'us-central1-docker.pkg.dev/argolis-lsj-test/sd-lsj/sd-peft:v1']
options:
  machineType: 'N1_HIGHCPU_8'
  diskSizeGb: '200'
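The image tag used by both build steps follows the Artifact Registry pattern `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`. As a minimal sketch of how that tag decomposes, the snippet below rebuilds it from its parts. The helper name `image_uri` is hypothetical and not part of this PR; the values are taken from the config above.

```python
def image_uri(location: str, project: str, repository: str, image: str, tag: str) -> str:
    """Compose an Artifact Registry Docker image URI (hypothetical helper)."""
    return f"{location}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


# Values copied from cloud_build_config.yaml in this PR.
uri = image_uri("us-central1", "argolis-lsj-test", "sd-lsj", "sd-peft", "v1")
print(uri)
```

To target another project or repository, the same string would need to change in both the build and push steps.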
14 changes: 14 additions & 0 deletions PEFTonVertex/CustomTraining/cloud_cli.sh
@@ -0,0 +1,14 @@
# Build the container image with Cloud Build
gcloud builds submit --config cloud_build_config.yaml .

# Create a Vertex AI custom training job
# args format:
# --model_name: Hugging Face repo id, or "/gcs/bucket_name/model_folder". Only models downloaded from HF in the standard diffusers format have been tested; safetensors has not been tested.
# --input_storage: /gcs/bucket_name/input_image_folder
# --output_storage: /gcs/bucket_name/output_folder
# --prompt: a photo of XXX
gcloud ai custom-jobs create \
--region=us-central1 \
--display-name=sd-lora-training-peft-1 \
--config=vertex-ai-config.yaml \
--args="--model_name=runwayml/stable-diffusion-v1-5,--input_storage=/gcs/sd_lsj/input_dog,--output_storage=/gcs/sd_lsj/peft/dog_lora_output,--prompt=a photo of sks dog,--class_prompt=a photo of dog"
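The `--args` value is a comma-separated list, and each item reaches the container entry point as one command-line argument. `train.py` is not shown in this diff, so the following is only a sketch of how such a script might read these flags, assuming it uses `argparse`. The parser layout and the helper `split_gcloud_args` are hypothetical; the flag names and values come from the command above.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the flags passed via --args (assumption, not the PR's train.py)."""
    parser = argparse.ArgumentParser(description="Sketch of train.py argument parsing")
    parser.add_argument("--model_name", required=True)
    parser.add_argument("--input_storage", required=True)
    parser.add_argument("--output_storage", required=True)
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--class_prompt", default=None)
    return parser


def split_gcloud_args(raw: str) -> List[str]:
    """Naive comma split, mimicking how the list is broken into separate arguments."""
    return raw.split(",")


raw = (
    "--model_name=runwayml/stable-diffusion-v1-5,"
    "--input_storage=/gcs/sd_lsj/input_dog,"
    "--output_storage=/gcs/sd_lsj/peft/dog_lora_output,"
    "--prompt=a photo of sks dog,"
    "--class_prompt=a photo of dog"
)
args = build_parser().parse_args(split_gcloud_args(raw))
print(args.model_name, "|", args.prompt, "|", args.class_prompt)
```

Spaces inside a prompt survive because each `--flag=value` item stays a single argument. With this naive split, a prompt containing a comma would be broken into two items.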