
Cuda out of memory #24

Open
Vicvickyue opened this issue Apr 26, 2024 · 4 comments

Comments

@Vicvickyue

Hello! Thank you so much for your amazing work. I'm posting to ask about the CUDA out of memory error that I encounter when running the InstanceDiffusion inference demo. I'm using a single RTX 3050, and no other process is using the GPU while the program runs.
[Screenshot of the CUDA out-of-memory error]

@frank-xwang (Owner)

Hi, you may want to use a smaller '--num_images'. Also, please confirm that flash attention (which we use by default) is actually being used, as it reduces memory usage.
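
A quick way to confirm that the flash-attn package is importable in your environment (a minimal sketch; the exact fallback behavior inside InstanceDiffusion is an assumption here):

# Minimal availability check for the flash-attn package.
# Assumption: when this import fails, the code falls back to a
# standard attention implementation that uses more memory.
try:
    import flash_attn
    print("flash-attn", flash_attn.__version__, "is available")
except ImportError:
    print("flash-attn is NOT installed; expect higher memory usage")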

@raindrop313 commented Jun 6, 2024

I encountered the same issue, and reducing "--num_images" did not resolve it. Since the error message indicates "out of memory" during the model-weight loading phase, could you please provide an estimate of how much GPU memory is required to run this project?
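
For reference, a general PyTorch workaround for OOM at the loading step (just a sketch, not from this repo; the model object below is hypothetical) is to deserialize the checkpoint on the CPU and move things to the GPU afterwards:

import torch

# Load the checkpoint into host RAM first so deserialization does not
# allocate GPU memory; move the model to the GPU in a separate step.
state_dict = torch.load("pretrained/instancediffusion_sd15.pth",
                        map_location="cpu")
# model.load_state_dict(state_dict)  # 'model' is hypothetical here
# model.to("cuda")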

@milky245 commented Jun 8, 2024

Hello, I have run into the same problem. I tried reducing --num_images to 2 or 1, and I have confirmed that flash_attn runs normally. I ran the demo on an RTX 4060 with 8 GB of memory, and I would like to know how much GPU memory is needed for training and deployment. @frank-xwang Thanks, and looking forward to your reply.
[Screenshot of the error, 2024-06-08]

@frank-xwang (Owner) commented Jun 10, 2024

Apologies for the delayed response.

Thank you for your interest in InstanceDiffusion. I have made further optimizations to reduce the memory usage of the code. Please update to the latest version by pulling the new InstanceDiffusion code. To run this updated code, you will likely need a GPU with at least 13 GB of memory. I recently tested it locally on RTX 6000 GPUs, which have 24 GB of memory, and inference consumed about 12.8 GB. For training the model, we use A100 GPUs with 80 GB of memory.
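
To check whether your GPU meets that ~13 GB guideline before launching (a plain PyTorch sketch, not part of the repo):

import torch

# Report the total memory of GPU 0 and compare it against the
# ~13 GB figure quoted above.
props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3
print(f"{props.name}: {total_gb:.1f} GB total")
if total_gb < 13:
    print("Warning: below the ~13 GB this code reportedly needs")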

The command I used for model inference:

CUDA_VISIBLE_DEVICES=6 python inference.py \
  --num_images 8 \
  --output OUTPUT/demo/ \
  --input_json demos/demo_cat_dog_robin.json \
  --ckpt pretrained/instancediffusion_sd15.pth \
  --test_config configs/test_box.yaml \
  --guidance_scale 7.5 \
  --alpha 0.75 \
  --seed 4 \
  --mis 0.3 \
  --cascade_strength 0.3

And the memory usage is attached below:
[Screenshot showing about 12.8 GB of GPU memory in use]
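
To reproduce a comparable number programmatically (plain PyTorch, not from the repo), you can query the peak allocation around the inference call:

import torch

# Reset the peak-memory counter, run inference, then read the peak.
# Note: nvidia-smi reports a higher figure, since it also counts the
# CUDA context and the caching allocator's reserved-but-free memory.
torch.cuda.reset_peak_memory_stats()
# ... run the inference call here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak GPU memory allocated: {peak_gb:.2f} GB")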

Hope it helps!
