
the GPU memory usage sometimes spikes unexpectedly #37

Open
ak01user opened this issue Jan 15, 2025 · 9 comments

Labels
bug Something isn't working

Comments

@ak01user

ak01user commented Jan 15, 2025

[image attached]
Well done, this project is truly amazing and very user-friendly. I tried using it on a GPU with the CUDAExecutionProvider, but during inference the GPU memory usage sometimes spikes unexpectedly. Why is that? The GPU is a 4090 with 24 GB.

@satvikpendem

@ak01user How are you running it on the GPU? I have a 3080 and it still looks like it's using the CPU. Is there anything specific I need to do or modify?

@ak01user
Author

ak01user commented Jan 16, 2025

Hi, you only need to modify the providers parameter, but please make sure you have installed onnxruntime-gpu @satvikpendem

```python
from onnxruntime import InferenceSession
from kokoro_onnx import Kokoro

providers = ["CUDAExecutionProvider"]  # select CUDA explicitly

session = InferenceSession(
    "./data/kokoro-v0_19.onnx", providers=providers
)

kokoro_instance = Kokoro.from_session(session, "./data/voices.json")
```
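
On the original memory-spike question: onnxruntime's CUDAExecutionProvider also accepts provider options that bound how its CUDA memory arena grows. A minimal sketch using onnxruntime's documented CUDA provider options; the 8 GB cap is illustrative, not a value from this thread:

```python
from onnxruntime import InferenceSession

# Provider options are passed as (name, options) tuples; values are illustrative.
providers = [
    (
        "CUDAExecutionProvider",
        {
            "device_id": 0,
            "gpu_mem_limit": 8 * 1024 * 1024 * 1024,     # cap the CUDA arena at ~8 GB
            "arena_extend_strategy": "kSameAsRequested",  # grow only as much as requested
        },
    ),
    "CPUExecutionProvider",  # fallback if CUDA is unavailable
]

session = InferenceSession("./data/kokoro-v0_19.onnx", providers=providers)
```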

@satvikpendem

satvikpendem commented Jan 16, 2025

Thanks, got it. I'm having some issues running CUDA on Windows, but I'll try WSL and see. How much faster is the GPU for you versus the CPU when generating the audio file?

@ak01user
Author

I have encapsulated the API using FastAPI. I simulated 100 requests, each generating 8 seconds of audio. It took 136 seconds in CPU mode and 41 seconds on the GPU. My device has an i7-13700KF CPU and an RTX 4090 GPU.
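
A minimal sketch of that kind of FastAPI wrapper, assuming the session/Kokoro setup from the earlier snippet and kokoro-onnx's create() returning samples plus a sample rate; the route name, voice, and WAV handling here are illustrative, not the actual service:

```python
import io

import soundfile as sf
from fastapi import FastAPI, Response
from onnxruntime import InferenceSession
from kokoro_onnx import Kokoro

app = FastAPI()

session = InferenceSession(
    "./data/kokoro-v0_19.onnx", providers=["CUDAExecutionProvider"]
)
kokoro = Kokoro.from_session(session, "./data/voices.json")

@app.post("/tts")
def tts(text: str):
    # Generate audio samples, encode them as WAV in memory, and return the bytes.
    samples, sample_rate = kokoro.create(text, voice="af", speed=1.0)
    buf = io.BytesIO()
    sf.write(buf, samples, sample_rate, format="WAV")
    return Response(content=buf.getvalue(), media_type="audio/wav")
```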

thewh1teagle added the bug (Something isn't working) label Jan 17, 2025
@satvikpendem

satvikpendem commented Jan 17, 2025

I see the issue: uv automatically overrides and installs the CPU version (onnxruntime) and uses that when you run uv run hello.py, even if you install the GPU version, because only the CPU version is specified in the kokoro-onnx dependencies. @thewh1teagle is there a way to add a flag to install the GPU version instead?

In the meantime, I did:

```sh
python3 -m venv .venv
.venv/bin/pip install -U kokoro-onnx
.venv/bin/pip uninstall onnxruntime
.venv/bin/pip install onnxruntime-gpu
.venv/bin/python hello.py
```

It works great with CUDA. The TensorRT provider throws an error about needing to infer shapes, but when I tried that it didn't seem to work.
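
For the TensorRT shape error, onnxruntime ships a symbolic shape inference tool that is typically run over the model first; this is a sketch of the usual invocation, and whether it resolves this particular model's error is unverified:

```sh
# Annotate the model with inferred shapes, then point the TensorRT provider
# at the new file (output path is illustrative).
python -m onnxruntime.tools.symbolic_shape_infer \
    --input ./data/kokoro-v0_19.onnx \
    --output ./data/kokoro-v0_19.shaped.onnx \
    --auto_merge
```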

@thewh1teagle
Owner

> I see the issue: uv automatically overrides and installs the CPU version (onnxruntime) and uses that when you run uv run hello.py, even if you install the GPU version, because only the CPU version is specified in the kokoro-onnx dependencies.

Good find! Thanks.
You can use pip directly with uv, for instance:

```sh
uv pip uninstall onnxruntime
uv pip install onnxruntime-gpu
```

Then run it and let me know if it still overrides it.
As for a long-term solution in kokoro-onnx, I thought about adding an option to install it with pip install kokoro-onnx[gpu], which would pull in onnxruntime-gpu instead. I need to see how optional dependencies (extras) work best with uv.
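
For reference, the usual way to express that option is a packaging extra; a minimal pyproject.toml sketch (the extra name and layout are illustrative, not what kokoro-onnx actually ships):

```toml
[project]
name = "kokoro-onnx"
dependencies = [
    "onnxruntime",        # default CPU runtime
]

[project.optional-dependencies]
gpu = [
    "onnxruntime-gpu",    # installed via: pip install kokoro-onnx[gpu]
]
```

One caveat: an extra adds onnxruntime-gpu alongside the base onnxruntime dependency rather than replacing it, which is part of why resolvers like uv need care here.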

@thewh1teagle
Owner

@satvikpendem

I released a new version with GPU support.
See examples/with_gpu.py
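
If the release follows the extra-based approach discussed above, installation would look roughly like this (assumed commands, not copied from the release notes):

```sh
pip install -U "kokoro-onnx[gpu]"
python examples/with_gpu.py
```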

@satvikpendem

satvikpendem commented Jan 18, 2025

> Then run it and let me know if it still overrides it.

@thewh1teagle Yes, uv does still override it because it detects that onnxruntime (non-GPU) is in the dependency list and automatically installs it before running. Only my method seems to work to override the dependency with the GPU version. For your example, are you doing uv pip install kokoro-onnx[gpu], and does that work fine on GPU with CUDA?

@thewh1teagle
Owner

> Yes, uv does still override it because it detects that onnxruntime (non-GPU) is in the dependency list and automatically installs it before running. Only my method seems to work to override the dependency with the GPU version. For your example, are you doing uv pip install kokoro-onnx[gpu], and does that work fine on GPU with CUDA?

After the last update, does it work if you install kokoro-onnx[gpu]?
