Failed to run symbolic shape inference when doing LLM Optimization with DirectML #1093
Comments
Hi @jojo1899, this sample requires a future version of onnxruntime-directml (tentatively 1.17.4, as you've seen in the requirements) to run. This new version should be out very soon and, at the very least, you should be able to use a nightly build soon to run this sample.
@PatriceVignola Thanks for the information.
@jojo1899 Yes, this is expected. You can keep an eye on the following two PRs, which are required to run this sample: microsoft/onnxruntime#20308. Once they are merged in (which will 100% happen today), it will take one or two days for them to make it into a nightly build. I expect the next nightly build to have the changes. I will update the requirements once that build has been generated.
Hi @jojo1899, we just updated the LLM sample to add the correct version of onnxruntime DirectML to use. You can simply run:
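As a rough sketch only (the package name `ort-nightly-directml`, the version handling, and the use of the ORT-Nightly feed URL quoted later in this thread are assumptions, not the exact command from this comment), installing a nightly DirectML build could look like this:

```
# Sketch: install a nightly DirectML build of onnxruntime from the ORT-Nightly feed
# (package name and feed usage are assumptions; check the sample's requirements.txt)
pip uninstall -y onnxruntime-directml
pip install ort-nightly-directml --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
```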
Note that when converting Mistral, you will still see the …
I tried running the code again. I get the following error when quantizing the model using AWQ.
The following are some details. I haven't tried running the model without quantizing it, but I will do that in a while and give an update. I also have a question about the warning in the log. Here is the log:
I'm not sure what this warning is about (it comes from INC), but you definitely don't need an NPU for the quantization. I think it's likely that your device is running out of memory here, since 16 GB of VRAM is barely enough to run the fp16 model normally, and quantization is more demanding. We have only confirmed that the quantization works with RTX 4090 cards. We are looking at different quantization options, since a lot of the AWQ quantization options out there are hard to use on consumer hardware and generally require powerful server machines or powerful GPUs to complete in a timely manner.

If all you're interested in is converting to 4-bit to test the performance of the model, you can play around with the script and change the quantization strategy (see Olive/examples/directml/llm/llm.py, line 150 at commit 4e23c4c). It's not something we have tested, though, since RTN is generally bad for LLMs.
I was able to quantize Mistral-7B on the same hardware using examples/mistral/mistral_int4_optimize.json, but I could not run inference on the quantized model using the DML EP (see this issue for more details). I will try using that quantized model with examples/directml/llm/run_llm_io_binding.py for inference. Regarding the code in LLM Optimization with DirectML: although I could not quantize using AWQ, I could convert Mistral successfully using the following. Here is the successful log from the Mistral conversion to ONNX format:
Here are logs from eight inference attempts, of which only two worked.
Any tips on what is happening here?
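For reference, a quantization run with the Mistral example config mentioned above would typically be launched through Olive's generic workflow runner. This is only a sketch, assuming the config is used as-is from the example folder (the example may also ship its own wrapper script):

```
# Sketch: run the Olive workflow described by the Mistral INT4 config
# (assumes the command is run from examples/mistral in the Olive repo)
cd examples/mistral
python -m olive.workflows.run --config mistral_int4_optimize.json
```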
UPDATE: I started using …
Finally, I quantized and performed inference using … The quantization was a bit too fast (it took me 1-2 minutes). However, the quality of the quantized model is really good and I saw no weird responses from the LLM. The size of this INT4 quantized model on disk is 3.97 GB.
When I tried to get ort-nightly, this error happened:

Looking in indexes: https://pypi.org/simple, https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/

How did you get past this?
@purejomo Back then, I could access those nightly packages. It asks me for authentication now. They probably removed public access to the nightly packages.
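Since the nightly feed now requires authentication, one workaround is to install a stable release that satisfies the sample's `>=1.17.4` pin from PyPI instead. A sketch, assuming a release newer than 1.17.3 (e.g. 1.18.0) is available by the time you try this:

```
# Fall back to a released DirectML build from PyPI
pip install --upgrade "onnxruntime-directml>=1.17.4"
```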
Thank you for answering.
Describe the bug
I am trying to run the code in LLM Optimization with DirectML. The `requirements.txt` file says `onnxruntime-directml>=1.17.4`. Is there a typo in that? The latest version seems to be `onnxruntime-directml 1.17.3`. Executing `pip install -r requirements.txt` results in the following error.

I continued running the code with `onnxruntime-directml 1.17.3`. However, the LLM Optimization with DirectML sample does not run as expected when `python llm.py --model_type=mistral-7b-chat` is executed. It fails with `Failed to run symbolic shape inference` and then with `Failed to run Olive on gpu-dml`. The traceback is pasted in the Olive logs below.

To Reproduce
python llm.py --model_type=mistral-7b-chat
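To isolate the `Failed to run symbolic shape inference` step, the shape-inference tool that ships with onnxruntime can also be run directly against an exported ONNX model. This is a sketch; the model paths are placeholders, not files produced by the original run:

```
# Run onnxruntime's symbolic shape inference standalone on an exported model
# (input/output paths are placeholders)
python -m onnxruntime.tools.symbolic_shape_infer \
    --input model.onnx \
    --output model_shape_inferred.onnx \
    --auto_merge
```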
Expected behavior
Expected the code to run without any errors.
Olive config
Olive logs
Other information
olive-ai 0.6.0
onnxruntime-gpu 1.17.1
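The package versions listed above can be collected with pip, for example:

```
# Show installed versions of the relevant packages
pip show olive-ai onnxruntime-directml onnxruntime-gpu
```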
Additional context