[BUG] Cannot generate structured LLM inference output with exllamav2 version >0.2.0 #709
Closed
Labels
bug
OS
Windows
GPU Library
CUDA 12.x
Python version
3.11
Pytorch version
2.5.1
Model
llama3.1
Describe the bug
I am getting the following error when generating with an `ExLlamaV2TokenEnforcerFilter` attached to the generator:

`AttributeError: 'ExLlamaV2TokenEnforcerFilter' object has no attribute 'background_drop'`
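A minimal sketch of the kind of setup that raises this, assuming lm-format-enforcer's exllamav2 integration and the dynamic generator API (the model path, schema, and prompt are placeholders, not the original snippet):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.exllamav2 import (
    ExLlamaV2TokenEnforcerFilter,
    build_token_enforcer_tokenizer_data,
)

# Load the model (placeholder path)
config = ExLlamaV2Config("/models/llama-3.1-8b-instruct-exl2")
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Constrain output to a JSON schema via lm-format-enforcer
schema = {"type": "object", "properties": {"answer": {"type": "string"}}}
tokenizer_data = build_token_enforcer_tokenizer_data(tokenizer)
json_filter = ExLlamaV2TokenEnforcerFilter(JsonSchemaParser(schema), tokenizer_data)

# On exllamav2 > 0.2.0 this call fails while the generator inspects the
# filter: AttributeError: ... object has no attribute 'background_drop'
output = generator.generate(
    prompt="Answer as JSON: what is the capital of France?",
    max_new_tokens=200,
    filters=[json_filter],
    add_bos=True,
)
print(output)
```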
As a bonus question: I am trying to run inference with the phi-4 model, but I could not locate any example for it. Can you check my three functions for correctness?
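For reference, a minimal phi-4 generation sketch, assuming an EXL2 quant of phi-4 and its published chat format (the path and template are assumptions; verify the template against the model's tokenizer_config.json):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Load a phi-4 EXL2 quant (placeholder path)
config = ExLlamaV2Config("/models/phi-4-exl2")
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# phi-4 chat-style prompt; the <|im_start|>/<|im_sep|>/<|im_end|> markers are
# an assumption based on phi-4's published template -- confirm them against
# the model's tokenizer_config.json
prompt = (
    "<|im_start|>system<|im_sep|>You are a helpful assistant.<|im_end|>"
    "<|im_start|>user<|im_sep|>Summarize Hamlet in two sentences.<|im_end|>"
    "<|im_start|>assistant<|im_sep|>"
)

output = generator.generate(
    prompt=prompt,
    max_new_tokens=300,
    stop_conditions=[tokenizer.single_id("<|im_end|>")],
    add_bos=False,
    completion_only=True,  # return only the generated text, not the prompt
)
print(output)
```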
Reproduction steps
N/A
Expected behavior
N/A
Logs
N/A
Additional context
N/A