I built LLaMA 3B w4a8_awq with the below command.
```
$ trtllm-build --checkpoint_dir ./trt_models/llama3.2-3b-hf_w4a8KVfp8 \
    --output_dir /trt_engines/tllm_llama3.2_3b_w4a8KVfp8_prof \
    --gemm_plugin float16 \
    --max_batch_size 1 \
    --max_input_len 2048 \
    --profiling_verbosity detailed
$ du -sh /trt_engines/*
6.8G    /trt_engines/tllm_llama3.2_3b_fp16_prof
2.9G    /trt_engines/tllm_llama3.2_3b_w4a8KVfp8_prof
```
Then I exported the layer information:
```python
import tensorrt as trt

# Layer-level details require an engine built with --profiling_verbosity detailed.
linfo = runner.session.engine_inspector.get_engine_information(
    trt.LayerInformationFormat.JSON
)
with open("layerinfo.json", "w") as f:
    f.write(linfo)
```
However, there is no Int4 description for the `WeightOnlyGroupwiseQuantMatmul` weights:
```
$ grep weights layerinfo.json | tail -n 10
"weights": {"Type": "Half", "Count": 196608},
"weights": {"Type": "Half", "Count": 3072},
"weights": {"Type": "Float", "Count": 1},
"weights": {"Type": "Half", "Count": 6291456},
"weights": {"Type": "Half", "Count": 196608},
"weights": {"Type": "Half", "Count": 8192},
"weights": {"Type": "Float", "Count": 1},
"weights": {"Type": "Half", "Count": 6291456},
"weights": {"Type": "Half", "Count": 196608},
"weights": {"Type": "Half", "Count": 394002432},
```
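The same check can be done programmatically. A minimal sketch, assuming only that `layerinfo.json` stores weight descriptors as `{"Type": ..., "Count": ...}` dicts under `"weights"` keys (as the grep output above shows); the walk makes no assumption about how deeply they are nested:

```python
# Tally the dtype of every "weights" entry anywhere in layerinfo.json.
import json
from collections import Counter

def iter_weights(node):
    """Yield every dict stored under a "weights" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "weights" and isinstance(value, dict):
                yield value
            else:
                yield from iter_weights(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_weights(item)

with open("layerinfo.json") as f:
    info = json.load(f)

counts = Counter(w.get("Type") for w in iter_weights(info))
print(counts)  # only Half/Float appear; no Int4 entry for the packed weights
```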
I guess it is because the quantized weights are packed into fp16. Is there any way to get the quant_algo for WeightOnlyGroupwiseQuantMatmul?
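Since the inspector appears to report only the storage dtype, one place the algorithm is still recorded is the checkpoint's `config.json`. A minimal sketch, assuming the usual TensorRT-LLM checkpoint layout where `config.json` carries a `"quantization"` section (the path is the `checkpoint_dir` from the build command above):

```python
# Read quant_algo from the TensorRT-LLM checkpoint config rather than
# from the engine inspector's layer info.
import json

with open("./trt_models/llama3.2-3b-hf_w4a8KVfp8/config.json") as f:
    cfg = json.load(f)

print(cfg.get("quantization", {}))  # e.g. {"quant_algo": "W4A8_AWQ", ...}
```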