I built LLaMA 3B w4a8_awq with the below command.
```
$ trtllm-build --checkpoint_dir ./trt_models/llama3.2-3b-hf_w4a8KVfp8 \
    --output_dir /trt_engines/tllm_llama3.2_3b_w4a8KVfp8_prof \
    --gemm_plugin float16 \
    --max_batch_size 1 \
    --max_input_len 2048 \
    --profiling_verbosity detailed
$ du -sh /trt_engines/*
6.8G    /trt_engines/tllm_llama3.2_3b_fp16_prof
2.9G    /trt_engines/tllm_llama3.2_3b_w4a8KVfp8_prof
```
Then I exported the layer information:
```python
import tensorrt as trt

# Layer-level details require an engine built with --profiling_verbosity detailed.
linfo = runner.session.engine_inspector.get_engine_information(
    trt.LayerInformationFormat.JSON
)
with open("layerinfo.json", "w") as f:
    f.write(linfo)
```
However, there is no Int4 description for the `WeightOnlyGroupwiseQuantMatmul` weights:
```
$ grep weights layerinfo.json | tail -n 10
"weights": {"Type": "Half", "Count": 196608},
"weights": {"Type": "Half", "Count": 3072},
"weights": {"Type": "Float", "Count": 1},
"weights": {"Type": "Half", "Count": 6291456},
"weights": {"Type": "Half", "Count": 196608},
"weights": {"Type": "Half", "Count": 8192},
"weights": {"Type": "Float", "Count": 1},
"weights": {"Type": "Half", "Count": 6291456},
"weights": {"Type": "Half", "Count": 196608},
"weights": {"Type": "Half", "Count": 394002432},
```
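The same check can be done programmatically. A minimal sketch, assuming only that `layerinfo.json` stores weight descriptors as `{"Type": ..., "Count": ...}` dicts under `"weights"` keys (as the grep output above shows); the walk makes no assumption about how deeply they are nested:

```python
# Tally the dtype of every "weights" entry anywhere in layerinfo.json.
import json
from collections import Counter

def iter_weights(node):
    """Yield every dict stored under a "weights" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "weights" and isinstance(value, dict):
                yield value
            else:
                yield from iter_weights(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_weights(item)

with open("layerinfo.json") as f:
    info = json.load(f)

counts = Counter(w.get("Type") for w in iter_weights(info))
print(counts)  # only Half/Float appear; no Int4 entry for the packed weights
```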
I guess it is because the quantized weights are packed into fp16. Is there any way to get the quant_algo for WeightOnlyGroupwiseQuantMatmul?
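Since the inspector appears to report only the storage dtype, one place the algorithm is still recorded is the checkpoint's `config.json`. A minimal sketch, assuming the usual TensorRT-LLM checkpoint layout where `config.json` carries a `"quantization"` section (the path is the `checkpoint_dir` from the build command above):

```python
# Read quant_algo from the TensorRT-LLM checkpoint config rather than
# from the engine inspector's layer info.
import json

with open("./trt_models/llama3.2-3b-hf_w4a8KVfp8/config.json") as f:
    cfg = json.load(f)

print(cfg.get("quantization", {}))  # e.g. {"quant_algo": "W4A8_AWQ", ...}
```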