Cannot load built Llama engine due to KeyError with config #2555

Open
JohnnyRacer opened this issue Dec 10, 2024 · 0 comments
Comments

JohnnyRacer commented Dec 10, 2024

Hello, I am trying to run the GPT LLM example from the backend repo. I built the engine with the following command, using the NGC Docker image 24.11-trtllm-python-py3, which ships TensorRT-LLM 0.15.

python3 convert_checkpoint.py --model_dir /models/llama2-7b  \
                             --output_dir /models/trtllm-llama2-7b  \
                             --dtype float16  \
                             --int8_kv_cache \
                             --use_weight_only \
                             --weight_only_precision int8
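
For context, a checkpoint produced by convert_checkpoint.py is then compiled into an engine with trtllm-build; a minimal sketch of that step, with an assumed output directory for the engine (not necessarily the exact flags I used), would be:

trtllm-build --checkpoint_dir /models/trtllm-llama2-7b  \
             --output_dir /models/trtllm-llama2-7b-engine  \
             --gemm_plugin float16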

The error message snippet:

| +----------------+---------+--------------------------------------------------------------------------------------------------------------+
| | Model          | Version | Status                                                                                                       |
| +----------------+---------+--------------------------------------------------------------------------------------------------------------+
| | postprocessing | 1       | READY                                                                                                        |
| | preprocessing  | 1       | READY                                                                                                        |
| | tensorrt_llm   | 1       | UNAVAILABLE: Internal: KeyError: 'builder_config'                                                            |
| |                |         |                                                                                                              |
| |                |         | At:                                                                                                          |
| |                |         |   /usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner.py(84): _builder_to_model_config |
| |                |         |   /usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner.py(80): read_config              |
| |                |         |   /usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/model_runner.py(646): from_dir                |
| |                |         |   /code/gpt/tensorrt_llm/1/model.py(67): initialize                                                          |
| +----------------+---------+--------------------------------------------------------------------------------------------------------------+
 

The engine's config.json:

{
    "mlp_bias": false,
    "attn_bias": false,
    "rotary_base": 10000.0,
    "rotary_scaling": null,
    "residual_mlp": false,
    "disable_weight_only_quant_plugin": false,
    "moe": {
        "num_experts": 0,
        "moe_intermediate_size": 0,
        "num_shared_experts": 0,
        "top_k": 0,
        "normalization_mode": null,
        "sparse_mixer_epsilon": 0.01,
        "tp_mode": 0,
        "device_limited_n_group": 0,
        "device_limited_topk_group": 0,
        "device_limited_routed_scaling_factor": 1.0
    },
    "remove_duplicated_kv_heads": false,
    "fc_after_embed": false,
    "use_input_layernorm_in_first_layer": true,
    "use_last_layernorm": true,
    "layer_idx_offset": 0,
    "architecture": "MistralForCausalLM",
    "dtype": "float16",
    "vocab_size": 32000,
    "hidden_size": 4096,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "hidden_act": "silu",
    "logits_dtype": "float32",
    "norm_epsilon": 1e-05,
    "runtime_defaults": null,
    "position_embedding_type": "rope_gpt_neox",
    "max_position_embeddings": 32768,
    "num_key_value_heads": 8,
    "intermediate_size": 14336,
    "mapping": {
        "world_size": 1,
        "gpus_per_node": 8,
        "cp_size": 1,
        "tp_size": 1,
        "pp_size": 1,
        "moe_tp_size": 1,
        "moe_ep_size": 1
    },
    "quantization": {
        "quant_algo": "W8A16",
        "kv_cache_quant_algo": "INT8",
        "group_size": 128,
        "smoothquant_val": 0.5,
        "clamp_val": null,
        "use_meta_recipe": false,
        "has_zero_point": false,
        "pre_quant_scale": false,
        "exclude_modules": null
    },
    "use_parallel_embedding": false,
    "embedding_sharding_dim": 0,
    "share_embedding_table": false,
    "head_size": 128,
    "qk_layernorm": false,
    "tie_word_embeddings": false,
    "quant_ckpt_path": null,
    "load_model_on_cpu": false
}
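
To illustrate where the load fails: the traceback above points at the config reader in tensorrt_llm/runtime/model_runner.py, which appears to expect a top-level 'builder_config' entry that the config.json above does not contain. A minimal reproduction sketch (assumed path; this is an illustration, not the actual model_runner.py code):

import json

# Illustration only: read the engine directory's config.json (assumed path)
# and access the key the loader expects, which is missing from the config above.
with open("/models/trtllm-llama2-7b/config.json") as f:
    engine_config = json.load(f)

builder_config = engine_config["builder_config"]  # raises KeyError: 'builder_config'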