
Qwen2-VL cannot be converted to a checkpoint on TensorRT-LLM #2658

Open · 2 of 4 tasks
xunuohope1107 opened this issue Jan 5, 2025 · 9 comments
Labels: bug (Something isn't working), Investigating, LLM API/Workflow, triaged (Issue has been triaged by maintainers)

Comments

@xunuohope1107

System Info

  • CPU: x86
  • GPU: 2xL40S
  • Memory: 256GB
  • System: Ubuntu 22.04
  • Docker Image: nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
  • TensorRT-LLM version: 0.16.0

Who can help?

I have tested the examples under examples/multimodal. But when I try to convert Qwen2-VL-7B to a checkpoint via `python3 ../qwen/convert_checkpoint.py --model_dir Qwen2-VL-7B-Instruct --output_dir trt_models/Qwen2-VL-7B-Instruct/fp16/1-gpu --dtype float16`, I get the error `Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'}`, so it seems Qwen2-VL is not supported. Is this due to the Docker image I used, or do I have to build TensorRT-LLM from source?
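
(For reference, a minimal way to confirm which TensorRT-LLM wheel the container actually ships, since whether Qwen2-VL conversion works depends on the version:)

```python
# Minimal version check inside the container; the 24.12 Triton image used here
# reports 0.16.0, which is the build hitting the rope_scaling error below.
import tensorrt_llm

print(tensorrt_llm.__version__)
```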

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. cd into examples/multimodal
  2. Run `python3 ../qwen/convert_checkpoint.py --model_dir Qwen2-VL-7B-Instruct --output_dir trt_models/Qwen2-VL-7B-Instruct/fp16/1-gpu --dtype float16`

Expected behavior

The checkpoint is written to trt_models/Qwen2-VL-7B-Instruct/fp16/1-gpu without any errors.

Actual behavior

Error log:

```
root@04292e29d243:/workspace/TensorRT-LLM/examples/multimodal# python3 ../qwen/convert_checkpoint.py --model_dir Qwen2-VL-7B-Instruct \
    --output_dir trt_models/Qwen2-VL-7B-Instruct/fp16/1-gpu \
    --dtype float16
2025-01-03 11:20:24.426668: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2025-01-03 11:20:24.441389: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1735903224.456763 2272 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1735903224.461320 2272 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-03 11:20:24.477010: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[TensorRT-LLM] TensorRT-LLM version: 0.16.0
0.16.0
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/functional.py", line 656, in from_string
    return RotaryScalingType[s]
           ~~~~~~~~~~~~~~~~~^^^
  File "/usr/lib/python3.12/enum.py", line 814, in __getitem__
    return cls._member_map_[name]
           ~~~~~~~~~~~~~~~~^^^^^^
KeyError: 'default'
```

Additional notes

I have tried Phi-3-vision and Qwen2-7B-Instruct as well; both of them work.

xunuohope1107 added the bug label on Jan 5, 2025
github-actions bot added the triaged and Investigating labels on Jan 6, 2025
@nv-guomingz
Collaborator

@sunnyqgg, would you please take a look at this issue?

@sunnyqgg
Collaborator

sunnyqgg commented Jan 6, 2025

Hi,
Please use the latest main code and run `pip install -r requirements-qwen2vl.txt` first.

Thanks.

@xunuohope1107
Author

I rebuilt the Docker image with the latest source code on the main branch. Checkpoint conversion now works for Qwen2-VL. However, run.py still does not seem to work for Qwen2-VL.

I have tried:

```
python run.py --hf_model_dir Qwen2-VL-7B-Instruct \
    --visual_engine_dir trt_engines/Qwen2-VL-7B-Instruct/vision_encoder \
    --llm_engine_dir trt_engines/Qwen2-VL-7B-Instruct/fp16/1-gpu/ \
    --image_path=merlion.png
```

But got:

```
root@00d9a1ccd86f:/workspace/TensorRT-LLM/examples/multimodal# python run.py --hf_model_dir Qwen2-VL-7B-Instruct \
    --visual_engine_dir trt_engines/Qwen2-VL-7B-Instruct/vision_encoder \
    --llm_engine_dir trt_engines/Qwen2-VL-7B-Instruct/fp16/1-gpu/ \
    --image_path=merlion.png
2025-01-10 08:19:36.099445: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2025-01-10 08:19:36.114432: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1736497176.130732 10771 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1736497176.135485 10771 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-10 08:19:36.152056: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[TensorRT-LLM] TensorRT-LLM version: 0.17.0.dev2024121700
[TensorRT-LLM][INFO] Engine version 0.17.0.dev2024121700 found in the config file, assuming engine(s) built by new builder API.
[01/10/2025-08:19:39] [TRT-LLM] [I] Loading engine from trt_engines/Qwen2-VL-7B-Instruct/vision_encoder/model.engine
[01/10/2025-08:19:39] [TRT-LLM] [I] Creating session from engine trt_engines/Qwen2-VL-7B-Instruct/vision_encoder/model.engine
[01/10/2025-08:19:39] [TRT] [I] Loaded engine size: 1303 MiB
[01/10/2025-08:19:40] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +498, now: CPU 0, GPU 1791 (MiB)
[01/10/2025-08:19:40] [TRT-LLM] [I] Running LLM with C++ runner
[TensorRT-LLM][INFO] Engine version 0.17.0.dev2024121700 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Engine version 0.17.0.dev2024121700 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][INFO] Refreshed the MPI local session
[TensorRT-LLM][INFO] MPI size: 1, MPI local size: 1, rank: 0
[TensorRT-LLM][INFO] Rank 0 is using GPU 0
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 4
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 4
[TensorRT-LLM][INFO] TRTGptModel maxBeamWidth: 1
[TensorRT-LLM][INFO] TRTGptModel maxSequenceLen: 3072
[TensorRT-LLM][INFO] TRTGptModel maxDraftLen: 0
[TensorRT-LLM][INFO] TRTGptModel mMaxAttentionWindowSize: (3072) * 28
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 0
[TensorRT-LLM][INFO] TRTGptModel normalizeLogProbs: 1
[TensorRT-LLM][INFO] TRTGptModel maxNumTokens: 8192
[TensorRT-LLM][INFO] TRTGptModel maxInputLen: 3071 = min(maxSequenceLen - 1, maxNumTokens) since context FMHA and usePackedInput are enabled
[TensorRT-LLM][INFO] TRTGptModel If model type is encoder, maxInputLen would be reset in trtEncoderModel to maxInputLen: min(maxSequenceLen, maxNumTokens).
[TensorRT-LLM][INFO] Capacity Scheduler Policy: GUARANTEED_NO_EVICT
[TensorRT-LLM][INFO] Context Chunking Scheduler Policy: None
[TensorRT-LLM][INFO] The logger passed into createInferRuntime differs from one already provided for an existing builder, runtime, or refitter. Uses of the global logger, returned by nvinfer1::getLogger(), will return the existing value.
[TensorRT-LLM][INFO] Loaded engine size: 14549 MiB
[TensorRT-LLM][INFO] Inspecting the engine to identify potential runtime issues...
[TensorRT-LLM][INFO] The profiling verbosity of the engine does not allow this analysis to proceed. Re-build the engine with 'detailed' profiling verbosity to get more diagnostics.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 1000.03 MiB for execution context memory.
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 16332 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 11.49 MB GPU memory for runtime buffers.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 9.72 MB GPU memory for decoder.
[TensorRT-LLM][INFO] Memory usage when calculating max tokens in paged kv cache: total: 44.52 GiB, available: 27.02 GiB
[TensorRT-LLM][INFO] Number of blocks in KV cache primary pool: 7116
[TensorRT-LLM][INFO] Number of blocks in KV cache secondary pool: 0, onboard blocks to primary memory before reuse: true
[TensorRT-LLM][INFO] Max KV cache pages per sequence: 48
[TensorRT-LLM][INFO] Number of tokens per block: 64.
[TensorRT-LLM][INFO] [MemUsageChange] Allocated 24.32 GiB for max tokens in paged KV cache (455424).
[TensorRT-LLM][INFO] Enable MPI KV cache transport.
[01/10/2025-08:19:51] [TRT-LLM] [I] Load engine takes: 10.98725938796997 sec
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/multimodal/run.py", line 88, in <module>
    input_text, output_text = model.run(args.input_text, raw_image,
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/runtime/multimodal_model_runner.py", line 1989, in run
    input_text, pre_prompt, post_prompt, processed_image, decoder_input_ids, other_vision_inputs, other_decoder_inputs = self.setup_inputs(
                                                                                                                         ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/runtime/multimodal_model_runner.py", line 1728, in setup_inputs
    processor.apply_chat_template(msg,
    ^^^^^^^^^
NameError: name 'processor' is not defined. Did you mean: 'self.processor'?
[TensorRT-LLM][INFO] Refreshed the MPI local session.
```

Any suggestions here? Thanks!

@xunuohope1107
Author


The issue happens here:
```
[01/10/2025-08:19:51] [TRT-LLM] [I] Load engine takes: 10.98725938796997 sec
Traceback (most recent call last):
  File "/workspace/TensorRT-LLM/examples/multimodal/run.py", line 88, in <module>
    input_text, output_text = model.run(args.input_text, raw_image,
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/runtime/multimodal_model_runner.py", line 1989, in run
    input_text, pre_prompt, post_prompt, processed_image, decoder_input_ids, other_vision_inputs, other_decoder_inputs = self.setup_inputs(
                                                                                                                         ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/runtime/multimodal_model_runner.py", line 1728, in setup_inputs
    processor.apply_chat_template(msg,
    ^^^^^^^^^
NameError: name 'processor' is not defined. Did you mean: 'self.processor'?
```

@sunnyqgg
Collaborator

Hi @xunuohope1107,
Please add `processor = AutoProcessor.from_pretrained(self.args.hf_model_dir)` in tensorrt_llm/runtime/multimodal_model_runner.py.

Thanks.
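
A minimal sketch of where that line could go, just above the failing call in setup_inputs() (the surrounding names msg, messages, and self.args come from the snippets quoted in this thread; treat this as an illustration, not the exact file contents):

```python
# Hypothetical placement inside the qwen2_vl branch of setup_inputs() in
# tensorrt_llm/runtime/multimodal_model_runner.py: build the HF processor
# locally before it is used, per the suggestion above.
from transformers import AutoProcessor  # add if not already imported

processor = AutoProcessor.from_pretrained(self.args.hf_model_dir)
texts = [
    processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
    for msg in messages
]
```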

@xunuohope1107
Author

Yeah, I have checked tensorrt_llm/runtime/multimodal_model_runner.py, but it already has:

```python
if self.model_type == "qwen2_vl":
    hf_config = AutoConfig.from_pretrained(self.args.hf_model_dir)
    self.vision_start_token_id = hf_config.vision_start_token_id
    self.vision_end_token_id = hf_config.vision_end_token_id
    self.vision_token_id = hf_config.vision_token_id
    self.image_token_id = hf_config.image_token_id
    self.video_token_id = hf_config.video_token_id
    self.spatial_merge_size = hf_config.vision_config.spatial_merge_size
    self.max_position_embeddings = hf_config.max_position_embeddings
    self.hidden_size = hf_config.hidden_size
    self.num_attention_heads = hf_config.num_attention_heads
    self.rope_theta = hf_config.rope_theta
```

@xunuohope1107
Author


Do you mean modifying the code like this:

```python
elif 'qwen2_vl' in self.model_type:
    from qwen_vl_utils import process_vision_info
    from transformers.models.qwen2_vl.modeling_qwen2_vl import VisionRotaryEmbedding
    hf_config = AutoConfig.from_pretrained(self.args.hf_model_dir)
    if input_text is None:
        input_text = "Question: Describe this image. Answer:"
    messages = [[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": raw_image[idx],
            },
            {
                "type": "text",
                "text": input_text[idx],
            },
        ],
    }] for idx in range(self.args.batch_size)]

    texts = [
        hf_config.apply_chat_template(msg,
                                      tokenize=False,
                                      add_generation_prompt=True)
        for msg in messages
    ]
```

i.e. change `processor.apply_chat_template` to `hf_config.apply_chat_template`?
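
(For reference, a config object returned by AutoConfig.from_pretrained does not provide apply_chat_template; the chat template belongs to the processor/tokenizer. Given the NameError hint "Did you mean: 'self.processor'?", another minimal variant, assuming the runner already sets a processor attribute in its constructor, would be:)

```python
# Sketch only, assuming self.processor is already set on the runner
# (the NameError suggestion implies such an attribute exists); otherwise
# create a local processor as in the maintainer's suggestion above.
texts = [
    self.processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
    for msg in messages
]
```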

@lessmore991

> Hi, please use the latest main code and run `pip install -r requirements-qwen2vl.txt` first.
>
> Thanks.

Do I still need to install from source based on the code submitted in 21fac7? Or can I use the latest version of transformers directly?

@zhaocc1106

You can try editing the TensorRT-LLM source:
`vim /usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/qwen/config.py +146`
[screenshot of the suggested change to config.py]
