[TensorRT-LLM] TensorRT-LLM version: 0.13.0
[10/12/2024-15:45:56] [TRT-LLM] [I] Set bert_attention_plugin to auto.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set gpt_attention_plugin to float16.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set gemm_plugin to float16.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set gemm_swiglu_plugin to None.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set fp8_rowwise_gemm_plugin to None.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set nccl_plugin to auto.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set lookup_plugin to None.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set lora_plugin to None.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set moe_plugin to auto.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set mamba_conv1d_plugin to auto.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set low_latency_gemm_plugin to None.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set context_fmha to True.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set bert_context_fmha_fp32_acc to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set remove_input_padding to True.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set reduce_fusion to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set enable_xqa to True.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set tokens_per_block to 64.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set multiple_profiles to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set paged_state to True.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set streamingllm to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set use_fused_mlp to True.
[10/12/2024-15:45:56] [TRT-LLM] [W] Implicitly setting GemmaConfig.rotary_base = 10000.0
[10/12/2024-15:45:56] [TRT-LLM] [W] Implicitly setting GemmaConfig.attn_bias = False
[10/12/2024-15:45:56] [TRT-LLM] [W] Implicitly setting GemmaConfig.mlp_bias = False
[10/12/2024-15:45:56] [TRT-LLM] [W] Implicitly setting GemmaConfig.inter_layernorms = True
[10/12/2024-15:45:56] [TRT-LLM] [I] Set dtype to float16.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set paged_kv_cache to True.
[10/12/2024-15:45:56] [TRT-LLM] [W] Overriding paged_state to False
[10/12/2024-15:45:56] [TRT-LLM] [I] Set paged_state to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] max_seq_len is not specified, using deduced value 8192
[10/12/2024-15:45:56] [TRT-LLM] [W] remove_input_padding is enabled, while opt_num_tokens is not set, setting to max_batch_size*max_beam_width.
[10/12/2024-15:45:56] [TRT-LLM] [W] padding removal and fMHA are both enabled, max_input_len is not required and will be ignored
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.0.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.1.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.2.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.3.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.4.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.5.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.6.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.7.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.8.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.9.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.10.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.11.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.12.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.13.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.14.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.15.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.16.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.17.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.18.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.19.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.20.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.21.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.22.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.23.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.24.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.25.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.26.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.27.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.28.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.29.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.30.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.31.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.32.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.33.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.34.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.35.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.36.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.37.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.38.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.39.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.40.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.41.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT] [I] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 167, GPU 436 (MiB)
[10/12/2024-15:45:59] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1995, GPU +366, now: CPU 2318, GPU 802 (MiB)
[10/12/2024-15:45:59] [TRT-LLM] [I] Set nccl_plugin to None.
[10/12/2024-15:46:00] [TRT-LLM] [I] Total time of constructing network from module object 3.576707363128662 seconds
[10/12/2024-15:46:00] [TRT-LLM] [I] Total optimization profiles added: 1
[10/12/2024-15:46:00] [TRT-LLM] [I] Total time to initialize the weights in network Unnamed Network 0: 00:00:00
[10/12/2024-15:46:00] [TRT-LLM] [I] Build TensorRT engine Unnamed Network 0
[10/12/2024-15:46:00] [TRT] [W] Unused Input: position_ids
[10/12/2024-15:46:00] [TRT] [W] [RemoveDeadLayers] Input Tensor position_ids is unused or used only at compile-time, but is not being removed.
[10/12/2024-15:46:00] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[10/12/2024-15:46:00] [TRT] [I] Compiler backend is used during engine build.
[10/12/2024-15:46:05] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[10/12/2024-15:46:05] [TRT] [I] Detected 15 inputs and 1 output network tensors.
[10/12/2024-15:46:15] [TRT] [I] Total Host Persistent Memory: 129376 bytes
[10/12/2024-15:46:15] [TRT] [I] Total Device Persistent Memory: 0 bytes
[10/12/2024-15:46:15] [TRT] [I] Max Scratch Memory: 117473280 bytes
[10/12/2024-15:46:15] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 731 steps to complete.
[10/12/2024-15:46:15] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 43.3671ms to assign 17 blocks to 731 nodes requiring 822089728 bytes.
[10/12/2024-15:46:15] [TRT] [I] Total Activation Memory: 822088704 bytes
[10/12/2024-15:46:16] [TRT] [I] Total Weights Memory: 20326809732 bytes
[10/12/2024-15:46:16] [TRT] [I] Compiler backend is used during engine execution.
[10/12/2024-15:46:16] [TRT] [I] Engine generation completed in 15.4544 seconds.
[10/12/2024-15:46:16] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 8 MiB, GPU 19386 MiB
[10/12/2024-15:46:23] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 40291 MiB
[10/12/2024-15:46:23] [TRT-LLM] [I] Total time of building Unnamed Network 0: 00:00:22
[10/12/2024-15:46:23] [TRT] [I] Serialized 27 bytes of code generator cache.
[10/12/2024-15:46:23] [TRT] [I] Serialized 257296 bytes of compilation cache.
[10/12/2024-15:46:23] [TRT] [I] Serialized 10 timing cache entries
[10/12/2024-15:46:23] [TRT-LLM] [I] Timing cache serialized to model.cache
[10/12/2024-15:46:23] [TRT-LLM] [I] Build phase peak memory: 40294.03 MB, children: 16.96 MB
[10/12/2024-15:46:23] [TRT-LLM] [I] Serializing engine to /app/src/../model_repository/tensorrt_llm/1/rank0.engine...
[10/12/2024-15:46:42] [TRT-LLM] [I] Engine serialized. Total time: 00:00:19
[10/12/2024-15:46:43] [TRT-LLM] [I] Total time of building all engines: 00:00:47
System Info
A100
Who can help?
@byshiue
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
docker container: nvidia-tritonserver:24.09-trtllm-python-py3
model: Gemma 2 9B
When building the engine, the build warns that the gelu_pytorch_tanh activation is not supported.
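For context, gelu_pytorch_tanh is the tanh approximation of GELU that Gemma 2 applies inside its gated MLP. The sketch below (plain NumPy, with illustrative shapes and weight names, not TensorRT-LLM code) shows the activation and the gate/up/down pattern that fuse_gate_mlp tries to fuse:

```python
import numpy as np

def gelu_pytorch_tanh(x):
    # Tanh approximation of GELU (PyTorch's gelu(x, approximate="tanh")):
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def gated_mlp(x, w_gate, w_up, w_down):
    # Gated MLP as used by Gemma-style models: the activation is applied to the
    # gate branch and multiplied element-wise with the up branch. fuse_gate_mlp
    # fuses the gate and up projections into one GEMM for activations it
    # supports; with gelu_pytorch_tanh the fusion is skipped, which is what the
    # warnings in the build log above report.
    return (gelu_pytorch_tanh(x @ w_gate) * (x @ w_up)) @ w_down

# Tiny smoke test with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8)).astype(np.float32)
w_gate = rng.standard_normal((8, 16)).astype(np.float32)
w_up = rng.standard_normal((8, 16)).astype(np.float32)
w_down = rng.standard_normal((16, 8)).astype(np.float32)
print(gated_mlp(x, w_gate, w_up, w_down).shape)  # (2, 8)
```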
Expected behavior
Support for the gelu_pytorch_tanh activation, so that fuse_gate_mlp is not skipped.
actual behavior
The engine builds successfully, but fuse_gate_mlp is skipped for every layer (transformer.layers.0.mlp through transformer.layers.41.mlp) with "unsupported activation gelu_pytorch_tanh" warnings; see the full build log at the top of this issue.
additional notes
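A quick way to double-check which activation the checkpoint declares is to read it from the Hugging Face config. This is only an illustrative check (the path is a placeholder, and the field name differs between transformers versions, hidden_activation vs. hidden_act):

```python
import json

# Placeholder path; point it at the downloaded Gemma 2 9B checkpoint directory.
with open("gemma-2-9b/config.json") as f:
    cfg = json.load(f)

# Recent Gemma 2 configs expose the activation as "hidden_activation";
# older configs use "hidden_act". Either way it should read "gelu_pytorch_tanh".
print(cfg.get("hidden_activation") or cfg.get("hidden_act"))
```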