
Problem building Gemma 2 9B #2324

Closed

Alireza3242 opened this issue Oct 12, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@Alireza3242

System Info

A100

Who can help?

@byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

docker container: nvidia-tritonserver:24.09-trtllm-python-py3
model: gemma 2 9B

trtllm-build --checkpoint_dir ./data/tllm_checkpoint --output_dir ./model_repository/tensorrt_llm/1 --gpt_attention_plugin float16 --gemm_plugin float16

but the build warns that the gelu_pytorch_tanh activation is not supported.
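For reference (an addition, not part of the original report): the activation name comes from the Hugging Face config of the checkpoint that was converted to ./data/tllm_checkpoint. A minimal sketch to confirm the value, assuming a hypothetical local HF-format Gemma 2 directory ./gemma-2-9b:

```python
import json
from pathlib import Path

# Hypothetical path to the Hugging Face checkpoint that was converted
# to ./data/tllm_checkpoint before running trtllm-build.
config_path = Path("./gemma-2-9b/config.json")
config = json.loads(config_path.read_text())

# Depending on the transformers version, the MLP activation is stored as
# "hidden_activation" or "hidden_act"; this is the name the fuse_gate_mlp
# pass reports as unsupported.
activation = config.get("hidden_activation") or config.get("hidden_act")
print(activation)  # expected: gelu_pytorch_tanh
```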

Expected behavior

Support the gelu_pytorch_tanh activation (so that fuse_gate_mlp is not skipped).
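For context (an addition, not from the issue): gelu_pytorch_tanh is the tanh approximation of GELU. A minimal sketch of the formula, checked against PyTorch's built-in approximation:

```python
import math
import torch
import torch.nn.functional as F

def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
    # Tanh approximation of GELU, which is what "gelu_pytorch_tanh" denotes:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x.pow(3))))

x = torch.randn(8)
# PyTorch exposes the same approximation via the `approximate` argument.
assert torch.allclose(gelu_tanh(x), F.gelu(x, approximate="tanh"), atol=1e-6)
```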

Actual behavior

[TensorRT-LLM] TensorRT-LLM version: 0.13.0
[10/12/2024-15:45:56] [TRT-LLM] [I] Set bert_attention_plugin to auto.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set gpt_attention_plugin to float16.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set gemm_plugin to float16.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set gemm_swiglu_plugin to None.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set fp8_rowwise_gemm_plugin to None.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set nccl_plugin to auto.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set lookup_plugin to None.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set lora_plugin to None.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set moe_plugin to auto.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set mamba_conv1d_plugin to auto.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set low_latency_gemm_plugin to None.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set context_fmha to True.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set bert_context_fmha_fp32_acc to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set remove_input_padding to True.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set reduce_fusion to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set enable_xqa to True.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set tokens_per_block to 64.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set multiple_profiles to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set paged_state to True.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set streamingllm to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set use_fused_mlp to True.
[10/12/2024-15:45:56] [TRT-LLM] [W] Implicitly setting GemmaConfig.rotary_base = 10000.0
[10/12/2024-15:45:56] [TRT-LLM] [W] Implicitly setting GemmaConfig.attn_bias = False
[10/12/2024-15:45:56] [TRT-LLM] [W] Implicitly setting GemmaConfig.mlp_bias = False
[10/12/2024-15:45:56] [TRT-LLM] [W] Implicitly setting GemmaConfig.inter_layernorms = True
[10/12/2024-15:45:56] [TRT-LLM] [I] Set dtype to float16.
[10/12/2024-15:45:56] [TRT-LLM] [I] Set paged_kv_cache to True.
[10/12/2024-15:45:56] [TRT-LLM] [W] Overriding paged_state to False
[10/12/2024-15:45:56] [TRT-LLM] [I] Set paged_state to False.
[10/12/2024-15:45:56] [TRT-LLM] [I] max_seq_len is not specified, using deduced value 8192
[10/12/2024-15:45:56] [TRT-LLM] [W] remove_input_padding is enabled, while opt_num_tokens is not set, setting to max_batch_size*max_beam_width.
[10/12/2024-15:45:56] [TRT-LLM] [W] padding removal and fMHA are both enabled, max_input_len is not required and will be ignored
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.0.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.1.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.2.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.3.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.4.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.5.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.6.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.7.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.8.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.9.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.10.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.11.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.12.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.13.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.14.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.15.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.16.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.17.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.18.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.19.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.20.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.21.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.22.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.23.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.24.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.25.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.26.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.27.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.28.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.29.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.30.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.31.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.32.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.33.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.34.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.35.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.36.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.37.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.38.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.39.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.40.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT-LLM] [W] fuse_gate_mlp cannot be done for transformer.layers.41.mlp due to unsupported activation gelu_pytorch_tanh. Skipping.
[10/12/2024-15:45:56] [TRT] [I] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 167, GPU 436 (MiB)
[10/12/2024-15:45:59] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1995, GPU +366, now: CPU 2318, GPU 802 (MiB)
[10/12/2024-15:45:59] [TRT-LLM] [I] Set nccl_plugin to None.
[10/12/2024-15:46:00] [TRT-LLM] [I] Total time of constructing network from module object 3.576707363128662 seconds
[10/12/2024-15:46:00] [TRT-LLM] [I] Total optimization profiles added: 1
[10/12/2024-15:46:00] [TRT-LLM] [I] Total time to initialize the weights in network Unnamed Network 0: 00:00:00
[10/12/2024-15:46:00] [TRT-LLM] [I] Build TensorRT engine Unnamed Network 0
[10/12/2024-15:46:00] [TRT] [W] Unused Input: position_ids
[10/12/2024-15:46:00] [TRT] [W] [RemoveDeadLayers] Input Tensor position_ids is unused or used only at compile-time, but is not being removed.
[10/12/2024-15:46:00] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[10/12/2024-15:46:00] [TRT] [I] Compiler backend is used during engine build.
[10/12/2024-15:46:05] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[10/12/2024-15:46:05] [TRT] [I] Detected 15 inputs and 1 output network tensors.
[10/12/2024-15:46:15] [TRT] [I] Total Host Persistent Memory: 129376 bytes
[10/12/2024-15:46:15] [TRT] [I] Total Device Persistent Memory: 0 bytes
[10/12/2024-15:46:15] [TRT] [I] Max Scratch Memory: 117473280 bytes
[10/12/2024-15:46:15] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 731 steps to complete.
[10/12/2024-15:46:15] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 43.3671ms to assign 17 blocks to 731 nodes requiring 822089728 bytes.
[10/12/2024-15:46:15] [TRT] [I] Total Activation Memory: 822088704 bytes
[10/12/2024-15:46:16] [TRT] [I] Total Weights Memory: 20326809732 bytes
[10/12/2024-15:46:16] [TRT] [I] Compiler backend is used during engine execution.
[10/12/2024-15:46:16] [TRT] [I] Engine generation completed in 15.4544 seconds.
[10/12/2024-15:46:16] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 8 MiB, GPU 19386 MiB
[10/12/2024-15:46:23] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 40291 MiB
[10/12/2024-15:46:23] [TRT-LLM] [I] Total time of building Unnamed Network 0: 00:00:22
[10/12/2024-15:46:23] [TRT] [I] Serialized 27 bytes of code generator cache.
[10/12/2024-15:46:23] [TRT] [I] Serialized 257296 bytes of compilation cache.
[10/12/2024-15:46:23] [TRT] [I] Serialized 10 timing cache entries
[10/12/2024-15:46:23] [TRT-LLM] [I] Timing cache serialized to model.cache
[10/12/2024-15:46:23] [TRT-LLM] [I] Build phase peak memory: 40294.03 MB, children: 16.96 MB
[10/12/2024-15:46:23] [TRT-LLM] [I] Serializing engine to /app/src/../model_repository/tensorrt_llm/1/rank0.engine...
[10/12/2024-15:46:42] [TRT-LLM] [I] Engine serialized. Total time: 00:00:19
[10/12/2024-15:46:43] [TRT-LLM] [I] Total time of building all engines: 00:00:47

Additional notes

Alireza3242 added the bug label on Oct 12, 2024
@holyjackaiforia

Hello, I have the same issue. Any tips?

@Alireza3242 (Author)

@holyjackaiforia
This is only a warning, not an error: the fuse_gate_mlp optimization is skipped because that fusion pass does not support the gelu_pytorch_tanh activation, but the log above shows the engine still builds and serializes successfully.
