int8 slower than bf16 on A100 #2553

ShuaiShao93 · 2024-12-09T23:01:03Z

System Info

x86_64, debian 11, A100 GPU

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Clone the Llama 3.1 8B Instruct model.
Test with int8 (weight-only):

python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Meta-Llama-3.1-8B-Instruct --output_dir ./tllm_8b_checkpoint_1gpu_int8  --dtype bfloat16  --use_weight_only   --weight_only_precision int8

trtllm-build --checkpoint_dir ./tllm_8b_checkpoint_1gpu_int8 --output_dir ./tmp/llama/8B/trt_engines/int8/1-gpu  --gpt_attention_plugin auto  --gemm_plugin auto  --max_num_tokens 128000 --max_batch_size 8 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged

python3 TensorRT-LLM/examples/run.py --engine_dir=./tmp/llama/8B/trt_engines/int8/1-gpu --max_output_len 1 --max_input_length=1000000 --run_profiling --tokenizer_dir ./Meta-Llama-3.1-8B-Instruct --input_file 15k-tokens.txt

Test with bf16:

python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Meta-Llama-3.1-8B-Instruct --output_dir ./tllm_8b_checkpoint_1gpu_bf16 --dtype bfloat16

trtllm-build --checkpoint_dir ./tllm_8b_checkpoint_1gpu_bf16 --output_dir ./tmp/llama/8B/trt_engines/bf16/1-gpu  --gpt_attention_plugin auto  --gemm_plugin auto  --max_num_tokens 128000 --max_batch_size 8 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged

python3 TensorRT-LLM/examples/run.py --engine_dir=./tmp/llama/8B/trt_engines/bf16/1-gpu --max_output_len 1 --max_input_length=1000000 --run_profiling --tokenizer_dir ./Meta-Llama-3.1-8B-Instruct --input_file 15k-tokens.txt

Expected behavior

int8 should be much faster than bf16

Actual behavior

int8 takes 1.7s, bf16 takes 1.2s
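To put the two numbers side by side, here is a minimal sketch that computes the relative slowdown. It uses only the timings reported above; the variable names are my own, and it assumes each figure is a single wall-clock measurement from `run.py --run_profiling`.

```python
# Timings reported in this issue, in seconds.
int8_s = 1.7
bf16_s = 1.2

slowdown = int8_s / bf16_s            # how many times longer the int8 engine takes
extra_pct = (slowdown - 1.0) * 100.0  # extra time relative to bf16, in percent

print(f"int8/bf16 = {slowdown:.2f}x ({extra_pct:.0f}% slower)")
# prints: int8/bf16 = 1.42x (42% slower)
```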

Additional notes

A100 has int8 tensor cores
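One thing that may be relevant: the int8 engine above was built with `--use_weight_only --weight_only_precision int8`. In a weight-only scheme, only the weights are stored as int8. The activations stay in 16-bit, and the weights are dequantized before the multiply, so the matrix multiply itself is not an int8 one. Whether that explains the timings here is not confirmed anywhere in this thread. The sketch below is a pure-Python illustration of the general idea, not TensorRT-LLM's kernel; `quantize_per_channel` and `weight_only_linear` are hypothetical helper names.

```python
def quantize_per_channel(w_col):
    """Symmetric int8 quantization of one output channel (hypothetical helper)."""
    scale = max(abs(v) for v in w_col) / 127.0 or 1.0  # fall back to 1.0 for all-zero channels
    q = [max(-127, min(127, round(v / scale))) for v in w_col]
    return q, scale


def weight_only_linear(x, q_cols, scales):
    """Compute x @ dequant(W): weights are stored as int8 but multiplied as floats."""
    out = []
    for q, s in zip(q_cols, scales):
        w = [qi * s for qi in q]  # dequantize: extra work per weight
        out.append(sum(xi * wi for xi, wi in zip(x, w)))  # float multiply-accumulate
    return out


# Toy data, chosen only for illustration (not taken from the model).
weights = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.3]]  # two output channels
x = [1.0, 2.0, 3.0]

pairs = [quantize_per_channel(c) for c in weights]
q_cols = [p[0] for p in pairs]
scales = [p[1] for p in pairs]

y = weight_only_linear(x, q_cols, scales)
ref = [sum(xi * wi for xi, wi in zip(x, c)) for c in weights]  # unquantized reference
print(y, ref)
```

The point of the sketch is that the multiply-accumulate runs on floats either way. The int8 storage mainly reduces the weight memory that has to be read, and the dequantize step is additional work on top of the same floating-point math.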


ShuaiShao93 added the "bug" label (Something isn't working) on Dec 9, 2024

nv-guomingz added the "Low Precision" label (Issue about lower bit quantization, including int8, int4, fp8) on Dec 10, 2024

github-actions bot added the "triaged" (Issue has been triaged by maintainers) and "Investigating" labels on Dec 10, 2024

