NVIDIA / TensorRT-LLM Public

Issues: NVIDIA/TensorRT-LLM

TensorRT-LLM Requests

#632 opened Dec 11, 2023 by ncomly-nvidia

Open 15

[Issue Template]Short one-line summary of the issue #270

#783 opened Jan 1, 2024 by juney-nvidia

Open

288 Open 1,771 Closed

Issues list

Qwen2-VL-2B-Instruct convert error

#2574 opened Dec 13, 2024 by diandianliu

TRT-LLM fails on GH200 node

Labels: bug (Something isn't working)

#2571 opened Dec 12, 2024 by ttim

4 tasks

Code for libtensorrt_llm_batch_manager_static.a

#2569 opened Dec 12, 2024 by arturttm

tensorrtllm_backend Support for InternVL2

#2568 opened Dec 12, 2024 by ChenJian7578

Support for LLaMa3.3

#2567 opened Dec 11, 2024 by FernandoDorado

InternVL deploy

#2565 opened Dec 11, 2024 by ChenJian7578

What does "weights_scaling_factor_2" mean in safetensor results of awq_w4a8

Labels: Investigating, Low Precision (Issue about lower bit quantization, including int8, int4, fp8), triaged (Issue has been triaged by maintainers)

#2561 opened Dec 11, 2024 by gujiewen

nccl hung

#2560 opened Dec 11, 2024 by akhoroshev

Correctly setting the --max_encoder_input_len when using trtllm-build with mllama models

#2558 opened Dec 10, 2024 by here4dadata

batch-manager.md is removed since https://github.com/NVIDIA/TensorRT-LLM/pull/2532

#2556 opened Dec 10, 2024 by spacewander

Cannot load built Llama engine due to KeyError with config

#2555 opened Dec 10, 2024 by JohnnyRacer

Mean-Pooling Layer for Bert for packed/nested/remove_input_padding tensor t.region->getDataType() == DataType::kINT32 failed.

#2554 opened Dec 10, 2024 by michaelfeil

int8 slower than bf16 on A100

Labels: bug, Investigating, Low Precision, triaged

#2553 opened Dec 9, 2024 by ShuaiShao93

4 tasks

[feature request] Can we add H200 in infer_cluster_key() method?

#2552 opened Dec 9, 2024 by dongluw

Qwen2_VL profiling: TRT model low performance

#2551 opened Dec 9, 2024 by nzarif

[feature request] lm_head quantization

#2550 opened Dec 9, 2024 by youki-sada

trtllm-serve does not support dynamic batching like tritonserver

#2549 opened Dec 7, 2024 by Alireza3242

Performance issue with long context

Labels: bug

#2548 opened Dec 6, 2024 by ShuaiShao93

4 tasks

LayerInfo doesn't support fp8 and int4_awq dtype?

#2547 opened Dec 6, 2024 by youki-sada

Qwen2-VL FP8/INT8 Quantization

#2546 opened Dec 6, 2024 by MrD005

trtllm-bench faild

Labels: bug

#2545 opened Dec 6, 2024 by dingjingzhen

2 of 4 tasks

Encoding error in stream response from Triton server

Labels: bug

#2544 opened Dec 6, 2024 by Wonder-donbury

3 of 4 tasks

lora doesn't work when kv_cache is disabled

Labels: Investigating, Lora/P-tuning, triaged

#2543 opened Dec 5, 2024 by ShuaiShao93

4 tasks

pip install errors out with HTTP error 404

#2542 opened Dec 5, 2024 by HeyangQin

Decrease accuracy when running llava model

#2541 opened Dec 5, 2024 by nghoaithuong
