Pull requests: NVIDIA/TensorRT-LLM
Fix errors when using smoothquant to quantize Qwen2 model (#2370)
Labels: quantization (lower-bit quantization, including int8, int4, fp8)
Opened Oct 24, 2024 by Missmiaom
Fix errors when quantizing Llama model (#2264)
Labels: quantization, triaged (triaged by maintainers)
Opened Sep 28, 2024 by dleunji
README.md: Add 3rd Party Inference Speed Dashboard (#2244)
Labels: documentation (improvements or additions to documentation), triaged
Opened Sep 22, 2024 by matichon-vultureprime
Modify small-batched weight-only quantization (#2213)
Labels: quantization, triaged
Opened Sep 10, 2024 by dasistwo
[examples/bert/build.py]: Load weights for BertModel and RobertaModel if --model_dir is provided (#2187)
Labels: triaged
Opened Sep 3, 2024 by tkhanipov
Add workaround instruction for a known issue of v0.11 on Windows (#2146)
Status: Merged
Opened Aug 23, 2024 by pamelap-nvidia
Fix wrong buffer for oneShotAllReduceKernel under PUSH_MODE (#2099)
Opened Aug 8, 2024 by YconquestY
Decoder MMHA kernel support INT8 SCALE_Q_INSTEAD_OF_K and SCALE_P_INS… (#2085)
Opened Aug 5, 2024 by lishicheng1996
Fix wrong arg in Engine Building Command in docs/source/performance/perf-overview.md (#2057)
Labels: documentation
Opened Jul 30, 2024 by RuibaiXu
Fix default min length (#1935)
Labels: triaged
Opened Jul 11, 2024 by akhoroshev
Bump transformers from 4.36.2 to 4.38.0 in /examples/multimodal (#1689)
Labels: bug (something isn't working), dependencies (updates a dependency file), triaged, waiting for feedback
Opened May 28, 2024 by dependabot[bot]
Add cached generation buffer (#1685)
Labels: triaged, waiting for feedback
Opened May 28, 2024 by michael200892458
Fix CUDA OOM when creating Mixtral checkpoint (#1629)
Labels: triaged, waiting for feedback
Opened May 19, 2024 by VivekBits2210
[feat]: Support weight-only GEMM with 2-bit (#1568)
Labels: triaged, waiting for feedback
Opened May 9, 2024 by gavinchen430
Support SDXL and its distributed inference (#1514)
Labels: waiting for feedback
Opened Apr 28, 2024 by Zars19
fix: correct cudaSetDevice error when GPUs per node are fewer than their ranks in inter-node inference (#1495)
Opened Apr 24, 2024 by littlefatfat
Filter applied: updated in the last three days (updated:>2024-10-23).