Pull requests: NVIDIA/TensorRT-LLM
[LLM] sampling_params should be setup only if end_id is None and tokenizer is not None
#2573 opened Dec 12, 2024 by mfuntowicz
feat(qwen): add trust_remote_code argument support
Labels: triaged
#2493 opened Nov 24, 2024 by ShivamSphn
bugfix/incorrect lora out dims
Labels: triaged
#2484 opened Nov 22, 2024 by akhoroshev
Fix prompt_table_data empty tensor shape error
Labels: triaged
#2470 opened Nov 20, 2024 by BasicCoder
Create INT8 KV Cache on Qserve
Labels: triaged
#2446 opened Nov 14, 2024 by dleunji
th::optional -> std::optional
Labels: triaged
#2397 opened Oct 31, 2024 by r-barnes
attention mechanism toggle added
Labels: functionality issue, triaged, waiting for feedback
#2384 opened Oct 28, 2024 by Aaryanverma
fix load_model_on_cpu on qwen/convert_checkpoint.py
Labels: feature request, triaged
#2382 opened Oct 27, 2024 by lkm2835
Fix errors when using smoothquant to quantize Qwen2 model
Labels: Low Precision, triaged
#2370 opened Oct 24, 2024 by Missmiaom
README.md: Add 3rd Party Inference Speed Dashboard
Labels: Documentation, triaged
#2244 opened Sep 22, 2024 by matichon-vultureprime
Modify small-batched weight only quantization
Labels: Low Precision, triaged
#2213 opened Sep 10, 2024 by dasistwo
[examples/bert/build.py]: Load weights for BertModel and RobertaModel if --model_dir is provided
Labels: triaged
#2187 opened Sep 3, 2024 by tkhanipov
fix wrong buffer for oneShotAllReduceKernel under PUSH_MODE
#2099 opened Aug 8, 2024 by YconquestY
decoder MMHA kernel support INT8 SCALE_Q_INSTEAD_OF_K and SCALE_P_INS…
#2085 opened Aug 5, 2024 by lishicheng1996
fix wrong arg in Engine Building Command in docs/source/performance/perf-overview.md
Labels: Documentation
#2057 opened Jul 30, 2024 by RuibaiXu
Fix default min length
Labels: triaged
#1935 opened Jul 11, 2024 by akhoroshev
Bump transformers from 4.36.2 to 4.38.0 in /examples/multimodal
Labels: bug, dependencies, triaged, waiting for feedback
#1689 opened May 28, 2024 by dependabot (bot)
add cached generation buffer
Labels: triaged, waiting for feedback
#1685 opened May 28, 2024 by michael200892458