TensorRT-LLM in-process benchmarking support #35

Merged
matthewkotila merged 6 commits into main from tensorrtllm-engine
Aug 9, 2024

matthewkotila
Contributor

All commits in this PR have already been reviewed. This is the epic-branch PR that merges them into main.

Draft until final testing has completed.

nv-hwoo and others added 6 commits August 9, 2024 10:42
…0) (#762)

* Add tensorrtllm_engine option to service-kind and update testing

* Add output format check for tensorrtllm_engine

Co-authored-by: Elias Bermudez <[email protected]>
* support array of data in profile exporter

* add some tests

* run formatting

* fix pre-commit

* remove duplicate argparser arguments

* Fix Triton C API mode missing infer requested output datatype bug

---------

Co-authored-by: Matthew Kotila <[email protected]>
* support parsing tensorrtllm engine profile response

* add test

* refactor the test

* update types and names

* fix pre-commit

* run PA with triton c api

* more clean up on the tests

* fix codeql

* address feedback
matthewkotila requested a review from nv-hwoo August 9, 2024 23:15
matthewkotila marked this pull request as ready for review August 9, 2024 23:29
matthewkotila merged commit e1455e0 into main Aug 9, 2024
7 checks passed
matthewkotila deleted the tensorrtllm-engine branch August 9, 2024 23:42
lkomali pushed a commit that referenced this pull request Aug 15, 2024
* Add tensorrtllm_engine option to service-kind and update testing (#700) (#762)

* Add tensorrtllm_engine option to service-kind and update testing

* Add output format check for tensorrtllm_engine

Co-authored-by: Elias Bermudez <[email protected]>

* Support input payload generation for tensorrtllm engine (#767)

* Add functionality for async requests and output retrieval with Triton C API (#25)

* Support 1-d array data in profile exporter (#28)

* support array of data in profile exporter

* add some tests

* run formatting

* fix pre-commit

* remove duplicate argparser arguments

* Fix Triton C API mode missing infer requested output datatype bug

---------

Co-authored-by: Matthew Kotila <[email protected]>

* Support profile data parsing for tensorrtllm engine service kind (#33)

* support parsing tensorrtllm engine profile response

* add test

* refactor the test

* update types and names

* fix pre-commit

* run PA with triton c api

* more clean up on the tests

* fix codeql

* address feedback

* Add functionality to continue benchmarking in Triton C API mode if server logging support is disabled (#34)

---------

Co-authored-by: Hyunjae Woo <[email protected]>
Co-authored-by: Elias Bermudez <[email protected]>