Add first two LLM test guides #396

Merged
merged 9 commits into from
Sep 22, 2023

matthewkotila
Contributor

Add guides for the prefill and generation steps of an LLM.

Contributor

@debermudez debermudez left a comment

Looks great!
One thing: it is not clear that two different experiments are described here. Can you add two headers, one for first-token latency and one for token-to-token (T2T) latency?

Member

@Tabrizian Tabrizian left a comment

Some minor comments, otherwise looks good!

@matthewkotila matthewkotila force-pushed the matthewkotila-llm-guide branch from f84cc02 to 4281537 Compare September 19, 2023 20:36
Contributor

@nv-hwoo nv-hwoo left a comment

LGTM! 🚀

@matthewkotila
Contributor Author

Comment to make sure triton-inference-server/tutorials#46 is taken into account for how this guide works.

@matthewkotila matthewkotila marked this pull request as draft September 21, 2023 02:58
@matthewkotila matthewkotila marked this pull request as ready for review September 22, 2023 19:32
@matthewkotila matthewkotila merged commit 930749c into main Sep 22, 2023
3 checks passed
@matthewkotila matthewkotila deleted the matthewkotila-llm-guide branch September 22, 2023 21:39
5 participants