[WIP] LLaVA support #720

Merged 19 commits into vision-language on Jul 10, 2024
Conversation

mwawrzos
Collaborator

The goal of this PR is to enable measuring VLM throughput and latency when the input includes images.

@mwawrzos mwawrzos self-assigned this Jun 27, 2024
@mwawrzos mwawrzos force-pushed the mwawrzos/openai-vision branch from e1bfcb4 to 1326edb Compare June 27, 2024 07:52
@mwawrzos mwawrzos force-pushed the mwawrzos/openai-vision branch from 1326edb to 06df643 Compare June 27, 2024 08:30
@mwawrzos mwawrzos force-pushed the mwawrzos/openai-vision branch from a400418 to dfb6b1d Compare June 27, 2024 15:31
@@ -41,6 +87,7 @@ class PromptSource(Enum):
class OutputFormat(Enum):
OPENAI_CHAT_COMPLETIONS = auto()
OPENAI_COMPLETIONS = auto()
OPENAI_VISION = auto()

The response format for chat VLMs is the same as the regular chat completion, since we just have text out; why have a separate entry?

Contributor

The name of the enum is a bit misleading 😅 The OutputFormat enum is actually not about the format of the response; it's about the format of the input JSON file generated by LlmInputs.
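For context, the diff above can be reproduced as a minimal standalone sketch. As the comment notes, despite its name the enum selects the format of the generated input JSON, not of the model's response:

```python
from enum import Enum, auto

class OutputFormat(Enum):
    """Selects the format of the input JSON file produced by LlmInputs
    (not the format of the model's response)."""
    OPENAI_CHAT_COMPLETIONS = auto()
    OPENAI_COMPLETIONS = auto()
    OPENAI_VISION = auto()  # new entry added by this PR
```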

@nv-hwoo nv-hwoo changed the base branch from main to vision-language July 3, 2024 18:28
@mwawrzos mwawrzos force-pushed the mwawrzos/openai-vision branch from 60b658a to 8bf2710 Compare July 4, 2024 11:32
@nv-hwoo nv-hwoo force-pushed the mwawrzos/openai-vision branch from bb5511d to b82bf40 Compare July 10, 2024 04:56
@nv-hwoo nv-hwoo requested a review from dyastremsky July 10, 2024 05:38
Contributor

@dyastremsky dyastremsky left a comment

Fantastic work! Did you mean to delete test_end_to_end.py as part of this PR?

@nv-hwoo
Contributor

nv-hwoo commented Jul 10, 2024

@dyastremsky yes, the script was originally created by Tim at the beginning of genai-perf, when we didn't have CI, but we never used it afterwards (it's not even part of our unit tests). Since we now have CI in place, I don't think we need this script anymore.

@dyastremsky
Contributor

Great job cleaning this up!

@nv-hwoo nv-hwoo merged commit 6259b96 into vision-language Jul 10, 2024
5 checks passed
@nv-hwoo nv-hwoo deleted the mwawrzos/openai-vision branch July 10, 2024 21:30
nv-hwoo added a commit that referenced this pull request Jul 11, 2024
* POC for LLaVA support

* non-streaming request in VLM tests

* image component sent in "image_url" field instead of HTML tag

* generate sample image instead of loading from docs

* add vision to endpoint mapping

* fixes for handling OutputFormat

* refactor - extract image preparation to a separate module

* fixes to the refactor

* replace match-case syntax with if-elif-else

* Update image payload format and fix tests

* Few clean ups and tickets added for follow up tasks

* Fix and add tests for vision format

* Remove output format from profile data parser

* Revert irrelevant code change

* Revert changes

* Remove unused dependency

* Comment test_extra_inputs

---------

Co-authored-by: Hyunjae Woo <[email protected]>
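One commit above notes that the image component is sent in an "image_url" field instead of an HTML tag. A minimal sketch of such a request body, assuming the OpenAI chat completions vision format; the model name, prompt text, and image bytes are placeholders, not values from this PR:

```python
import base64

# Placeholder bytes standing in for a real image file's contents.
image_bytes = b"\x89PNG placeholder"
encoded = base64.b64encode(image_bytes).decode("utf-8")

# The image travels as an "image_url" content part holding a base64
# data URL, alongside an ordinary text part.
payload = {
    "model": "example-vlm-model",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{encoded}"},
                },
            ],
        }
    ],
}
```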
nv-hwoo added a commit that referenced this pull request Jul 18, 2024
* POC LLaVA VLM support (#720)

* POC for LLaVA support

* non-streaming request in VLM tests

* image component sent in "image_url" field instead of HTML tag

* generate sample image instead of loading from docs

* add vision to endpoint mapping

* fixes for handling OutputFormat

* refactor - extract image preparation to a separate module

* fixes to the refactor

* replace match-case syntax with if-elif-else

* Update image payload format and fix tests

* Few clean ups and tickets added for follow up tasks

* Fix and add tests for vision format

* Remove output format from profile data parser

* Revert irrelevant code change

* Revert changes

* Remove unused dependency

* Comment test_extra_inputs

---------

Co-authored-by: Hyunjae Woo <[email protected]>

* Support multi-modal input from file for OpenAI Chat Completions (#749)

* add synthetic image generator (#751)

* synthetic image generator

* format randomization

* images should be base64-encoded arbitrarily

* randomized image format

* randomized image shape

* prepare SyntheticImageGenerator to support different image sources

* read from files

* python 3.10 support fixes

* remove unused imports

* skip sampled image sizes with negative values

* formats type fix

* remove unused variable

* synthetic image generator encodes images to base64

* image format not randomized

* sample each dimension independently

Co-authored-by: Hyunjae Woo <[email protected]>

* apply code-review suggestions

* update class name

* deterministic synthetic image generator

* add typing to SyntheticImageGenerator

* SyntheticImageGenerator doesn't load files

* SyntheticImageGenerator always encodes images to base64

* remove unused imports

* generate gaussian noise instead of blank images

---------

Co-authored-by: Hyunjae Woo <[email protected]>
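The commits above describe a deterministic synthetic image generator that produces Gaussian noise and always encodes images to base64. A stdlib-only sketch of that idea (the real PR presumably uses an imaging library; the function names here are hypothetical):

```python
import base64
import random

def generate_noise_image(width, height, seed=0):
    """Fill a width x height grayscale buffer with Gaussian noise.
    Seeded, so the generator is deterministic, per the
    'deterministic synthetic image generator' commit."""
    rng = random.Random(seed)
    return bytes(
        min(255, max(0, int(rng.gauss(128, 32))))
        for _ in range(width * height)
    )

def encode_image(pixels):
    """Base64-encode raw image bytes, mirroring the
    'always encodes images to base64' commit."""
    return base64.b64encode(pixels).decode("utf-8")

b64 = encode_image(generate_noise_image(4, 4))
```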

* Add command line arguments for synthetic image generation (#753)

* Add CLI options for synthetic image generation

* read image format from file when --input-file is used

* move encode_image method to utils

* Lazy import some modules

* Support synthetic image generation in GenAI-Perf (#754)

* support synthetic image generation for VLM model

* add test

* integrate synthetic image generator into LlmInputs

* add source images for synthetic image data

* use abs to get positive int

---------

Co-authored-by: Marek Wawrzos <[email protected]>
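The "use abs to get positive int" commit above guards against sampled image dimensions coming out negative. A hypothetical helper illustrating that fix, with each dimension drawn independently from a Gaussian (names and parameters are illustrative, not from the PR):

```python
import random

def sample_dimension(mean, stddev, rng):
    """Draw one image dimension from a Gaussian; abs() plus a floor of 1
    guarantees the result is a valid positive size."""
    return max(1, abs(int(rng.gauss(mean, stddev))))

rng = random.Random(0)
width = sample_dimension(100, 50, rng)
height = sample_dimension(100, 50, rng)  # sampled independently of width
```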