Add command line arguments for synthetic image generation #753
Conversation
Fantastic work here, Hyunjae!
General team comments outside the scope of this PR: we're adding a lot of args to LLM Inputs to support each endpoint, so we'll eventually need to find a way to move away from that practice. You skipped the check for non-VLM endpoints having image args, which is probably the better practice... we might do too much checking today. It would also be good to find a way to minimize imports used for only one endpoint type, e.g. guard the base64 and PIL imports in utils.py. Those are the larger code comments. CC: @nv-hwoo @debermudez
Thanks for the great insights, @dyastremsky. I agree with your concerns. If you would like, I can also lazy-import the base64 and PIL modules so that they won't be loaded for other endpoints.
Great solution. That could be helpful if it's not too much effort.
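To make the lazy-import idea above concrete, here is a minimal sketch assuming a PIL-based `encode_image` helper in utils.py (the helper name appears in the commit list below; the exact signature here is an assumption, not the repository's actual code). Deferring the imports into the function body means text-only endpoints never load these modules.

```python
def encode_image(image, image_format: str = "PNG") -> str:
    """Encode a PIL image as a base64 string (illustrative sketch only)."""
    # Imported inside the function so that text-only endpoints never trigger
    # these imports; the PIL import itself would be guarded the same way at
    # the site where the image object is created.
    import base64
    from io import BytesIO

    buffer = BytesIO()
    image.save(buffer, format=image_format)
    return base64.b64encode(buffer.getvalue()).decode("utf-8")
```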
* POC LLaVA VLM support (#720)
  * POC for LLaVA support
  * non-streaming request in VLM tests
  * image component sent in "image_url" field instead of HTML tag
  * generate sample image instead of loading from docs
  * add vision to endpoint mapping
  * fixes for handling OutputFormat
  * refactor - extract image preparation to a separate module
  * fixes to the refactor
  * replace match-case syntax with if-elseif-else
  * Update image payload format and fix tests
  * Few clean ups and tickets added for follow up tasks
  * Fix and add tests for vision format
  * Remove output format from profile data parser
  * Revert irrelevant code change
  * Revert changes
  * Remove unused dependency
  * Comment test_extra_inputs

  Co-authored-by: Hyunjae Woo <[email protected]>

* Support multi-modal input from file for OpenAI Chat Completions (#749)

* add synthetic image generator (#751)
  * synthetic image generator
  * format randomization
  * images should be base64-encoded arbitrarily
  * randomized image format
  * randomized image shape
  * prepare SyntheticImageGenerator to support different image sources
  * read from files
  * python 3.10 support fixes
  * remove unused imports
  * skip sampled image sizes with negative values
  * formats type fix
  * remove unused variable
  * synthetic image generator encodes images to base64
  * image format not randomized
  * sample each dimension independently

  Co-authored-by: Hyunjae Woo <[email protected]>

  * apply code-review suggestions
  * update class name
  * deterministic synthetic image generator
  * add typing to SyntheticImageGenerator
  * SyntheticImageGenerator doesn't load files
  * SyntheticImageGenerator always encodes images to base64
  * remove unused imports
  * generate gaussian noise instead of blank images

  Co-authored-by: Hyunjae Woo <[email protected]>

* Add command line arguments for synthetic image generation (#753)
  * Add CLI options for synthetic image generation
  * read image format from file when --input-file is used
  * move encode_image method to utils
  * Lazy import some modules

* Support synthetic image generation in GenAI-Perf (#754)
  * support synthetic image generation for VLM model
  * add test
  * integrate synthetic image generator into LlmInputs
  * add source images for synthetic image data
  * use abs to get positive int

  Co-authored-by: Marek Wawrzos <[email protected]>
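The commit list above describes the generator's behavior: Gaussian-noise pixels instead of blank images, each dimension sampled independently with abs() forcing positive sizes, deterministic output, and base64-encoded results. The standalone function below is only a simplified sketch of that idea under those assumptions; it does not mirror the repository's SyntheticImageGenerator API.

```python
import base64
from io import BytesIO
from typing import Optional

import numpy as np
from PIL import Image


def generate_synthetic_image(
    width_mean: float,
    width_stddev: float,
    height_mean: float,
    height_stddev: float,
    image_format: str = "PNG",
    seed: Optional[int] = None,
) -> str:
    """Generate one Gaussian-noise image and return it as a base64 data URI."""
    rng = np.random.default_rng(seed)  # seeding makes the output deterministic

    # Width and height are sampled independently; abs() keeps the sampled
    # sizes positive, and the floor of 1 avoids zero-sized images.
    width = max(1, abs(int(rng.normal(width_mean, width_stddev))))
    height = max(1, abs(int(rng.normal(height_mean, height_stddev))))

    # Fill the canvas with Gaussian noise rather than leaving it blank.
    noise = rng.normal(loc=128, scale=32, size=(height, width, 3))
    image = Image.fromarray(np.clip(noise, 0, 255).astype(np.uint8), mode="RGB")

    buffer = BytesIO()
    image.save(buffer, format=image_format)
    encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")
    return f"data:image/{image_format.lower()};base64,{encoded}"
```

Seeding the RNG is what makes the output reproducible across runs, which the "deterministic synthetic image generator" commit calls out explicitly.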
A follow-up PR will try to bridge the work done in this PR and in #751 within the LlmInputs class.
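For added context, LlmInputs is the component that assembles request payloads, and the commits above note that the image is sent in an "image_url" field of an OpenAI Chat Completions-style message rather than as an HTML tag. A rough sketch of that payload shape, using a hypothetical model name and the illustrative helper from the previous sketch (the exact structure genai-perf builds may differ):

```python
# Illustrative payload shape only; field names follow the OpenAI Chat
# Completions vision message format mentioned in the commits above.
payload = {
    "model": "llava-1.5-7b",  # hypothetical model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    # Base64 data URI produced by the sketch above.
                    "image_url": {"url": generate_synthetic_image(512, 30, 512, 30)},
                },
            ],
        }
    ],
}
```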