Add usage instructions for Cosmos TensorRT #11650

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 55 additions & 5 deletions nemo/collections/common/video_tokenizers/README.md
# Cosmos Tokenizer

This directory contains the NeMo implementation of the [NVIDIA Cosmos Tokenizers](https://github.com/NVIDIA/Cosmos-Tokenizer)
that are hosted on the [Huggingface Hub (HF-Hub)](https://huggingface.co/nvidia/)

## Usage

### Basic usage
The encoder, decoder and autoencoder models can be loaded directly from the HF-Hub using the `from_pretrained` class method
of the `CausalVideoTokenizer` class:

```python
from nemo.collections.common.video_tokenizers.cosmos_tokenizer import CausalVideoTokenizer

model = CausalVideoTokenizer.from_pretrained("Cosmos-Tokenizer-DV4x8x8")
```
By default, this will download all three (`{encoder, decoder, autoencoder}.jit`) models from `nvidia/Cosmos-Tokenizer-DV4x8x8`
and will only load the encoder and decoder models.

To encode an input tensor, users can run the following:
```python
import torch

input_tensor = torch.randn(1, 3, 9, 512, 512).to('cuda').to(torch.bfloat16)
(indices, codes) = model.encode(input_tensor)
```
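
The discrete indices can then be decoded back into a video tensor. The following is a minimal sketch assuming the NeMo class mirrors the upstream Cosmos-Tokenizer API, where `decode` accepts the `indices` returned by `encode`:
```python
# Sketch: reconstruct the video from the discrete tokens; assumes `decode`
# mirrors the upstream Cosmos-Tokenizer API and accepts token indices.
reconstructed = model.decode(indices)  # roughly (1, 3, 9, 512, 512), bfloat16
```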

Please see the official [NVIDIA Cosmos repository](https://github.com/NVIDIA/Cosmos-Tokenizer)
for the complete list of supported tokenizers.
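
Other tokenizers are loaded the same way by changing the checkpoint name. As a sketch, here is the continuous-video variant, assuming the `nvidia/Cosmos-Tokenizer-CV4x8x8` checkpoint on the HF-Hub and the upstream convention that continuous tokenizers return a latent tensor rather than discrete indices:
```python
# Sketch: continuous video tokenizer; the single-element tuple return is an
# assumption carried over from the upstream Cosmos-Tokenizer API.
cv_model = CausalVideoTokenizer.from_pretrained("Cosmos-Tokenizer-CV4x8x8")
(latent,) = cv_model.encode(input_tensor)
```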

### Acceleration with TensorRT
To use these tokenizers with TensorRT and achieve up to a 3x speedup during tokenization,
users can define a lightweight wrapper model and pass it to `trt_compile`:
```python
import torch
from nemo.collections.common.video_tokenizers.cosmos_tokenizer import CausalVideoTokenizer
from nemo.export.tensorrt_lazy_compiler import trt_compile

class VaeWrapper(torch.nn.Module):
    def __init__(self, vae):
        super().__init__()
        self.vae = vae

    def forward(self, input_tensor):
        output_tensor = self.vae.autoencode(input_tensor)
        return output_tensor

model = CausalVideoTokenizer.from_pretrained(
    "Cosmos-Tokenizer-DV4x8x8",
    use_pytorch=True,
    dtype="float"
)
model_wrapper = VaeWrapper(model)

input_tensor = torch.randn(1, 3, 9, 512, 512).to('cuda').to(torch.float)
opt_shape = min_shape = max_shape = input_tensor.shape

path_to_engine_outputs = "./trt_outputs"
trt_compile(
    model_wrapper,
    path_to_engine_outputs,
    args={
        "precision": "bf16",
        "input_profiles": [
            {"input_tensor": [min_shape, opt_shape, max_shape]},
        ],
    },
)

output = model_wrapper(input_tensor)
```
Note that `trt_compile` requires `min_shape`, `opt_shape`, and `max_shape` for each input
(in this example all three are set to the input tensor's shape for simplicity);
providing a range of shapes enables inputs with dynamic shapes after compilation.
For more information about TensorRT and dynamic shapes, see the [Torch-TensorRT documentation](https://pytorch.org/TensorRT/user_guide/dynamic_shapes.html).
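
For example, to build a single engine that accepts a range of spatial resolutions, the three shapes can differ. The sketch below is hypothetical: it assumes the autoencoder itself supports variable height and width, and the 256/512/1024 bounds and the output path are illustrative, not prescribed:
```python
# Hypothetical dynamic-shape profile: batch, channel, and frame dims fixed,
# height and width allowed to vary between 256 and 1024 (optimal at 512).
min_shape = (1, 3, 9, 256, 256)
opt_shape = (1, 3, 9, 512, 512)
max_shape = (1, 3, 9, 1024, 1024)

trt_compile(
    model_wrapper,
    "./trt_outputs_dynamic",  # illustrative engine output path
    args={
        "precision": "bf16",
        "input_profiles": [
            {"input_tensor": [min_shape, opt_shape, max_shape]},
        ],
    },
)
```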

The file `cosmos_trt_run.py` provides a stand-alone script to tokenize tensors with a TensorRT-accelerated
Cosmos tokenizer.

## Examples
1. Multimodal autoregressive model dataset preparation using the [discrete cosmos tokenizer](../../../../nemo/collections/multimodal_autoregressive/data/README.md)
2. Diffusion model dataset preparation using the [continuous cosmos tokenizer](../../diffusion/data/readme.rst)