Skip to content

Commit

Permalink
Merge branch 'main' into siglip_support
Browse files Browse the repository at this point in the history
Signed-off-by: HuiyingLi <[email protected]>
  • Loading branch information
HuiyingLi committed Jun 4, 2024
2 parents 2233fa6 + 63833cd commit 5aca344
Show file tree
Hide file tree
Showing 94 changed files with 2,297 additions and 3,829 deletions.
52 changes: 7 additions & 45 deletions .github/workflows/cicd-main.yml
Original file line number Diff line number Diff line change
Expand Up @@ -43,33 +43,11 @@ jobs:
docker container prune --filter "until=24h" --force
docker image prune -a --filter "until=24h" --force
# checkout-repository:
# runs-on: self-hosted-azure
# container:
# image: nvcr.io/nvidia/pytorch:24.02-py3
# volumes:
# - ${{ github.workspace }}:/workspace
# steps:
# - name: Checkout repository
# uses: actions/checkout@v4
# with:
# path: ${{ github.run_id }}

cicd-test-container-setup:
needs: [cicd-cluster-clean]
runs-on: self-hosted-azure-builder
if: ${{ github.event.label.name == 'Run CICD' }}
# uses: actions/cache@v2
#container:
# image: nvcr.io/nvidia/pytorch:24.02-py3
# options:
# # --user 0:128
# --device=/dev/nvidia0
# --gpus all
# --shm-size=8g
# --env TRANSFORMERS_OFFLINE=0
# --env HYDRA_FULL_ERROR=1
steps:
- name: Checkout repository
uses: actions/checkout@v4
Expand Down Expand Up @@ -114,26 +92,10 @@ jobs:
# These checks are not crucial
exit 0
'
### \'\'
# - name: Build and push to local registry
# uses: docker/build-push-action@v5
# with:
# context: .
# push: true
# tags: nemoci.azurecr.io/name/app:latest

# - name: Inspect
# run: |
# docker buildx imagetools inspect nemoci.azurecr.io/name/app:latest

#- name: Post-workflow execution
# uses: gacts/run-and-post-run@v1
# with:
# post: |
# chmod -R 777 .

L0_Unit_Tests_GPU:
OPTIONAL_L0_Unit_Tests_GPU:
needs: [cicd-test-container-setup]
runs-on: self-hosted-azure
container:
Expand All @@ -152,8 +114,8 @@ jobs:
- name: "L0: Unit Tests GPU"
run: |
NEMO_NUMBA_MINVER=0.53 pytest -m "not pleasefixme" --with_downloads
- uses: "NVIDIA/NeMo/.github/actions/cancel-workflow@main"
if: "failure()"
#- uses: "NVIDIA/NeMo/.github/actions/cancel-workflow@main"
# if: "failure()"


L0_Unit_Tests_CPU:
Expand Down Expand Up @@ -325,7 +287,7 @@ jobs:
# this test is using a 7B model which is too large for GitHub CI
# replace the model in this test with a toy model or move the test
# to the nightly CI
# L2_Community_LLM_Checkpoints_tests_Baichuan2:
# OPTIONAL_L2_Community_LLM_Checkpoints_tests_Baichuan2:
# needs: [cicd-test-container-setup]
# runs-on: self-hosted-azure
# container:
Expand Down Expand Up @@ -6484,12 +6446,12 @@ jobs:

Nemo_CICD_Test:
needs:
- L0_Unit_Tests_GPU
#- OPTIONAL_L0_Unit_Tests_GPU
- L0_Unit_Tests_CPU
- L2_Community_LLM_Checkpoints_tests_Llama
- L2_Community_LLM_Checkpoints_tests_StarCoder
- L2_Community_LLM_Checkpoints_tests_Falcon
#- L2_Community_LLM_Checkpoints_tests_Baichuan2
#- OPTIONAL_L2_Community_LLM_Checkpoints_tests_Baichuan2
- ASR_dev_run_Speech_to_Text
- ASR_dev_run_Speech_to_Text_WPE_-_CitriNet
- ASR_dev_run_Speech_Pre-training_-_CitriNet
Expand Down
297 changes: 149 additions & 148 deletions README.rst

Large diffs are not rendered by default.

42 changes: 42 additions & 0 deletions docs/source/features/mixed_precision.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,45 @@ Mixed Precision Training
------------------------

Mixed precision training significantly enhances computational efficiency by conducting operations in half-precision and fp8 formats, while selectively maintaining minimal data in single-precision to preserve critical information throughout key areas of the network. NeMo now supports FP16, BF16, and FP8 (via Transformer Engine) across most models. Further details will be provided shortly.


FP8 usage
=========

Overview
^^^^^^^^

NVIDIA H100 GPU introduced support for a new datatype, FP8 (8-bit floating point), enabling higher throughput of matrix multiplies and convolutions. NeMo uses the NVIDIA `TransformerEngine <https://github.com/NVIDIA/TransformerEngine>`_ (TE) in order to leverage speedups from FP8. The following table summarizes the FP8 related arguments that can be configured in NeMo (`example config setting <https://github.com/NVIDIA/NeMo/blob/2e1814c9f031ad2aeeebad44597365e97253d2c4/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml/#L192-L200>`_). For a more detailed overview, refer to the TE `documentation <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html>`_, specifically the FP8 `format <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/api/common.html#transformer_engine.common.recipe.Format>`_ and `recipe <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/api/common.html#transformer_engine.common.recipe.DelayedScaling>`_.

.. list-table:: FP8 arguments
:widths: 25 25 50
:header-rows: 1

* - Argument
- Description
* - transformer_engine
- TE and related functionality can be enabled by setting this boolean argument to True. If this argument is not set to True, all subsequent arguments will be ignored.
* - fp8
- Enables FP8 training. For transformer networks, the QKV, projection, FC1, and FC2 matrix multiplications are executed using the 4th generation H100 tensor cores with FP8 support.
* - fp8_e4m3
- Training recipe format for FP8. Activations, weights, and gradient tensors use the E4M3 format.
* - fp8_hybrid
- Training recipe format for FP8. Activations and weight tensors use the E4M3 format, whereas gradient use the E5M2 format to satisfy the additional dynamic range requirement for backward tensors. This is the default setting.
* - fp8_margin
- The scaling factor for FP8 tensors can be shifted by a factor of $2 ^ {margin}$ using this argument.
* - fp8_amax_history_len
- Window size for amax history. The window size determines how many instances of the most recent absolute max values (amaxes) are stored per tensor.
* - fp8_amax_compute_algo
- The choice between “max” and “most_recent” specifies how to select an amax value from the given history.
* - reduce_amax
- Indicates whether or not to perform an allreduce on the amax (absolute max) values for the FP8 tensors. Since the amax is directly used to compute the scaling factor for FP8 tensors, setting this argument ensures that the scaling factors for a tensor remain synchronized across devices in multi-GPU training configurations.
* - fp8_params
- Indicates whether or not to store module level parameters in FP8. Enabling this option can lead to reduced memory consumption. It eliminates the need to store a copy of weights in higher precision (> half) for cases where these weights are externally maintained, such as master parameters in the optimizer. For more information, refer to the `fp8_model_init <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/api/pytorch.html#transformer_engine.pytorch.fp8_model_init>`_ API in TE.

Resources
^^^^^^^^^

- `TE documentation <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html>`_
- `Intro to FP8, floating point formats, and mixed precision training <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/fp8_primer.html#Introduction-to-FP8>`_
- `Performance optimizations <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/examples/advanced_optimizations.html>`_ that are natively supported in NeMo by enabling FP8 training with TE
- `TE installation <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html>`_
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,7 @@ model:
feat_out: -1 # you may set it if you need different output size other than the default d_model
n_layers: 17
d_model: 512
use_bias: True # whether to apply bias in the feedforward, MHA and convolution modules

# Sub-sampling parameters
subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@ model:
feat_out: -1 # you may set it if you need different output size other than the default d_model
n_layers: 17
d_model: 512
use_bias: True # whether to apply bias in the feedforward, MHA and convolution modules

# Sub-sampling params
subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@ model:
feat_out: -1 # you may set it if you need different output size other than the default d_model
n_layers: 17
d_model: 512
use_bias: True # whether to apply bias in the feedforward, MHA and convolution modules

# Sub-sampling parameters
subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,7 @@ model:
feat_out: -1 # you may set it if you need different output size other than the default d_model
n_layers: 17
d_model: 512
use_bias: True # whether to apply bias in the feedforward, MHA and convolution modules

# Sub-sampling params
subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,7 @@ model:
feat_out: -1 # you may set it if you need different output size other than the default d_model
n_layers: 17
d_model: 512
use_bias: True # whether to apply bias in the feedforward, MHA and convolution modules

# Sub-sampling parameters
subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@ model:
feat_out: -1 # you may set it if you need different output size other than the default d_model
n_layers: 17
d_model: 512
use_bias: True # whether to apply bias in the feedforward, MHA and convolution modules

# Sub-sampling params
subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,7 @@ model:
feat_out: -1 # you may set it if you need different output size other than the default d_model
n_layers: 17
d_model: 512
use_bias: True # whether to apply bias in the feedforward, MHA and convolution modules

# Sub-sampling parameters
subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@ model:
feat_out: -1 # you may set it if you need different output size other than the default d_model
n_layers: 17
d_model: 512
use_bias: True # whether to apply bias in the feedforward, MHA and convolution modules

# Sub-sampling params
subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@ model:
feat_out: -1 # you may set it if you need different output size other than the default d_model
n_layers: 18
d_model: 512
use_bias: True # whether to apply bias in the feedforward, MHA and convolution modules

# Sub-sampling params
subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,7 @@ model:
feat_out: -1 # you may set it if you need different output size other than the default d_model
n_layers: 17
d_model: 512
use_bias: True # whether to apply bias in the feedforward, MHA and convolution modules

# Sub-sampling parameters
subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding
Expand Down
1 change: 1 addition & 0 deletions examples/asr/conf/ssl/fastconformer/fast-conformer.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ model:
feat_out: -1 # you may set it if you need different output size other than the default d_model
n_layers: 17
d_model: 512
use_bias: True # whether to apply bias in the feedforward, MHA and convolution modules

# Sub-sampling params
subsampling: dw_striding # vggnet, striding, stacking or stacking_norm, dw_striding
Expand Down
13 changes: 12 additions & 1 deletion examples/asr/transcribe_speech.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
import glob
import json
import os
from dataclasses import dataclass, is_dataclass
from dataclasses import dataclass, field, is_dataclass
from tempfile import NamedTemporaryFile
from typing import List, Optional, Union

Expand All @@ -25,6 +25,7 @@
from omegaconf import OmegaConf, open_dict

from nemo.collections.asr.models import EncDecCTCModel, EncDecHybridRNNTCTCModel, EncDecMultiTaskModel
from nemo.collections.asr.models.aed_multitask_models import parse_multitask_prompt
from nemo.collections.asr.modules.conformer_encoder import ConformerChangeConfig
from nemo.collections.asr.parts.submodules.ctc_decoding import CTCDecodingConfig
from nemo.collections.asr.parts.submodules.multitask_decoding import MultiTaskDecoding, MultiTaskDecodingConfig
Expand Down Expand Up @@ -169,6 +170,14 @@ class TranscriptionConfig:

# Decoding strategy for AED models
multitask_decoding: MultiTaskDecodingConfig = MultiTaskDecodingConfig()
# Prompt slots for prompted models, e.g. Canary-1B. Examples of acceptable prompt inputs:
# Implicit single-turn assuming default role='user' (works with Canary-1B)
# +prompt.source_lang=en +prompt.target_lang=es +prompt.task=asr +prompt.pnc=yes
# Explicit single-turn prompt:
# +prompt.role=user +prompt.slots.source_lang=en +prompt.slots.target_lang=es +prompt.slots.task=s2t_translation +prompt.slots.pnc=yes
# Explicit multi-turn prompt:
# +prompt.turns='[{role:user,slots:{source_lang:en,target_lang:es,task:asr,pnc:yes}}]'
prompt: dict = field(default_factory=dict)

# decoder type: ctc or rnnt, can be used to switch between CTC and RNNT decoder for Hybrid RNNT/CTC models
decoder_type: Optional[str] = None
Expand Down Expand Up @@ -411,6 +420,8 @@ def autocast(dtype=None):
override_cfg.augmentor = augmentor
override_cfg.text_field = cfg.gt_text_attr_name
override_cfg.lang_field = cfg.gt_lang_attr_name
if hasattr(override_cfg, "prompt"):
override_cfg.prompt = parse_multitask_prompt(OmegaConf.to_container(cfg.prompt))
transcriptions = asr_model.transcribe(
audio=filepaths,
override_config=override_cfg,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,7 @@ model:
from_pretrained: "openai/clip-vit-large-patch14" # path or name
from_hf: True
patch_dim: 14
crop_size: [224, 224]
hidden_size: 1024 # could be found from model but tricky in code
vision_select_layer: -2 # default to the last layer
class_token_length: 1
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,7 @@ model:
from_pretrained: "" # path or name
from_hf: True
patch_dim: 14
crop_size: [224, 224]
hidden_size: 1024 # could be found from model but tricky in code
vision_select_layer: -2 # default to the last layer
class_token_length: 1
Expand Down Expand Up @@ -189,7 +190,6 @@ model:
is_multimodal: True
media_type: image # currently supported: image
sep_image_conv_front: False
image_token_len: 256
conv_template: ${model.mm_cfg.llm.model_type} # check `nemo/collections/multimodal/data/neva/conversation.py`
image_folder: null
image_aspect_ratio: 'square'
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,7 @@ model:
from_pretrained: "" # path or name
from_hf: True
patch_dim: 14
crop_size: [224, 224]
hidden_size: 1024 # could be found from model but tricky in code
vision_select_layer: -2 # default to the last layer
class_token_length: 1
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,7 @@ model:
from_pretrained: "" # path or name
from_hf: True
patch_dim: 14
crop_size: [336, 336]
hidden_size: 1024 # could be found from model but tricky in code
vision_select_layer: -2 # default to the last layer
class_token_length: 1
Expand Down Expand Up @@ -194,7 +195,6 @@ model:
num_frames: 8 # selects the number of frames to use from the video
sep_token_between_frames: False # TODO: allow usage of separator tokens between frames
sep_image_conv_front: False
image_token_len: 256
conv_template: ${model.mm_cfg.llm.model_type} # check `nemo/collections/multimodal/data/neva/conversation.py`
image_folder: null
video_folder: null
Expand Down
Loading

0 comments on commit 5aca344

Please sign in to comment.