Merge branch 'main' into vllm_0.6.0_integration_test

HuiyingLi · Oct 2, 2024 · 9d9adce
2 parents cf8c376 + bd618cc

Showing 3 changed files with 34 additions and 2 deletions.
diff --git a/.github/workflows/cicd-main.yml b/.github/workflows/cicd-main.yml
@@ -5416,7 +5416,7 @@ jobs:
       - if: ${{ always() && steps.pipeline-conclusion.outputs.SUCCESS == 'true' }}
         run: exit 0
 
-      - if: ${{ always() && steps.pipeline-conclusion.outputs.SUCCESS == 'true' && github.event_name == 'pull_request' }}
+      - if: ${{ always() && steps.pipeline-conclusion.outputs.SUCCESS == 'true' && github.event_name == 'pull_request' && env.SLACK_WEBHOOK != '' }}
         uses: peter-evans/create-or-update-comment@v4
         with:
           issue-number: ${{ github.event.number }}

diff --git a/README.md b/README.md
@@ -12,6 +12,12 @@
 ## Latest News
 
 <!-- markdownlint-disable -->
+<details open>
+  <summary><b>NeMo 2.0</b></summary>
+      We've released NeMo 2.0, an update on the NeMo Framework which prioritizes modularity and ease-of-use. Please refer to the <a href=https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html>NeMo Framework User Guide</a> to get started.
+    </details>
+  </details>
+
 <details open>
   <summary><b>Large Language Models and Multimodal Models</b></summary>
       <details>
@@ -134,6 +140,29 @@ pre-trained model checkpoints.
 For technical documentation, please see the [NeMo Framework User
 Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html).
 
+## What's New in NeMo 2.0
+
+NVIDIA NeMo 2.0 introduces several significant improvements over its predecessor, NeMo 1.0, enhancing flexibility, performance, and scalability.
+
+- **Python-Based Configuration** - NeMo 2.0 transitions from YAML files to a Python-based configuration, providing more flexibility and control. This shift makes it easier to extend and customize configurations programmatically.
+
+- **Modular Abstractions** - By adopting PyTorch Lightning’s modular abstractions, NeMo 2.0 simplifies adaptation and experimentation. This modular approach allows developers to more easily modify and experiment with different components of their models.
+
+- **Scalability** - NeMo 2.0 seamlessly scales large-scale experiments across thousands of GPUs using [NeMo-Run](https://github.com/NVIDIA/NeMo-Run), a powerful tool designed to streamline the configuration, execution, and management of machine learning experiments across computing environments.
+
+Overall, these enhancements make NeMo 2.0 a powerful, scalable, and user-friendly framework for AI model development.
+
+> [!IMPORTANT]  
+> NeMo 2.0 is currently supported by the LLM (large language model) and VLM (vision language model) collections.
+
+### Get Started with NeMo 2.0
+
+- Refer to the [Quickstart](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/quickstart.html) for examples of using NeMo-Run to launch NeMo 2.0 experiments locally and on a Slurm cluster.
+- For more information about NeMo 2.0, see the [NeMo Framework User Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/index.html).
+- [NeMo 2.0 Recipes](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/recipes) contains additional examples of launching large-scale runs using NeMo 2.0 and NeMo-Run.
+- For an in-depth exploration of the main features of NeMo 2.0, see the [Feature Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/features/index.html#feature-guide).
+- To transition from NeMo 1.0 to 2.0, see the [Migration Guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemo-2.0/migration/index.html#migration-guide) for step-by-step instructions.
+
 ## LLMs and MMs Training, Alignment, and Customization
 
 All NeMo models are trained with
@@ -177,6 +206,9 @@ production use cases with [NVIDIA Riva](https://developer.nvidia.com/riva).
 
 ## NeMo Framework Launcher
 
+> [!IMPORTANT]  
+> NeMo Framework Launcher is compatible with NeMo version 1.0 only. [NeMo-Run](https://github.com/NVIDIA/NeMo-Run) is recommended for launching experiments using NeMo 2.0.
+
 [NeMo Framework
 Launcher](https://github.com/NVIDIA/NeMo-Megatron-Launcher) is a
 cloud-native tool that streamlines the NeMo Framework experience. It is

diff --git a/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py b/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py
@@ -580,7 +580,7 @@ def setup_mcore_distributed_parallel(self):
                 # using bucket_cap_mb to configure bucket_size here
                 bucket_size=self.cfg.optim.get('ddp_bucket_size', None),
                 average_in_collective=self.cfg.optim.get('average_in_collective', True),
-                overlap_param_gather=self.cfg.optim.get('overlap_param_gather', False),
+                overlap_param_gather=self.cfg.optim.get('overlap_param_sync', False),
                 align_param_gather=self.cfg.optim.get('align_param_gather', False),
                 fp8_param_gather=self.cfg.get('fp8_params', False),
             )
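
The last hunk changes which config key feeds the `overlap_param_gather` argument: the value is now read from `overlap_param_sync` instead of `overlap_param_gather`. Below is a minimal sketch of what that means for existing configs. It assumes a plain dict as a stand-in for `self.cfg.optim`, and the helper name `ddp_overlap_kwargs` is hypothetical, not part of NeMo.

```python
def ddp_overlap_kwargs(optim_cfg):
    """Hypothetical helper mirroring two of the lookups shown in the hunk.

    `optim_cfg` is a plain dict standing in for `self.cfg.optim`.
    """
    return {
        # After this commit, the argument is filled from 'overlap_param_sync'.
        "overlap_param_gather": optim_cfg.get("overlap_param_sync", False),
        # Unchanged by the commit: still read from its own key.
        "align_param_gather": optim_cfg.get("align_param_gather", False),
    }


# A config that sets only the previous key falls back to the default (False).
old_style = ddp_overlap_kwargs({"overlap_param_gather": True})

# A config that sets the new key enables the overlap.
new_style = ddp_overlap_kwargs({"overlap_param_sync": True})

print(old_style["overlap_param_gather"], new_style["overlap_param_gather"])
```

Under these assumptions, a config that only sets `optim.overlap_param_gather` would no longer turn the option on after this change, so such configs may need to set `optim.overlap_param_sync` instead.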