Commit
Add CI test
BoxiangW committed Jan 3, 2025
1 parent bb52f17 commit 648d830
Showing 2 changed files with 14 additions and 3 deletions.
11 changes: 11 additions & 0 deletions .github/workflows/cicd-main.yml
@@ -3702,6 +3702,17 @@ jobs:
           TRANSFORMERS_OFFLINE=1 python tests/collections/llm/hf/sft_nemorun.py --model /home/TestData/nlp/hf_gemma/hf_gemma_2b --max-steps 10 --devices 2 --strategy ddp
         AFTER_SCRIPT: |
           rm -rf nemo_experiments
 
+  L2_HF_Transformer_SFT_2gpu_nemorun_fsdp2:
+    needs: [ cicd-test-container-setup ]
+    uses: ./.github/workflows/_test_template.yml
+    if: contains(fromJSON(needs.cicd-test-container-setup.outputs.test_to_run), 'L2_HF_Transformer_SFT_2gpu_nemorun_fsdp2') || needs.cicd-test-container-setup.outputs.all == 'true'
+    with:
+      RUNNER: self-hosted-azure
+      SCRIPT: |
+        TRANSFORMERS_OFFLINE=1 python tests/collections/llm/hf/sft_nemorun_fsdp2.py --model /home/TestData/nlp/hf_gemma/hf_gemma_2b --max-steps 10 --devices 2
+      AFTER_SCRIPT: |
+        rm -rf nemo_experiments
+
   L2_HF_Transformer_SFT:
     needs: [ cicd-test-container-setup ]
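For a local repro of the new CI job, its SCRIPT step reduces to a single python invocation with Hugging Face networking disabled. Below is a minimal sketch (not part of this commit) that replays that step via subprocess; it assumes a NeMo checkout as the working directory and that the Gemma checkpoint path, a mount inside the CI container, exists locally. Otherwise, point --model at any locally cached HF model.

# Hypothetical local replay of the job's SCRIPT step (not part of this commit).
# Assumes a NeMo checkout as CWD; the model path below is a CI-container mount,
# so substitute any locally cached HF checkpoint if it is missing.
import os
import subprocess

env = dict(os.environ, TRANSFORMERS_OFFLINE="1")  # use only cached HF assets
subprocess.run(
    [
        "python", "tests/collections/llm/hf/sft_nemorun_fsdp2.py",
        "--model", "/home/TestData/nlp/hf_gemma/hf_gemma_2b",
        "--max-steps", "10",
        "--devices", "2",
    ],
    check=True,  # fail loudly, as the CI step would
    env=env,
)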
6 changes: 3 additions & 3 deletions tests/collections/llm/hf/sft_nemorun_fsdp2.py
@@ -20,7 +20,7 @@
 from nemo.collections.llm.gpt.data.hf_dataset import SquadHFDataModule
 
 
-DATA_PATH = '/home/TestData/lite/hf_cache/squad/'
+DATA_PATH = '/lustre/fsw/coreai_dlalgo_llm/boxiangw/squad'
 
 
 def local_executor_torchrun(nodes: int = 1, devices: int = 2) -> run.LocalExecutor:
@@ -45,7 +45,7 @@ def local_executor_torchrun(nodes: int = 1, devices: int = 2) -> run.LocalExecutor:
     parser.add_argument('--model', default='meta-llama/Llama-3.2-1B')
     parser.add_argument('--devices', default=2)
     parser.add_argument('--accelerator', default='gpu', choices=['gpu'])
-    parser.add_argument('--max-steps', type=int, default=1000)
+    parser.add_argument('--max-steps', type=int, default=100)
     args = parser.parse_args()
 
     recipe = llm.hf_auto_model_for_causal_lm.finetune_recipe(
@@ -67,7 +67,7 @@ def local_executor_torchrun(nodes: int = 1, devices: int = 2) -> run.LocalExecutor:
         tokenizer=run.Config(AutoTokenizer, pretrained_model_name=args.model),
     )
 
-    recipe.trainer.strategy = run.Config(nl.FSDP2Strategy, data_parallel_size=1, tensor_parallel_size=2)
+    recipe.trainer.strategy = run.Config(nl.FSDP2Strategy, data_parallel_size=2, tensor_parallel_size=1)
     recipe.trainer.plugins = None
     executor = local_executor_torchrun(nodes=recipe.trainer.num_nodes, devices=recipe.trainer.devices)
     run.run(recipe, executor=executor)
