Commit
* Fixed small bug with NoisePerturbationWithNormalization (NVIDIA#7118)
* Fix import guard checks (NVIDIA#7124)
* Revert "Fix import guard checks (NVIDIA#7124)" (NVIDIA#7125); this reverts commit a46e325.
* Fix import guard checks (NVIDIA#7126) (a minimal sketch of this guard pattern follows this list)
  * Fix import guard checks
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Add updated fc ctc and rnnt xxl models (NVIDIA#7128) (NVIDIA#7130)
* [TTS] Create EnCodec training recipe (NVIDIA#6852)
  * [TTS] Create EnCodec training recipe
  * [TTS] Update encodec recipe
  * [TTS] Rename EnCodec to AudioCodec
  * [TTS] Add EnCodec unit tests
  * [TTS] Add copyright header to distributed.py
* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (NVIDIA#7061)
* fix default attention size (NVIDIA#7141) (NVIDIA#7143)
* fix evaluator.py for various exceptions by ast (NVIDIA#7150)
* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets (NVIDIA#6893)
  * [TTS] add Chinese TTS recipe based on IPA
  * add new pinyin and ipa dictionaries with 36 finals
  * add yaml configs for 24-final pinyin and ipa
  * add copyright header
  * add a directory level 24finals to discriminate from 36 finals
  * unify configs into a single one and add detailed comments providing supported candidates
  * choose 36-final IPA as the default phoneme dict
* [TTS] Add output audio format to preprocessing (NVIDIA#6889)
  * [TTS] Add output audio format to preprocessing
  * [TTS] Add format validation
  * [TTS] Fix data tutorial
* freeze (NVIDIA#7152)
* make sure any empty segments are removed (NVIDIA#7155)
* Update RIR generation scripts (NVIDIA#6547)
  - fix: reduce room size if evaluation of params fails
  - added randomized mic placement
  - added diffuse noise generation
  - added an option to specify the format and subtype for saved audio
* A quickstart speech enhancement tutorial (NVIDIA#6492): a simple example of training a model for the speech enhancement task
* NFA subtitle file config - specify colors and vertical alignment (NVIDIA#7160)
  * allow specifying colors of text in ASS subtitle file
  * specify vertical_alignment instead of marginv in ass_file_config
  * add documentation of CTMFileConfig and ASSFileConfig to NFA README
* Eagerly accumulate embedding grads into fp32 buffer (NVIDIA#6958) (NVIDIA#7153)
* TE bug fix (NVIDIA#7027) (NVIDIA#7036)
* [TTS] Remove nested TTS configs (NVIDIA#7154)
  * [TTS] Remove nested TTS configs
  * [TTS] Modify tutorial to support multiple sampling rates
  * [TTS] Clarify min_duration unit
  * [TTS] Default 22.05kHz highfreq to null
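The "Fix import guard checks" entries above adjust NeMo's guards around optional imports. As a point of reference only, here is a minimal sketch of that general pattern, assuming a hypothetical optional dependency; the module name, flag name, and function are placeholders, not the code touched in NVIDIA#7124/#7126.

```python
# Sketch of an optional-import guard: import once, remember availability,
# and fail with a clear message only when the feature is actually used.
try:
    import nlp_extras  # hypothetical optional dependency

    HAVE_NLP_EXTRAS = True
except (ImportError, ModuleNotFoundError):
    nlp_extras = None
    HAVE_NLP_EXTRAS = False


def build_tokenizer(name: str):
    """Illustrative helper that depends on the optional package."""
    if not HAVE_NLP_EXTRAS:
        raise ModuleNotFoundError(
            "nlp_extras is required for build_tokenizer; install the optional extras."
        )
    return nlp_extras.Tokenizer(name)
```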
* Merge release r1.20.0 to main (NVIDIA#7167)
  * update package info
* Add ASR with TTS Tutorial. Fix enhancer usage. (NVIDIA#6955)
  * Add ASR with TTS Tutorial
  * Fix enhancer usage
* install_bs (NVIDIA#7019)
* Fix typo and branch in tutorial (NVIDIA#7048)
* fix syntax error introduced in PR-7079 (NVIDIA#7102)
  * fix syntax error introduced in PR-7079
  * fixes for PR review
* fix links for TN (NVIDIA#7117)
* update branch (NVIDIA#7135)
* Fixed main and merging this to r1.20 (NVIDIA#7127)
  * Fixed main and merging this to r1.20
  * Update vad_utils.py
* update branch
* fix version
* resolve conflict the other way
* keep both
* revert keep both
* Upgrade to pytorch lightning 2.0 (NVIDIA#6433) (a minimal sketch of the epoch-end hook migration appears after this list)
  * Upgrade pytorch lightning version in requirements
  * Initial fixes for PTL 2.0
  * Add further fixes to support lightning 2.0
  * Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and a few occurrences of validation_epoch_end
  * Replace all occurrences of validation_epoch_end with on_validation_epoch_end
  * Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively
  * Change logger=None to logger=False in Trainer object
  * Remove PTL 2.0 deprecated Trainer args from TrainerConfig dataclass
  * Modify trainer.precision check and other small edits
  * Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer
  * Add default values for args to fix AttributeError
  * Add the following modifications: 1) remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class; 2) replace resume_from_checkpoint with ckpt_path as needed; 3) explicitly add accelerator as 'CPU' in UTs being run on CPU
  * Remove outputs arg from on_validation_epoch_end, on_test_epoch_end
  * Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings
  * Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel
  * Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py
  * Revert an extra space that was mistakenly added
  * Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity
  * Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity
  * Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing
  * Remove outputs arg from on_train_epoch_end
  * Remove outputs from on_validation_epoch_end in multi_binary_acc.py
  * Remove output args from on_validation_epoch_end in the docstrings of some ASR files
  * Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs
  * Add on_validation_epoch_end and remove outputs args for NLP models
  * Append output of validation_step to validation_step_outputs in EncDecClassificationModel
  * Add the following changes: 1) index self.validation_step_outputs and self.test_step_outputs with dataloader_idx wherever needed; 2) initialize self.validation_step_outputs and self.test_step_outputs as empty lists and add support for multiple dataloaders if they exist; 3) remove self.pre_configure_ddp from the NLPDDPStrategy class as it is removed in PTL 2.0
  * Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py
  * Typecast precision to str in attention.py and utils_funcs.py to avoid TypeError
  * Add if-condition check for multiple dataloaders when appending to validation outputs
  * Separate validation pass to be used with both validation_step and test_step
  * Add if-condition check for multiple dataloaders while appending to test_step_outputs in punctuation_capitalization_model.py
  * Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len
  * Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0
  * Modify precision checks to account for 16-mixed and bf16-mixed
  * Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel
  * Modify find_unused_parameters=True in g2p heteronym model: 1) add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py; 2) remove args output in validation/test_step and add instance variables instead for heteronym_classification.py
  * Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel
  * Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml
  * Add split arg self.test_step_outputs to TextClassificationModel
  * Add test_step_outputs to dialogue and text classification models
  * Change condition check for multiple dataloaders: 1) replace ds_item as list in dialogue_config.yaml; 2) check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step; 3) check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_capitalization_model.py
  * Add additional condition for multiple dataloaders: check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list in validation/test_step
  * Add val step outputs and default val for dataloader_idx: 1) append validation_step output to self.validation_step_outputs in MultiLabelIntentSlotClassificationModel; 2) add default val for dataloader_idx for on_test_batch_start/end in TimingCallback; 3) add self.validation/test_step_outputs in BERTQAModel and remove outputs arg
  * Add val/test_step_outputs to S2SQAModel and GPTQAModel
  * Edit Jenkinsfile for bert_pretraining.py to disable validation as a workaround for the trainer.val_dataloader None error
  * Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py
  * Add ddp_find_unused_parameters_true and remove output args: 1) add ddp_find_unused_parameters_true for trainer.strategy in self_alignment_pretraining.py as it has unused parameters; 2) remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py; 3) comment tests in Jenkinsfile that need to be fixed
  * Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed
  * Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py
  * Precision fix and validation/test_step_outputs: 1) add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py; 2) reset ckpt_path for test in enc_dec_nmt.py; 3) remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py; 4) comment Megatron Bert Pretraining and Resume Training with Pipeline Parallelism and add back NMT Training Post-LN
  * Precision fix and skip a few failing tests
  * Add missing comment lines in Jenkinsfile
  * Comment Jenkins tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py
  * Minor edit Jenkinsfile
  * Minor edit in Jenkins file
  * Edit in Jenkins file
  * Comment missed lines in Jenkins file
  * Fix precision and validation/test outputs: 1) add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py; 2) remove outputs args and append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py; 3) add back resume_from_checkpoint in megatron_t5_config.yaml; 4) comment out certain tests in Jenkins file
  * Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py
  * Precision fix and edit precision typo in all files: 1) account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py; 2) fix precision typo in all files
  * Fix all CI TTS tests and comment a few Jenkins tests
  * Combine xx_epoch_end and on_xx_epoch_end: add on_inference_epoch_end to the inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py
  * Add a missing comment in Jenkinsfile
  * Add try/except StopIteration in validation_step for models with dataloader_iter
  * Remove pyyaml from requirements
  * Add try/except for inference_step in megatron_finetune_model.py
  * Remove limit_val_batches for mockGPTDataset test
  * Add new self.validation_step_outputs for MegatronGPTSFTModel
  * Minor edit Jenkinsfile
  * Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when dataloaders are not set up in ModelPT, for example while restoring the model
  * Remove resume_from_checkpoint if trainer arg in conf yaml files
  * Remove resume_from_checkpoint as trainer arg in GPT, T5 configs
  * Remove resume_from_checkpoint in duplex_tn_config.yaml
  * Fix typos, unused imports and refactor code to remove redundant funcs
  * Remove commented code in megatron_nmt_model.py
  * Fix overridden functions to match parent class functions
  * Prefetch dataloader_iter to prevent hang for PP>1
  * Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1
  * Uncomment tests in Jenkinsfile
  * Add '16' to precision checks and other minor fixes
  * Clear validation/test_step_outputs with dataloader_idx for multiple dataloaders
  * Minor edits
  * Modify precision checks to avoid indexing
  * Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs
  * Reference checkpoint with trainer.ckpt_path
  * Add _prefetch to NLPModel and minor fixes
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
  * Add limit_val_batches in Jenkinsfile for NMT: 1) add trainer.limit_val_batches in Megatron NMT Training TP=2; 2) remove unused import in ModelPT
* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (NVIDIA#7112)
  * scripts for sft
  * fix style
  * added special token only for huggingface model
  * change default name
  * print out error datapoint content
  * show error id
  * annotation script working
  * try to be compatible with huggingface tokenizer
  * added examples
  * added lang
  * added lang
  * text to value special case
  * configure the slider
  * annotation handles lang
  * added the unit test for chat sft dataset
  * used the file in the test dir
  * fix json error
  * load local tokenizer
  * remove mask count check
  * added HF dataset backend
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* add paths to labeler. (NVIDIA#7087)
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
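Most of the repetitive work in the "Upgrade to pytorch lightning 2.0 (NVIDIA#6433)" entry follows one pattern: Lightning 2.0 removed the `validation_epoch_end(outputs)` / `test_epoch_end(outputs)` hooks, so each module now keeps its own buffer of per-step outputs and consumes it in `on_*_epoch_end`. A minimal sketch of that migration, using a toy module rather than any actual NeMo class:

```python
import torch
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(10, 1)
        # PTL 2.0 no longer passes `outputs` to the epoch-end hook,
        # so the module buffers its own step outputs.
        self.validation_step_outputs = []

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        self.validation_step_outputs.append(loss)
        return loss

    def on_validation_epoch_end(self):
        # Replaces validation_epoch_end(self, outputs) from PTL 1.x.
        avg_loss = torch.stack(self.validation_step_outputs).mean()
        self.log("val_loss_epoch", avg_loss)
        self.validation_step_outputs.clear()  # free memory between epochs


# Related 2.0 changes named in the entry above:
# - Trainer(logger=False, ...) instead of logger=None
# - trainer.fit(model, ckpt_path=...) instead of Trainer(resume_from_checkpoint=...)
# - trainer.precision may now be "16-mixed" / "bf16-mixed", so string checks need updating
```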
* T5 metrics fix (NVIDIA#7037)
* Fix race condition when executing with multi-node where some ranks do not wait for setup (NVIDIA#7016)
* Added bool types to neural_types export (NVIDIA#7032)
* rnnt and char utils (NVIDIA#6971)
  * rnnt_ngram_merge
  * char level bug
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* fix tab text gen (NVIDIA#7022) (NVIDIA#7031)
  * Fixed kwargs for metric instance init
  * removed kwargs
  * Updated config desc
* ASR Confidence update and tutorial (NVIDIA#6810)
  * small fixes and tests
  * various fixes for the tutorial
  * tutorial added
  * fix for a little oops after rebasement
  * fix tests
  * unused import removed
  * fix review comments
  * deprecated parameters for greedy configs
  * move re-assigning to configs
  * fix comments 2
  * fix config tests
  * fix ece test (my env was bugged apparently)
  * renamings for confidence ensemble
  * fix comments 3
  * return dropped tutorial
  * CI flips back and forth, increasing tolerance
* install_bs (NVIDIA#7019) (NVIDIA#7028)
* fixes for spellmapper (NVIDIA#6994) (NVIDIA#7000)
* added back the retro documents (NVIDIA#7033)
* Remove pyyaml (NVIDIA#7052) (NVIDIA#7054)
* st standalone model (NVIDIA#6969)
  * st standalone model
  * style fix
  * sacrebleu import fix, unused imports removed
  * import guard for nlp inside asr transformer bpe model
  * codeql fixes
  * comments answered
  * import ordering fix
  * yttm for asr removed
  * logging added
  * added inference and translate method
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* remove pos emb from state dict for old models (NVIDIA#7068) (see the sketch after this list)
  * remove pos emb from state dict
  * move to nlp_model
  * update comment
  * fix nmt test
  * fix nmt test
* Fix typo in ASR-TTS tutorial (NVIDIA#7049)
* Fixed tutorial's name (NVIDIA#7047)
* Fix documentation for Numba (NVIDIA#7065) (NVIDIA#7077)
  * Fix documentation for Numba
  * Update force float32 flag dynamically
  * Fix nemo version
* Update Frame-VAD doc and fix onnx export (NVIDIA#7076)
  * update fvad doc and example
  * fix typo
  * fix onnx export
  * update test
  * refactor
  * update doc
* memmap worker arg (NVIDIA#7062)
  * memmap worker arg
  * updates
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Fix caching bug in causal convolutions for cache-aware ASR models (NVIDIA#7034) (NVIDIA#7082)
* Fast Conformer global token fix (NVIDIA#7085)
  * old way
  * assorted fix and clean commits
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
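For the "remove pos emb from state dict for old models (NVIDIA#7068)" entry, a hedged sketch of the general technique of dropping a stale key before `load_state_dict`; the key name and helper function are illustrative, not the code actually added to nlp_model.py.

```python
import torch


def load_checkpoint_without_pos_emb(model: torch.nn.Module, ckpt_path: str):
    """Drop stale position-embedding weights from an old checkpoint before loading.

    The "position_embeddings" substring below is a placeholder for whatever key
    an older checkpoint carried that the current model no longer expects.
    """
    state_dict = torch.load(ckpt_path, map_location="cpu")
    state_dict = {
        k: v
        for k, v in state_dict.items()
        if "position_embeddings" not in k  # skip the stale buffer
    }
    # strict=False lets the model keep its freshly initialized position embeddings
    # and reports what was skipped instead of raising.
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    return missing, unexpected
```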
* Refined export_config (NVIDIA#7053) (NVIDIA#7066)
  * Refined export_config
  * Rolling back hierarchy change
* small Bugfix (NVIDIA#7081)
* small Bugfix (NVIDIA#7079)
  * fix branch
  * fix typo
  * fix link
* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb
* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb
* Added script to extract ASR CTC and RNNT models from ASR hybrid models (NVIDIA#7092)
  * Added script to extract ctc and rnnt models from hybrid models
  * Updated hybrid extraction script for review request 1
  * Updated hybrid convert script to remove --cuda flag
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Adding docs and models for multiple lookahead cache-aware ASR (NVIDIA#7067) (NVIDIA#7094)
* update TTS readme (NVIDIA#7088)
* Fix absolute path in path join call (NVIDIA#7099) (see the note after this list)
* Disable distopt contiguous param buffer by default (NVIDIA#7095)
* microphone demo (NVIDIA#7110)
* [Fix] load_state_dict in nlp_model.py (NVIDIA#7086)
  * Fix load_state_dict in nlp_model.py
  * [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Fix plot function in vad_utils.py (NVIDIA#7113)
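The "Fix absolute path in path join call (NVIDIA#7099)" entry refers to the standard `os.path.join` gotcha that such fixes usually address; a short reminder with made-up paths:

```python
import os

# os.path.join discards everything before a component that is already absolute,
# which is the usual cause of "absolute path in path join" bugs.
base = "/data/experiments"
name = "/run1"  # accidentally absolute

print(os.path.join(base, name))                  # -> "/run1" (base silently dropped)
print(os.path.join(base, name.lstrip(os.sep)))   # -> "/data/experiments/run1"
```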
Co-authored-by: Adi Renduchintala <adithyar… Signed-off-by: Daniel Egert <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Ryan <[email protected]> Signed-off-by: Kim Ngo <[email protected]> Signed-off-by: He Huang (Steve) <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Tim Moon <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: ericharper <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Alexandra Antonova <[email protected]> Signed-off-by: Evelina <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: Yi Dong <[email protected]> Signed-off-by: jubick1337 <[email protected]> Signed-off-by: tbartley94 <[email protected]> Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: AlexGrinch <[email protected]> Signed-off-by: Vitaly Lavrukhin <[email protected]> Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: sam1373 <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: fayejf <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Jan Beckmann <[email protected]> Signed-off-by: Linnea Pari Leaver <[email protected]> Signed-off-by: Xin Yao <[email protected]> Signed-off-by: fayejf <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: hsiehjackson <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Alexandra Antonova <[email protected]> Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Vahid <[email protected]> Signed-off-by: David Mosallanezhad <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: ekmb <[email protected]> Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> Signed-off-by: Dima Rekesh <[email protected]> Signed-off-by: Jim O’Regan <[email protected]> Signed-off-by: Mostafa Ghorbandoost <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Kunal Dhawan <[email protected]> Signed-off-by: andrusenkoau <[email protected]> Signed-off-by: Andrei Andrusenko <[email protected]> Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: Greg Clark <[email protected]> Signed-off-by: Eric Harper <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Guyue Huang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Igor Gitman <[email protected]> Signed-off-by: Siddharth Tyagi <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Signed-off-by:
Jason Wang <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Alireza Morsali <[email protected]> Signed-off-by: Siddharth Tyagi <[email protected]> Signed-off-by: dorotat <[email protected]> Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Xin Yao <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Signed-off-by: Alexander Jipa <[email protected]> Signed-off-by: omahs <[email protected]> Signed-off-by: lhb8125 <[email protected]> Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Jason <[email protected]> Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Tamerlan Tabolov <[email protected]> Signed-off-by: zhehuaichen <[email protected]> Co-authored-by: trias702 <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Ryan Langman <[email protected]> Co-authored-by: Kim Ngo <[email protected]> Co-authored-by: David <[email protected]> Co-authored-by: He Huang (Steve) <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Co-authored-by: anteju <[email protected]> Co-authored-by: Tim Moon <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Nikolay Karpov <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Matvei Novikov <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Aleksandr Laptev <[email protected]> Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]> Co-authored-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Vahid Noroozi <[email protected]> Co-authored-by: Samuel Kriman <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Jan Beckmann <[email protected]> Co-authored-by: lleaver <[email protected]> Co-authored-by: Linnea Pari Leaver <[email protected]> Co-authored-by: Xin Yao <[email protected]> Co-authored-by: anmolgupt <[email protected]> Co-authored-by: ANMOL GUPTA <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Micha Livne <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Alexandra Antonova <[email protected]> Co-authored-by: Virginia Adams <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Ante Jukić <[email protected]> Co-authored-by: David Mosallanezhad 
<[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Sean Naren <[email protected]> Co-authored-by: Yang Zhang <[email protected]> Co-authored-by: Sean Naren <[email protected]> Co-authored-by: Neha Tadimeti <[email protected]> Co-authored-by: Abhinav Khattar <[email protected]> Co-authored-by: Dima Rekesh <[email protected]> Co-authored-by: Jim O’Regan <[email protected]> Co-authored-by: Mostafa Ghorbandoost <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Andrei Andrusenko <[email protected]> Co-authored-by: Greg Clark <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Olivier Delalleau <[email protected]> Co-authored-by: Jason Wang <[email protected]> Co-authored-by: Maanu Grover <[email protected]> Co-authored-by: guyueh1 <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Igor Gitman <[email protected]> Co-authored-by: styagi130 <[email protected]> Co-authored-by: Siddharth Tyagi <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Alireza Morsali <[email protected]> Co-authored-by: styagi130 <[email protected]> Co-authored-by: dorotat-nv <[email protected]> Co-authored-by: Maxime Burchi <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: eharper <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Co-authored-by: Kelvin Liu <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Alexander Jipa <[email protected]> Co-authored-by: Alexander Jipa <[email protected]> Co-authored-by: omahs <[email protected]> Co-authored-by: Robin Dong <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: PeganovAnton <[email protected]> Co-authored-by: Samuele Cornell <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: Igor Gitman <[email protected]> Co-authored-by: Jan Lasek <[email protected]> Co-authored-by: Tamerlan Tabolov <[email protected]>