add renormalize_blend_weights param (#11647)
Signed-off-by: dimapihtar <[email protected]>
dimapihtar authored Dec 18, 2024
1 parent b68274b commit 1121289
Showing 2 changed files with 2 additions and 0 deletions.
@@ -293,6 +293,7 @@ model:
shuffle_documents: True # Set to False to disable documents shuffling. Sample index will still be shuffled
exchange_indices_distributed: False # Set to True to exchange indices via torch.distributed instead of filesystem
data_cache_generation_only: False # Set to True to generate only the data cache and stop the training script
renormalize_blend_weights: False # Renormalize the blend weights to account for mid-level dataset oversampling done to ensure fulfillment of the requested number of samples.

# Nsys profiling options
nsys_profile:
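
The comment on `renormalize_blend_weights` describes the intent but not the arithmetic. The following is a minimal sketch of one plausible reading, assuming that each mid-level dataset may be built larger than its requested share and that the renormalized weights are simply each dataset's built size divided by the total. The function name `renormalize_weights` and the sizes used are hypothetical and do not come from this commit; the actual dataset builder may compute this differently.

```python
from typing import List


def renormalize_weights(built_sizes: List[int]) -> List[float]:
    """Return blend weights proportional to the sizes actually built.

    Assumption: this mirrors the idea in the config comment, not the
    library's real implementation.
    """
    total = sum(built_sizes)
    if total <= 0:
        raise ValueError("at least one dataset must contain samples")
    return [size / total for size in built_sizes]


# Hypothetical example: weights of 0.7 / 0.3 were requested, but
# oversampling produced mid-level datasets of 1400 and 900 samples.
requested_weights = [0.7, 0.3]
built_sizes = [1400, 900]
weights = renormalize_weights(built_sizes)
print(requested_weights, "->", weights)
```

Under this reading, the renormalized weights still sum to 1 but follow the built sizes rather than the originally requested proportions.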
@@ -1660,6 +1660,7 @@ def build_train_valid_test_datasets(self):
"mmap_bin_files": self.cfg.data.get("mmap_bin_files", True),
"drop_last_partial_validation_sequence": self.cfg.data.get("validation_drop_last", True),
"num_dataset_builder_threads": self.cfg.data.get("num_dataset_builder_threads", 1),
"renormalize_blend_weights": self.cfg.data.get("renormalize_blend_weights", False),
"add_extra_token_to_sequence": add_extra_token,
}
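
The added line reads the new key with `.get(..., False)`, so configs written before this commit keep the previous behavior. A minimal sketch of that lookup pattern follows, using a plain `dict` as a stand-in for the config object; the helper name `build_dataset_kwargs` is hypothetical and only two of the keys from the hunk are reproduced.

```python
def build_dataset_kwargs(data_cfg: dict) -> dict:
    """Collect dataset options, falling back to defaults for missing keys."""
    return {
        "num_dataset_builder_threads": data_cfg.get("num_dataset_builder_threads", 1),
        # A missing key means renormalization stays off.
        "renormalize_blend_weights": data_cfg.get("renormalize_blend_weights", False),
    }


# An older config without the key, and a newer one that opts in.
legacy_kwargs = build_dataset_kwargs({})
opted_in_kwargs = build_dataset_kwargs({"renormalize_blend_weights": True})
print(legacy_kwargs["renormalize_blend_weights"], opted_in_kwargs["renormalize_blend_weights"])
```

Defaulting to `False` in both the YAML file and the lookup keeps the change opt-in.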
