Commit 1c3d475: moe token drop params

Malay Nagda committed Jul 9, 2024
1 parent c631a8a

Showing 4 changed files with 5 additions and 5 deletions.
auto_configurator/base_configs/mixtral_3b.yaml (0 additions, 3 deletions)

@@ -47,9 +47,6 @@ exp_manager:
 model:
   mcore_gpt: true
   moe_grouped_gemm: true
-  moe_token_dispatcher_type: alltoall
-  moe_pad_expert_input_to_capacity: True
-  moe_expert_capacity_factor: 1.0
   micro_batch_size: 1
   global_batch_size: 128
   rampup_batch_size: null
auto_configurator/base_configs/mixtral_7b.yaml (0 additions, 2 deletions)

@@ -48,8 +48,6 @@ model:
   mcore_gpt: true
   moe_grouped_gemm: true
   moe_token_dispatcher_type: alltoall
-  moe_pad_expert_input_to_capacity: True
-  moe_expert_capacity_factor: 1.0
   moe_aux_loss_coeff: 0.01
   micro_batch_size: 1
   global_batch_size: 256
launcher_scripts/conf/training/mixtral/mixtral_8x3b.yaml (3 additions, 0 deletions)

@@ -47,6 +47,9 @@ exp_manager:
 model:
   mcore_gpt: true
   moe_grouped_gemm: true
+  moe_token_dispatcher_type: alltoall
+  moe_pad_expert_input_to_capacity: True
+  moe_expert_capacity_factor: 1.0
   micro_batch_size: 1
   global_batch_size: 128
   rampup_batch_size: null
launcher_scripts/conf/training/mixtral/mixtral_8x7b.yaml (2 additions, 0 deletions)

@@ -48,6 +48,8 @@ model:
   mcore_gpt: true
   moe_grouped_gemm: true
   moe_token_dispatcher_type: alltoall
+  moe_pad_expert_input_to_capacity: True
+  moe_expert_capacity_factor: 1.0
   moe_aux_loss_coeff: 0.01
   micro_batch_size: 1
   global_batch_size: 256
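The parameters moved here control capacity-based token dropping in the MoE layers: `moe_expert_capacity_factor` bounds how many tokens each expert may receive, and `moe_pad_expert_input_to_capacity` pads each expert's input up to that bound so every expert sees a fixed-size batch. As a rough illustration of the mechanism (a minimal sketch, not the Megatron-Core implementation; all function and variable names here are illustrative):

```python
import math

def expert_capacity(num_tokens, num_experts, capacity_factor):
    # With capacity_factor 1.0 each expert gets an even share of the tokens;
    # larger factors leave headroom before any token is dropped.
    return math.ceil(capacity_factor * num_tokens / num_experts)

def dispatch_with_capacity(assignments, num_experts, capacity_factor, pad=True):
    """Group token ids by their assigned expert, dropping tokens that
    overflow an expert's capacity and (optionally) padding each expert's
    bucket to exactly `capacity` entries, using -1 as a pad id."""
    capacity = expert_capacity(len(assignments), num_experts, capacity_factor)
    buckets = [[] for _ in range(num_experts)]
    for token_id, expert in enumerate(assignments):
        if len(buckets[expert]) < capacity:
            buckets[expert].append(token_id)  # token kept
        # else: token dropped, it exceeded this expert's capacity
    if pad:
        for bucket in buckets:
            # Pad so every expert processes a tensor of identical shape.
            bucket.extend([-1] * (capacity - len(bucket)))
    return buckets, capacity
```

For example, 8 tokens routed across 2 experts with a capacity factor of 1.0 gives a capacity of 4 per expert; if 5 tokens prefer expert 0, the fifth is dropped, and expert 1's bucket is padded from 3 entries up to 4.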
