Commit

update configs for mixtral
Signed-off-by: Yu Chin Fabian Lim <[email protected]>
fabianlim committed Nov 11, 2024
1 parent 4db0982 commit 354513a
Showing 7 changed files with 54 additions and 33 deletions.
2 changes: 2 additions & 0 deletions plugins/accelerated-moe/README.md
@@ -44,10 +44,12 @@ Notes on code extraction:

Run the below in the top-level directory of this repo:
- the `scattermoe` dep is not included by default, so the `-x` switch installs it.
- consider disabling the `torch` memory logging to see improved speeds.

```
tox -e run-benches \
-x testenv:run-benches.deps+="-r plugins/accelerated-moe/requirements-khd.txt" \
    -x testenv:run-benches.setenv+="MEMORY_LOGGING=nvidia" \
-- \
"1 2 4" 128 benchmark_outputs scenarios-moe.yaml accelerated-moe-scatter
```
6 changes: 3 additions & 3 deletions sample-configurations/CONTENTS.yaml
@@ -113,8 +113,8 @@ framework_configs:
- accelerated-moe
filename: moe-scattermoe-granite-ep8-sample-configuration.yaml

- - shortname: moe-scattermoe-granite-ep8-padding-free
+ - shortname: moe-scattermoe-granite-ep8-foak
plugins:
- accelerated-moe
- - attention-and-distributed-packing
- filename: moe-scattermoe-granite-ep8-padding-free-sample-configuration.yaml
+ - fused-ops-and-kernels
+ filename: moe-scattermoe-granite-ep8-foak-sample-configuration.yaml
@@ -0,0 +1,43 @@
# FMS Acceleration Plugin Configuration.
#
# Each stanza incorporates various configurations for
# different fine-tuning / training tasks.
plugins:
  training:

    fused_ops_and_kernels:

      # if placed under the training stanza, base_layer and
      # fused_lora would be misnomers - these belong under
      # peft.quantized. However, if they are specified here
      # they will still be read. This is useful when the yaml
      # is system-generated and not shown to a user.

      # activate various unsloth optimizations
      # there are two versions of the plugin
      # - the FastKernel version supports individual kernels
      # - the FastQuantized version is all-or-nothing

      # fast loss triton kernels
      fast_loss: true

      # fast rms norm triton kernels
      fast_rms_layernorm: true

      # fast RoPE embedding triton kernels
      fast_rope_embeddings: true

    moe:

      # expert-parallel for MoE
      scattermoe:

        # The level of expert parallel sharding.
        # - 1 means no sharding
        # - if > 1, please ensure that this divides the world_size, because
        #   the devices will be replicated for every ep_degree devices, and
        #   the experts will be sharded within each group.
        # - if > 1, also ensure that it divides the number of experts, as each
        #   device will then have num_of_experts / ep_degree experts.
        ep_degree: 8

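A minimal sketch of the divisibility constraints described in the `ep_degree` comments of the configuration above; the concrete numbers (8 experts, as in Mixtral-8x7B, on an 8-GPU world size) are illustrative assumptions, not values taken from this commit.

```
def check_ep_degree(ep_degree: int, world_size: int, num_experts: int) -> int:
    """Validate the sharding constraints stated in the scattermoe comments."""
    if ep_degree > 1:
        # devices are grouped in sets of ep_degree, so the groups
        # must tile the world evenly
        assert world_size % ep_degree == 0, "ep_degree must divide world_size"
        # experts are sharded within each group, so each device
        # holds num_experts / ep_degree of them
        assert num_experts % ep_degree == 0, "ep_degree must divide num_experts"
    return num_experts // ep_degree

# e.g. an 8-expert MoE (such as Mixtral-8x7B) on 8 GPUs with ep_degree=8
# leaves exactly one expert per device
print(check_ep_degree(ep_degree=8, world_size=8, num_experts=8))  # -> 1
```
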
This file was deleted.

1 change: 1 addition & 0 deletions scripts/benchmarks/scenarios-moe.yaml
@@ -60,6 +60,7 @@ scenarios:
framework_config:
- # without acceleration
- moe-scattermoe-granite-ep8
- moe-scattermoe-granite-ep8-foak
slow: True
arguments:
learning_rate: 5e-5
2 changes: 1 addition & 1 deletion scripts/generate_sample_configurations.py
@@ -217,7 +217,7 @@ def read_configuration(path: str) -> Dict:
("moe-scattermoe-granite-ep4", (KEY_SCATTERMOE_EP4,)),
("moe-scattermoe-granite-ep4-padding-free", (KEY_AADP_PADDING_FREE, KEY_SCATTERMOE_EP4,)),
("moe-scattermoe-granite-ep8", (KEY_SCATTERMOE_EP8,)),
- ("moe-scattermoe-granite-ep8-padding-free", (KEY_AADP_PADDING_FREE, KEY_SCATTERMOE_EP8,)),
+ ("moe-scattermoe-granite-ep8-foak", (KEY_FAST_KERNELS, KEY_SCATTERMOE_EP8,)),
]
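
For context, each entry in the list above pairs a sample-configuration name with the plugin config fragments that are combined to produce it. Below is a hypothetical sketch of that composition (assuming PyYAML is installed); the inline fragments mirror the YAML added in this commit, but the fragment registry and merge helper are illustrative and not the script's actual implementation.

```
import yaml

# Illustrative fragments; the real plugin configuration files live in the
# plugins' own directories and are read by the generation script.
FRAGMENTS = {
    "KEY_FAST_KERNELS": """
plugins:
  training:
    fused_ops_and_kernels:
      fast_loss: true
      fast_rms_layernorm: true
      fast_rope_embeddings: true
""",
    "KEY_SCATTERMOE_EP8": """
plugins:
  training:
    moe:
      scattermoe:
        ep_degree: 8
""",
}


def deep_merge(base: dict, extra: dict) -> dict:
    """Recursively merge nested plugin stanzas (illustrative helper)."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# "moe-scattermoe-granite-ep8-foak" combines the FOAK and scattermoe-ep8 fragments
combined = {}
for key in ("KEY_FAST_KERNELS", "KEY_SCATTERMOE_EP8"):
    combined = deep_merge(combined, yaml.safe_load(FRAGMENTS[key]))
print(yaml.safe_dump(combined, sort_keys=False))
```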


5 changes: 4 additions & 1 deletion tox.ini
@@ -29,7 +29,7 @@ commands =
# need a version of fms-hf-tuning that has integrated the framework
# NOTE: have to install this first because it hasn't been merged yet
# - this repo has a lot of pins, so we just install it first
- pip install "fms-hf-tuning[flash-attn] @ git+https://github.com/foundation-model-stack/fms-hf-tuning.git@"{env:FHT_BRANCH:main}
+ pip install "fms-hf-tuning @ git+https://github.com/foundation-model-stack/fms-hf-tuning.git@"{env:FHT_BRANCH:main}

# some models need this for tokenizers
pip install protobuf
@@ -41,6 +41,9 @@ commands =
python -m fms_acceleration.cli install -e {toxinidir}/plugins/attention-and-distributed-packing
python -m fms_acceleration.cli install -e {toxinidir}/plugins/accelerated-moe

# install flash-attn last (its build expects torch to already be installed)
pip install flash-attn

# run the benchmark script
bash scripts/run_benchmarks.sh {posargs:"1 2" "4 8" benchmark_outputs}

