Implementation of IOBinding in Mixtral MoE Parity Script #21153
base: main
Conversation
Deleted the MoE ONNX model once it is done being used.
import unittest
from collections import OrderedDict

import numpy
import onnx

Check notice: Code scanning / CodeQL
Module is imported with 'import' and 'import from'
Module 'onnxruntime.test.onnx' is imported with both 'import' and 'import from'.
LGTM
@@ -38,6 +42,18 @@ def print_tensor(name, numpy_array):
    print(f"const std::vector<float> {name} = {value_string_of(numpy_array)};")


def save_model_to_disk(model, model_path):
    external_data_path = "mixtral_moe.onnx" + ".data"
nit: external_data_path = model_path + ".data"
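A minimal sketch of what that suggested change could look like, assuming onnx.save_model is used to externalize the weights next to the model file (the size_threshold value is an illustrative default):

import os

import onnx


def save_model_to_disk(model, model_path):
    # Derive the external-data file name from the model path instead of hardcoding it.
    external_data_path = model_path + ".data"
    onnx.save_model(
        model,
        model_path,
        save_as_external_data=True,
        all_tensors_to_one_file=True,
        # 'location' is interpreted relative to the model file, so pass only the basename.
        location=os.path.basename(external_data_path),
        size_threshold=1024,
    )
    return external_data_path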
w1_chunked = [w1.squeeze(dim=0) for w1 in w1_chunked]
v1_chunked = [v1.squeeze(dim=0) for v1 in v1_chunked]
w2_chunked = [w2.squeeze(dim=0) for w2 in w2_chunked]
for expert_idx in range(0, self.moe_num_experts):

Check warning: Code scanning / lintrunner
RUFF/PIE808
See https://docs.astral.sh/ruff/rules/unnecessary-range-start
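The warning only asks for the redundant start argument to be dropped, since range() starts at 0 by default:

for expert_idx in range(self.moe_num_experts):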
@@ -0,0 +1,461 @@
# --------------------------------------------------------------------------

Check warning: Code scanning / lintrunner
RUFF/format

@@ -0,0 +1,461 @@
# --------------------------------------------------------------------------

Check warning: Code scanning / lintrunner
BLACK-ISORT/format
import numpy
import torch
import torch.nn.functional as F

Check warning: Code scanning / lintrunner
RUFF/F401
See https://docs.astral.sh/ruff/rules/unused-import
onnx_model_local = create_moe_onnx_graph(
    num_rows,
    num_experts,
    num_experts,
    hidden_size,
    inter_size // get_size(),
    fc1_experts_weights,
    fc2_experts_weights,
    fc3_experts_weights,
    tensor_shards=get_size(),
)

Check failure: Code scanning / CodeQL
Wrong name for an argument in a call
function create_moe_onnx_graph
import numpy
import torch
import torch.nn.functional as F

Check notice: Code scanning / CodeQL
Unused import
from typing import Tuple

import onnxruntime

Check notice: Code scanning / CodeQL
Module is imported with 'import' and 'import from'
import onnxruntime
import onnx

Check notice: Code scanning / CodeQL
Module is imported with 'import' and 'import from'
Module 'onnxruntime.test.onnx' is imported with both 'import' and 'import from'.
self.ort_sess = self.create_ort_session()

def test_moe_with_tensor_parallelism(
The ORT MoE op's tensor parallelism is already tested, so we do not need to test it again here. Let's just keep this script for testing on a single GPU.
self.moe_num_experts = config.num_local_experts
ffn_act_fn = {"name": config.hidden_act}

self.w1 = nn.Parameter(torch.empty(moe_num_experts, moe_num_experts * ffn_hidden_size, hidden_size))
The Hugging Face implementation (https://github.com/huggingface/transformers/blob/c54af4c77ed5d72ddcb79d0cc4804d97f21deabc/src/transformers/models/dbrx/modeling_dbrx.py#L738) defines these as:
self.w1 = nn.Parameter(torch.empty(moe_num_experts * ffn_hidden_size, hidden_size))
self.v1 = nn.Parameter(torch.empty(moe_num_experts * ffn_hidden_size, hidden_size))
self.w2 = nn.Parameter(torch.empty(moe_num_experts * ffn_hidden_size, hidden_size))
Let's not change that implementation.
w1_list = []
v1_list = []
w2_list = []
for i in range(self.moe_num_experts):
    w1_list.append(self.mlp.w1[i])
    v1_list.append(self.mlp.v1[i])
    w2_list.append(self.mlp.w2[i])
self.moe_experts_weight1 = torch.stack(w1_list, dim=0)
self.moe_experts_weight2 = torch.stack(v1_list, dim=0)
self.moe_experts_weight3 = torch.stack(w2_list, dim=0)
these are not needed
self.moe_num_experts,
self.hidden_size,
self.ffn_hidden_size,
self.moe_experts_weight1,
Pass self.mlp.w1/w2/v1 directly, since they are defined with shape [num_experts, ...]. This is the part that differs from Mixtral; you probably need to transpose one of them to align with the ORT format.
fc1_shape = [num_experts, num_experts * inter_size, hidden_size]
fc2_shape = [num_experts, num_experts * inter_size, hidden_size]
fc3_shape = [num_experts, num_experts * inter_size, hidden_size]
Let's keep these the same as Mixtral's.
class DbrxRouter(nn.Module):
Move this class to just after DBRXConfig.
batch_size,
sequence_length,
config)
dbrx_moe.test_moe_with_tensor_parallelism(hidden_size,
only test single GPU here
return out


def ort_forward(self, hidden_states: torch.Tensor, iobinding=False) -> torch.Tensor:
let's implement ort_forward() in class DbrxFFN since ORT MoE contains topk&softmax (part of DbrxRouter)
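For reference, a rough sketch of what an IOBinding-based ort_forward() on DbrxFFN could look like. The input/output names ("input", "output") and the CUDA device placement are assumptions for illustration, not taken from this diff:

import numpy
import torch
from onnxruntime import OrtValue


def ort_forward(self, hidden_states: torch.Tensor, iobinding: bool = False) -> torch.Tensor:
    # The ORT MoE op already contains top-k and softmax, so only the raw hidden
    # states are fed to the graph here (input name is assumed to be "input").
    ort_inputs = {"input": numpy.ascontiguousarray(hidden_states.detach().cpu().numpy())}

    if not iobinding:
        return torch.from_numpy(self.ort_sess.run(None, ort_inputs)[0])

    # IOBinding path: stage the input on the GPU and let ORT allocate the output
    # on the same device, so the measured latency reflects the kernel only.
    io_binding = self.ort_sess.io_binding()
    for name, arr in ort_inputs.items():
        io_binding.bind_ortvalue_input(name, OrtValue.ortvalue_from_numpy(arr, "cuda", 0))
    io_binding.bind_output("output", "cuda", 0)

    io_binding.synchronize_inputs()
    self.ort_sess.run_with_iobinding(io_binding)
    io_binding.synchronize_outputs()

    return torch.from_numpy(io_binding.copy_outputs_to_cpu()[0])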
from collections import OrderedDict

import numpy
import os

Check warning: Code scanning / lintrunner
RUFF/F401
See https://docs.astral.sh/ruff/rules/unused-import
def parity_check(self):
    config = DBRXConfig()
    ffn = DbrxFFN(config, self.batch_size, self.sequence_length)
    router = DbrxRouter(hidden_size=config.hidden_size,

Check warning: Code scanning / lintrunner
RUFF/F841
See https://docs.astral.sh/ruff/rules/unused-variable
hidden_state = torch.randn(self.batch_size, self.sequence_length, self.hidden_size)
torch_output = ffn.forward(hidden_state)
print("forward: ", torch_output)
ort_output = ffn.ort_forward(hidden_state, iobinding=False)

Check warning: Code scanning / lintrunner
RUFF/F841
See https://docs.astral.sh/ruff/rules/unused-variable
from collections import OrderedDict

import numpy
import os

Check notice: Code scanning / CodeQL
Unused import
#def delete_model_data(external_data):
#os.remove("dbrx_moe.onnx")
#os.remove(external_data)

Check notice: Code scanning / CodeQL
Commented-out code
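If the intent (per the PR description) is to delete the MoE ONNX model once it is done being used, a small helper along these lines could replace the commented-out block; the "dbrx_moe.onnx" name is taken from the snippet above:

import os


def delete_model_data(external_data):
    # Remove the serialized MoE model and its external weight file once the
    # parity check is finished with them.
    for path in ("dbrx_moe.onnx", external_data):
        if os.path.exists(path):
            os.remove(path)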
def parity_check(self):
    config = DBRXConfig()
    ffn = DbrxFFN(config, self.batch_size, self.sequence_length)
    router = DbrxRouter(hidden_size=config.hidden_size,

Check notice: Code scanning / CodeQL
Unused local variable
hidden_state = torch.randn(self.batch_size, self.sequence_length, self.hidden_size)
torch_output = ffn.forward(hidden_state)
print("forward: ", torch_output)
ort_output = ffn.ort_forward(hidden_state, iobinding=False)

Check notice: Code scanning / CodeQL
Unused local variable
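Both CodeQL and lintrunner flag ort_output (and router) as unused, which suggests the actual comparison is missing. A minimal sketch of how parity_check could finish, assuming ort_forward returns a torch.Tensor and with placeholder tolerances:

def parity_check(self):
    config = DBRXConfig()
    ffn = DbrxFFN(config, self.batch_size, self.sequence_length)

    hidden_state = torch.randn(self.batch_size, self.sequence_length, self.hidden_size)
    torch_output = ffn.forward(hidden_state)
    ort_output = ffn.ort_forward(hidden_state, iobinding=False)

    # Actually compare the two paths so neither result is left unused;
    # rtol/atol are illustrative and should be tuned for the dtype in use.
    numpy.testing.assert_allclose(
        torch_output.detach().numpy(),
        ort_output.detach().numpy(),
        rtol=1e-3,
        atol=1e-3,
    )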
self.mlp.w1,
self.mlp.v1,
self.mlp.w2,
The order should be w1, w2, v1, with certain transpose operations.
["output"], | ||
"MoE_0", | ||
k=topk, | ||
normalize_routing_weights=1, |
I think this should be 0
fc1_experts_weights = fc1_experts_weights.view(16, 6144, 10752)
fc2_experts_weights = fc2_experts_weights.view(16, 6144, 10752).transpose(1, 2)
fc3_experts_weights = fc3_experts_weights.view(16, 6144, 10752)
It's recommended to do the view() and transpose() outside of this function.
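A sketch of what that could look like at the call site, mirroring the view()/transpose() pattern above and the w1, w2, v1 ordering suggested earlier; whether these reshapes give exactly the layout the ORT MoE kernel expects still needs to be checked against the Mixtral script:

# Reshape the flat DBRX parameters into [num_experts, ...] and apply the
# transpose before handing them to the graph builder, so create_moe_onnx_graph
# only serializes the tensors it receives.
fc1_experts_weights = self.mlp.w1.view(16, 6144, 10752)
fc2_experts_weights = self.mlp.w2.view(16, 6144, 10752).transpose(1, 2)
fc3_experts_weights = self.mlp.v1.view(16, 6144, 10752)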
Motivation and Context
These changes use IOBinding in the Mixtral MoE parity script so that kernel latencies can be measured more faithfully. Benchmarking of the Mixtral model is now available through this parity script.