onboard MoE #18279

Merged: 28 commits into main from wangye/moe_single on Nov 15, 2023

wangyems (Contributor) commented on Nov 3, 2023

Description

  1. Introduce a Mixture-of-Experts (MoE) CUDA op to ONNX Runtime (ORT), based on the FasterTransformer (FT) implementation.
  2. Upgrade cutlass to 3.1.0 to avoid some build failures on Windows, and remove the patch file that was needed for cutlass 3.0.0.
  3. A sharded MoE implementation will follow in a separate PR.

Limitation: requires `__CUDA_ARCH__ >= 700`.
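
To make the architecture limitation concrete: `__CUDA_ARCH__` encodes a device's compute capability as `major * 100 + minor * 10`, so 700 corresponds to compute capability 7.0 (Volta). The following is a minimal sketch of that check, not code from this PR; the helper names `cuda_arch_value` and `supports_moe_op` are hypothetical and exist only for illustration.

```python
# Hypothetical helpers (not part of ONNX Runtime) illustrating the
# `__CUDA_ARCH__ >= 700` limitation stated in the PR description.

MIN_MOE_ARCH = 700  # threshold taken from the PR description


def cuda_arch_value(major: int, minor: int) -> int:
    """Return the __CUDA_ARCH__-style value for a compute capability.

    For example, compute capability 7.5 maps to 750.
    """
    return major * 100 + minor * 10


def supports_moe_op(major: int, minor: int) -> bool:
    """True if a device with this compute capability meets the stated limit."""
    return cuda_arch_value(major, minor) >= MIN_MOE_ARCH
```

Under this sketch, a compute capability 6.1 device would fall below the threshold, while 7.0 and newer devices would meet it.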

Motivation and Context

github-advanced-security (bot) left a comment:

lintrunner found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

wangyems marked this pull request as ready for review November 7, 2023 23:14
wangyems requested review from a team as code owners November 7, 2023 23:14

tianleiwu (Contributor) commented:

Is it possible to update to cutlass 3.2.2?

Review thread on docs/ContribOperators.md (outdated, resolved)
wangyems requested review from tianleiwu and snnn November 13, 2023 17:43
tianleiwu previously approved these changes Nov 14, 2023
snnn previously approved these changes Nov 14, 2023
wangyems dismissed stale reviews from snnn and tianleiwu via 3c27fb9 November 14, 2023 21:45
wangyems requested a review from snnn November 14, 2023 23:12

pranavsharma (Contributor) left a comment:

Approving for admin

wangyems merged commit f9af940 into main Nov 15, 2023
89 of 93 checks passed
wangyems deleted the wangye/moe_single branch November 15, 2023 00:48
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024
4 participants