Velocity SIMD CPU Runtime (Runtime + Scalar x2) #1055

Merged
merged 38 commits into from
Sep 27, 2023

@m4rs-mt m4rs-mt commented Aug 24, 2023

This PR adds the announced SIMD-based CPU runtime, fully implemented in managed code. The new Velocity accelerator supports most GPU kernels (except those using dynamic shared memory) and can use SIMD hardware acceleration on modern CPUs, allowing you to run your ILGPU kernels efficiently on the CPU by leveraging the implemented automatic vectorization engine.

It will support the following hardware configurations once all Velocity PRs have been merged:

  • Scalar version simulating warps of length 2 (added in this PR)
  • 128bit-based X64 SSE and ARM64 Neon instructions (also supports M1 Macs - Mac M Series Support #769, in progress)
  • 256bit-based X64 AVX instructions (in progress)
  • 512bit-based X64 AVX-512 instructions (limited feature set; some functions will fall back to 256bit registers)
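
To illustrate the first item in the list above, here is a minimal sketch of what "simulating warps of length 2" in scalar code can look like. It is not ILGPU's implementation and uses no ILGPU API: the names `WARP_SIZE`, `warp_select`, `abs_kernel`, and `launch` are hypothetical and exist only for this example. The idea shown is that each operation is applied to every lane of a small fixed-width vector, and a per-lane mask replaces the branch so that both lanes follow the same instruction stream.

```python
WARP_SIZE = 2  # assumption: mirrors the "warps of length 2" configuration


def warp_select(mask, if_true, if_false):
    """Per-lane select: picks if_true[i] where mask[i] is set, else if_false[i]."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def abs_kernel(lanes):
    """Hypothetical kernel body: absolute value without per-lane branching.

    Both sides of the "branch" are computed for all lanes, and the mask
    decides which result each lane keeps.
    """
    mask = [v < 0 for v in lanes]
    negated = [-v for v in lanes]
    return warp_select(mask, negated, lanes)


def launch(kernel, data):
    """Runs the kernel over data in groups of WARP_SIZE lanes."""
    out = []
    for start in range(0, len(data), WARP_SIZE):
        lanes = data[start:start + WARP_SIZE]
        valid = len(lanes)
        # Pad the last warp so that every warp has exactly WARP_SIZE lanes;
        # results of the padded (inactive) lanes are discarded below.
        lanes = lanes + [0] * (WARP_SIZE - valid)
        out.extend(kernel(lanes)[:valid])
    return out


result = launch(abs_kernel, [-3, 4, -5])
```

A hardware SIMD backend would presumably replace the Python lists with vector registers and the list comprehensions with vector instructions, while the masking scheme stays conceptually the same.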

Please note that this is the initial PR adding support for building and managing Velocity devices. It also contains the fully-featured code generator that creates SIMD-based instructions from ILGPU IR nodes. However, it does not contain any SIMD code-generation plugins for the backend code generator; these will be added later on.

This PR integrates CI-support contributed by @MoFtZ in #1096.

Note that this PR is a new version of PR #891.

This PR depends on #1059, #1061, #1062, #1063, #1064, #1065, #1066, #1067, #1068, #1069, #1070, #1071, #1072, #1073, #1074, #1079, and #1081.

Co-authored-by: MoFtZ [email protected]

@m4rs-mt m4rs-mt added the feature A new feature (or feature request) label Aug 24, 2023
@m4rs-mt m4rs-mt added this to the v2.0 milestone Aug 24, 2023
@m4rs-mt m4rs-mt marked this pull request as draft August 24, 2023 19:34
@m4rs-mt m4rs-mt force-pushed the velocity2 branch 9 times, most recently from 2cbb2cd to 93652f7 Compare August 30, 2023 21:23
@m4rs-mt m4rs-mt changed the title Velocity SIMD CPU Runtime Velocity SIMD CPU Runtime (Scalar) Aug 30, 2023
@m4rs-mt m4rs-mt force-pushed the velocity2 branch 2 times, most recently from 9b35027 to f01fe60 Compare August 30, 2023 23:49
@m4rs-mt m4rs-mt marked this pull request as ready for review August 30, 2023 23:49
@m4rs-mt m4rs-mt changed the title Velocity SIMD CPU Runtime (Scalar) Velocity SIMD CPU Runtime (Runtime + Software SIMDx2) Aug 30, 2023
@m4rs-mt m4rs-mt changed the title Velocity SIMD CPU Runtime (Runtime + Software SIMDx2) Velocity SIMD CPU Runtime (Runtime + Scalar x2) Aug 30, 2023
@m4rs-mt m4rs-mt force-pushed the velocity2 branch 10 times, most recently from 3fe39a5 to 86a9007 Compare September 6, 2023 06:58
m4rs-mt added a commit that referenced this pull request Sep 8, 2023
…y generated parameter instances to Velocity kernels.
…d transfer scalar arguments into the vector world.
@m4rs-mt m4rs-mt force-pushed the velocity2 branch 2 times, most recently from 42fed6f to 716d7c3 Compare September 26, 2023 13:47

m4rs-mt commented Sep 26, 2023

@MoFtZ I fixed an issue in the ScalarX2 code generator; it now passes all tests on Windows/Intel x64.

@m4rs-mt m4rs-mt merged commit 576863b into master Sep 27, 2023
@m4rs-mt m4rs-mt deleted the velocity2 branch September 27, 2023 11:40

2 participants