
Refactor for wgpu v23 compatibility with an example of wgpu device sharing #211

Draft · wants to merge 18 commits into main

Conversation

@AsherJingkongChen (Contributor) commented Oct 29, 2024

Related Issues/PRs

Checklist

  • Confirmed that cargo xtask test --ci script has been executed.
  • Made sure the book is up to date with changes in this PR.
  • Checked for any updates or changes in dependencies.

Changes

1. Modified entry_point Type

  • Change: Updated the type of entry_point from &str to Option<&str>.
  • Reason: wgpu v23 made the entry point optional, and this follows Clippy's suggestion (see the sketch below).
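
For illustration, a minimal sketch of a call site after this change; the helper and its names are hypothetical, while the descriptor fields follow wgpu v23's ComputePipelineDescriptor:

use wgpu::{ComputePipeline, ComputePipelineDescriptor, Device, PipelineLayout, ShaderModule};

// Hypothetical helper: `entry_point` is now `Option<&str>`, and `None`
// selects the shader's sole entry point if it defines exactly one.
fn build_pipeline(
    device: &Device,
    layout: &PipelineLayout,
    module: &ShaderModule,
) -> ComputePipeline {
    device.create_compute_pipeline(&ComputePipelineDescriptor {
        label: Some("cubecl-kernel"),
        layout: Some(layout),
        module,
        entry_point: Some("main"), // was `entry_point: "main"` before v23
        compilation_options: Default::default(),
        cache: None,
    })
}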

2. Updated WgpuDevice::Existing Enum Type

  • Change: Altered the type of WgpuDevice::Existing from wgpu::Id<wgpu::Device> to u32.
  • Reason: After the changes introduced in gfx-rs/wgpu#6134, wgpu::Id no longer exists. A plain u32 now serves as the unique identifier for WgpuDevice::Existing.

Note: We avoid using pointers as identifiers because a later allocation can land at the same memory address as a freed one. To keep identifiers unique, a monotonically increasing value is now used internally, as sketched below.
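
A minimal sketch of such a generator; the names are hypothetical, and the PR keeps its actual counter inside init_existing_device:

use std::sync::atomic::{AtomicU32, Ordering};

// Process-wide counter; each registered external device gets a fresh id.
static DEVICE_COUNTER: AtomicU32 = AtomicU32::new(0);

// Unlike a raw pointer, an id handed out here is never reused, even if
// a device is dropped and a new one is allocated at the same address.
fn next_device_id() -> u32 {
    DEVICE_COUNTER.fetch_add(1, Ordering::Relaxed)
}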

3. Added wgpu Feature Flags

  • Change: See Cargo.toml
  • Reason: These flags are needed for the crate to compile against wgpu v23.

Examples

  1. Added device_sharing Example
    • Description: This example demonstrates the device sharing feature.
    • How to Run: Use the following command to run the example:
      cargo run --example device_sharing --features wgpu
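
A rough sketch of the pattern the example demonstrates, assuming a pre-acquired adapter. The commented-out call mirrors the init_existing_device signature visible in the diff below; the adapter argument and RuntimeOptions::default() are assumptions:

use std::sync::Arc;

// The application creates the wgpu device/queue once, keeps `Arc`
// clones for its own work, and hands the same handles to CubeCL so
// both sides share one GPU device.
async fn share_device(adapter: Arc<wgpu::Adapter>) {
    let (device, queue) = adapter
        .request_device(&wgpu::DeviceDescriptor::default(), None)
        .await
        .expect("failed to create device");
    let (device, queue) = (Arc::new(device), Arc::new(queue));

    // CubeCL then computes on the device the application renders with:
    // let cube_device = cubecl_wgpu::init_existing_device(
    //     adapter.clone(), device.clone(), queue.clone(),
    //     RuntimeOptions::default(),
    // );
    let _ = (device, queue); // used by both sides in the real example
}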

@@ -96,7 +96,7 @@ pub fn init_existing_device(
     queue: Arc<wgpu::Queue>,
     options: RuntimeOptions,
 ) -> WgpuDevice {
-    let device_id = WgpuDevice::Existing(device.as_ref().global_id());
+    let device_id = WgpuDevice::Existing(ptr::from_ref(device.as_ref()) as usize);
@ArthurBrussee (Contributor) commented on the diff:

This seems a bit dangerous! There is no guarantee your next device doesn't land on the same memory address - that's not even unlikely, depending on how you allocate things.

I think just keeping an AtomicU64 which is incremented might be easier. You do have to be careful not to register something twice, but I think that's already the case.

(sorry for the driveby comment - I added this code originally)

@AsherJingkongChen (Contributor, Author) replied:

The code below illustrates your concern; it shows a reallocation landing at the same address:

> No guarantee your next device doesn't land on the same memory address.

fn main() {
    let mut x = String::from("Hello, world!");
    // Address of the local binding `x` (its stack slot).
    let addr_x1 = &x as *const _ as usize;
    drop(x);
    // A different string now occupies the same slot, so the old
    // address points at a new value: an address is not a stable
    // identity once the original value is gone.
    x = String::from("Bonjour, le monde!");
    let addr_x2 = &x as *const _ as usize;
    assert_eq!(addr_x1, addr_x2);
}

@ArthurBrussee (Contributor) commented Oct 29, 2024:

Well, exactly - two different strings, and yet the address is the same. So in this case you could feasibly create two different wgpu devices and yet have Cube think they are the same WgpuDevice - that's not good! The id should be unique.

@AsherJingkongChen (Contributor, Author) commented Oct 29, 2024:

I now use a monotonically increasing value in init_existing_device, just like the other id generators in CubeCL. I also noticed that I hadn't updated the SPIR-V compiler's init_existing_device.

@AsherJingkongChen changed the title from "Refactor for wgpu v23 compatibility and add an example" to "Refactor for wgpu v23 compatibility and add an example of wgpu device sharing" on Oct 29, 2024
@AsherJingkongChen changed the title from "Refactor for wgpu v23 compatibility and add an example of wgpu device sharing" to "Refactor for wgpu v23 compatibility with an example of wgpu device sharing" on Oct 29, 2024
* Change type of `entry_point` from `&str` to `Option<&str>`
* Change enum type of `WgpuDevice::Existing` from `wgpu::Id<wgpu::Device>` to `usize`
* Added auto-increment counter in init_existing_device
* Added usage comments
@ArthurBrussee (Contributor) commented:

In theory, now that the OOB behaviour has been fixed, this should work. If you have time to re-test on Metal, that would be amazing :) Thank you!

@AsherJingkongChen (Contributor, Author) commented Nov 25, 2024:

> now that the OOB behaviour has been fixed, this should work

@ArthurBrussee On my macOS (Metal 3), only the spirv tests fail, along with a panic from cudarc, even though I have Vulkan 1.3.290 (API 1.2) powered by MoltenVK.

Overall, cubecl looks good on my Mac, but I am still testing burn as well.

In the Linux CI there is a segfault with no message. Could it come from CUDA? I can't resolve it right now.

@nathanielsimard (Member) commented:

> In the Linux CI there is a segfault with no message. Could it come from CUDA?

I don't think the CI is building CUDA.

@nathanielsimard (Member) commented:

@AsherJingkongChen If you merge main, does it fix the CI?

@@ -280,7 +280,7 @@ fn request_device(
     adapter
         .device_from_raw(
             vk_device,
-            true,
+            None,
@AsherJingkongChen (Contributor, Author) commented on the diff:

I think I am making the equivalent change here.
