
Refactor for wgpu v23 compatibility with an example of wgpu device sharing #211

Draft · wants to merge 18 commits into main

Conversation

@AsherJingkongChen (Contributor) commented Oct 29, 2024

Related Issues/PRs

Checklist

  • Confirmed that cargo xtask test --ci script has been executed.
  • Made sure the book is up to date with changes in this PR.
  • Checked for any updates or changes in dependencies.

Changes

1. Modified entry_point Type

  • Change: Updated the type of entry_point from &str to Option<&str>.
  • Reason: wgpu v23 made the entry point optional, and this follows Clippy's suggestion (see the sketch below).
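
For illustration, a minimal sketch of a call site after this change; the helper and its names are hypothetical, while the descriptor fields follow wgpu v23's ComputePipelineDescriptor:

use wgpu::{ComputePipeline, ComputePipelineDescriptor, Device, PipelineLayout, ShaderModule};

// Hypothetical helper: `entry_point` is now `Option<&str>`, and `None`
// selects the shader's sole entry point if it defines exactly one.
fn build_pipeline(
    device: &Device,
    layout: &PipelineLayout,
    module: &ShaderModule,
) -> ComputePipeline {
    device.create_compute_pipeline(&ComputePipelineDescriptor {
        label: Some("cubecl-kernel"),
        layout: Some(layout),
        module,
        entry_point: Some("main"), // was `entry_point: "main"` before v23
        compilation_options: Default::default(),
        cache: None,
    })
}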

2. Updated WgpuDevice::Existing Enum Type

  • Change: Altered the type of WgpuDevice::Existing from wgpu::Id<wgpu::Device> to u32.
  • Reason: After the changes introduced in gfx-rs/wgpu#6134, wgpu::Id no longer exists. A plain u32 now serves as the unique identifier for WgpuDevice::Existing.

Note: We avoid using pointers as identifiers because a later allocation can land at the same memory address as a freed one. To keep identifiers unique, a monotonically increasing value is now used internally, as sketched below.
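
A minimal sketch of such a generator; the names are hypothetical, and the PR keeps its actual counter inside init_existing_device:

use std::sync::atomic::{AtomicU32, Ordering};

// Process-wide counter; each registered external device gets a fresh id.
static DEVICE_COUNTER: AtomicU32 = AtomicU32::new(0);

// Unlike a raw pointer, an id handed out here is never reused, even if
// a device is dropped and a new one is allocated at the same address.
fn next_device_id() -> u32 {
    DEVICE_COUNTER.fetch_add(1, Ordering::Relaxed)
}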

3. Added wgpu Feature Flags

  • Change: See Cargo.toml
  • Reason: These flags are needed for the crate to compile against wgpu v23.

Examples

  1. Added device_sharing Example
    • Description: This example demonstrates the device sharing feature.
    • How to Run: Use the following command to run the example:
      cargo run --example device_sharing --features wgpu
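
A rough sketch of the pattern the example demonstrates, assuming a pre-acquired adapter. The commented-out call mirrors the init_existing_device signature visible in the diff below; the adapter argument and RuntimeOptions::default() are assumptions:

use std::sync::Arc;

// The application creates the wgpu device/queue once, keeps `Arc`
// clones for its own work, and hands the same handles to CubeCL so
// both sides share one GPU device.
async fn share_device(adapter: Arc<wgpu::Adapter>) {
    let (device, queue) = adapter
        .request_device(&wgpu::DeviceDescriptor::default(), None)
        .await
        .expect("failed to create device");
    let (device, queue) = (Arc::new(device), Arc::new(queue));

    // CubeCL then computes on the device the application renders with:
    // let cube_device = cubecl_wgpu::init_existing_device(
    //     adapter.clone(), device.clone(), queue.clone(),
    //     RuntimeOptions::default(),
    // );
    let _ = (device, queue); // used by both sides in the real example
}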

@@ -96,7 +96,7 @@ pub fn init_existing_device(
     queue: Arc<wgpu::Queue>,
     options: RuntimeOptions,
 ) -> WgpuDevice {
-    let device_id = WgpuDevice::Existing(device.as_ref().global_id());
+    let device_id = WgpuDevice::Existing(ptr::from_ref(device.as_ref()) as usize);
@ArthurBrussee (Contributor) commented on the diff:

This seems a bit dangerous! There is no guarantee your next device doesn't land on the same memory address - that's not even unlikely, depending on how you allocate things.

I think just keeping an AtomicU64 which is incremented might be easier. You do have to be careful not to register something twice, but I think that's already the case.

(sorry for the driveby comment - I added this code originally)

@AsherJingkongChen (Contributor, Author) replied:

The code below illustrates your concern; it shows a reallocation landing at the same address:

> No guarantee your next device doesn't land on the same memory address.

fn main() {
    let mut x = String::from("Hello, world!");
    // Address of the local binding `x` (its stack slot).
    let addr_x1 = &x as *const _ as usize;
    drop(x);
    // A different string now occupies the same slot, so the old
    // address points at a new value: an address is not a stable
    // identity once the original value is gone.
    x = String::from("Bonjour, le monde!");
    let addr_x2 = &x as *const _ as usize;
    assert_eq!(addr_x1, addr_x2);
}

@ArthurBrussee (Contributor) commented Oct 29, 2024:

Well, exactly - two different strings, and yet the address is the same. So in this case you could feasibly create two different wgpu devices and yet have Cube think they are the same WgpuDevice - that's not good! The id should be unique.

@AsherJingkongChen (Contributor, Author) commented Oct 29, 2024:

I now use a monotonically increasing value in init_existing_device, just like the other id generators in CubeCL. I also noticed that I hadn't updated the SPIR-V compiler's init_existing_device.

@AsherJingkongChen changed the title from "Refactor for wgpu v23 compatibility and add an example" to "Refactor for wgpu v23 compatibility and add an example of wgpu device sharing" on Oct 29, 2024
@AsherJingkongChen changed the title from "Refactor for wgpu v23 compatibility and add an example of wgpu device sharing" to "Refactor for wgpu v23 compatibility with an example of wgpu device sharing" on Oct 29, 2024
* Change type of `entry_point` from `&str` to `Option<&str>`
* Change enum type of `WgpuDevice::Existing` from `wgpu::Id<wgpu::Device>` to `usize`
* Added auto-increment counter in init_existing_device
* Added usage comments
@ArthurBrussee (Contributor) commented:

In theory, now that the OOB behaviour has been fixed, this should work. If you have time to re-test on Metal, that would be amazing :) Thank you!

@AsherJingkongChen (Contributor, Author) commented Nov 25, 2024:

> now that the OOB behaviour has been fixed, this should work

@ArthurBrussee On my macOS (Metal 3), only the spirv tests fail, along with a panic from cudarc, even though I have Vulkan 1.3.290 (API 1.2) powered by MoltenVK.

Overall, cubecl looks good on my Mac, but I am still testing burn as well.

In the Linux CI there is a segfault with no message. Could it come from CUDA? I can't resolve it right now.

@nathanielsimard (Member) commented:

> In the Linux CI there is a segfault with no message. Could it come from CUDA?

I don't think the CI is building CUDA.

@nathanielsimard (Member) commented:

@AsherJingkongChen If you merge main, does it fix the CI?

@@ -280,7 +280,7 @@ fn request_device(
     adapter
         .device_from_raw(
             vk_device,
-            true,
+            None,
@AsherJingkongChen (Contributor, Author) commented on the diff:

I think I am making the equivalent change here.
