-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor for wgpu v23 compatibility with an example of wgpu device sharing #211
base: main
Are you sure you want to change the base?
Refactor for wgpu v23 compatibility with an example of wgpu device sharing #211
Conversation
crates/cubecl-wgpu/src/runtime.rs
Outdated
@@ -96,7 +96,7 @@ pub fn init_existing_device( | |||
queue: Arc<wgpu::Queue>, | |||
options: RuntimeOptions, | |||
) -> WgpuDevice { | |||
let device_id = WgpuDevice::Existing(device.as_ref().global_id()); | |||
let device_id = WgpuDevice::Existing(ptr::from_ref(device.as_ref()) as usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit dangerous! No guarantee your next device doesn''t land on the same memory adress - not unlikely even depending how you allocate stuff.
I think maybe just keeping an AtomicU64 which is incremented might be easier. Do have to be careful not to register something twice but that's already the case I think.
(sorry for the driveby comment - I added this code originally)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code should explain your concerns, it does reallocation on the same address:
No guarantee your next device doesn't land on the same memory address.
fn main() {
let mut x = String::from("Hello, world!");
let addr_x1 = &x as *const _ as usize;
drop(x);
x = String::from("Bonjour, le monde!");
let addr_x2 = &x as *const _ as usize;
assert_eq!(addr_x1, addr_x2);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, exactly - two different strings and yet the adress is the same. So, in this case - you could feasible create two different wgpu devices and yet have Cube think the are the same WgpuDevice, that's not good! The id should be unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I use monotonically increasing value in init_existing_device
just like other id generators in CubeCL. Besides, I noticed that I didn't update SPIR-V compiler's init_existing_device
.
* Change type of `entry_point` from `&str` to `Option<&str>` * Change enum type of `WgpuDevice::Existing` from `wgpu::Id<wgpu::Device>` to `usize`
* Added auto-increment counter in init_existing_device * Added comments of usage
fcb6085
to
dd04e6c
Compare
* Also updated an argument, see https://docs.rs/wgpu-hal/latest/wgpu_hal/vulkan/struct.Adapter.html#method.device_from_raw
In theory now that the OOB behaviour has been fixed this should work now, if you have time to re test on metal would be amazing :) Thank you! |
@ArthurBrussee On my macOS (Metal 3), it only fails on spirv tests and a panic from Overall, cubecl looks good to my Mac, but I am still testing In linux CI, there is a segfault without message, would it come from CUDA? I can't resolve it now. |
I don't think the CI is building CUDA. |
@AsherJingkongChen If you merge main, does it fix the CI? |
@@ -280,7 +280,7 @@ fn request_device( | |||
adapter | |||
.device_from_raw( | |||
vk_device, | |||
true, | |||
None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am doing the equivalent change here.
Related Issues/PRs
Checklist
cargo xtask test --ci
script has been executed.Changes
1. Modified
entry_point
Typeentry_point
from&str
toOption<&str>
.2. Updated
WgpuDevice::Existing
Enum TypeWgpuDevice::Existing
fromwgpu::Id<wgpu::Device>
tou32
.wgpu::Id
no longer exists. Using au32
ensures a unique identifier forWgpuDevice::Existing
.3. Added feature flags from Wgpu:
Cargo.toml
Examples
device_sharing
Example