This proposal introduces fully asynchronous semantics for the backend and its operators.
For the Warp and Neon backends, the execution of an operator is asynchronous (the calling function returns before the kernel completes). However, this asynchronous behavior is not fully abstracted by the XLB backend. Currently, synchronization calls are added directly to the code using Warp or Neon mechanisms where needed.
This proposal aims to discuss how to implement comprehensive synchronization semantics; whether they will be visible to the XLB user depends on the chosen approach.
So far, two cases have been considered:
Case A: Directly abstracting synchronization into the backend or operator API. In this approach, the synchronization abstraction should manage:
- A default stream
- A synchronization method
Case B: Enforcing synchronous behavior directly in any CPU operation that accesses XLB fields. This solution would require:
- Default stream management
- Injecting synchronization into any operation that allows the user to access field data
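The difference between the two cases can be sketched without a GPU. The following is a minimal illustration only, assuming a single-worker thread pool as a stand-in for a default stream; `DefaultStream`, `Field`, `device_view`, `to_host`, and `slow_fill` are hypothetical names and are not part of XLB, Warp, or Neon.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class DefaultStream:
    """Hypothetical stand-in for a GPU stream: one worker, so tasks run in order."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def launch(self, kernel, *args):
        # Returns immediately, like an asynchronous kernel launch.
        self._pending.append(self._pool.submit(kernel, *args))

    def sync(self):
        # Block until every launched task has finished; re-raise kernel errors.
        pending, self._pending = self._pending, []
        for task in pending:
            task.result()


class Field:
    """Hypothetical field wrapper (not an XLB class) used to illustrate Case B."""

    def __init__(self, size, stream):
        self._data = [0.0] * size
        self._stream = stream

    def device_view(self):
        # Raw storage handed to kernels; performs no synchronization.
        return self._data

    def to_host(self):
        # Case B: any host-side access synchronizes implicitly first.
        self._stream.sync()
        return list(self._data)


def slow_fill(data, value):
    time.sleep(0.05)  # simulate kernel latency
    for i in range(len(data)):
        data[i] = value


stream = DefaultStream()
field = Field(4, stream)

# Case A: the caller synchronizes explicitly before reading.
stream.launch(slow_fill, field.device_view(), 1.0)
stream.sync()
case_a = list(field.device_view())

# Case B: the accessor synchronizes on the caller's behalf.
stream.launch(slow_fill, field.device_view(), 2.0)
case_b = field.to_host()
```

In this sketch, Case A keeps the synchronization point visible in user code, while Case B hides it behind every accessor that exposes field data to the CPU.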
The proposal aims to address situations where CPU computation may access data still in use by the GPU. For instance, in this example, a buffer could be deleted before the kernel has completed.
In Case A, we would require an explicit XLB sync call, like `xlb.sync()`, at line 23. In Case B, the synchronization would be included directly in the field destructor.
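The hazard and both remedies can be illustrated with a small simulation. This is a sketch under stated assumptions, not the example the text refers to: a single-worker thread pool stands in for the GPU stream, `release()` stands in for the field destructor, and `Buffer`, `kernel`, and `launch` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Single worker: tasks run in submission order, like one GPU stream.
pool = ThreadPoolExecutor(max_workers=1)


class Buffer:
    """Hypothetical buffer; release() stands in for the field destructor."""

    def __init__(self, sync_on_release=False):
        self.storage = [0.0]
        self.pending = []
        self.sync_on_release = sync_on_release

    def sync(self):
        # Wait for every kernel launched on this buffer.
        for task in self.pending:
            task.exception()  # blocks until the task is done, without re-raising
        self.pending.clear()

    def release(self):
        if self.sync_on_release:  # Case B: synchronize inside the "destructor"
            self.sync()
        self.storage = None


def kernel(buf):
    time.sleep(0.1)  # simulate a kernel that is still running
    if buf.storage is None:
        raise RuntimeError("buffer released while kernel was running")
    buf.storage[0] = 1.0
    return buf.storage[0]


def launch(buf):
    # Asynchronous launch: returns before the kernel completes.
    task = pool.submit(kernel, buf)
    buf.pending.append(task)
    return task


# No synchronization: the buffer is released while the kernel is in flight.
unsafe = Buffer()
unsafe_task = launch(unsafe)
unsafe.release()
unsafe_error = unsafe_task.exception()

# Case A: explicit synchronization before the buffer goes away.
a = Buffer()
a_task = launch(a)
a.sync()
a.release()

# Case B: the release path synchronizes on its own.
b = Buffer(sync_on_release=True)
b_task = launch(b)
b.release()

pool.shutdown()
```

Here the unsynchronized run fails inside the kernel, while both Case A and Case B let the kernel finish before the storage is dropped.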