Racecheck Bug when tl.min used with tl.sum #4736
Comments
same shared memory position. (triton-lang#4736)
I have created a WIP patch, lijinpei@3fe20ba, which resolves the race reported for the provided script.py and fails no case in
We likely won't accept your solution even with a unit test. I don't see correctness issues.
IIUC, the data races in this specific case don't cause correctness problems for you, so it would be better to provide code that shows a real issue. A data race can be reported when multiple threads write the same value to the same location, which is fine in Triton.
Out of curiosity, I profiled the repro before and after the change, and I do see a small (~1%) speedup that reproduces consistently.
I think we need to run internal regression benchmarks instead of external ones.
In the above code, I compute the distance between each element of the input and 32 coordinates, and return the coordinate with the minimum distance for each input element (this might be easier to understand from the numpy code below). When you run this code with the racecheck tool of compute-sanitizer using
(compute-sanitizer --tool=racecheck python script.py)
the following output is shown:
========= Error: Race reported between Write access at compute_min_distance_coord+0x5ad20 in /usr/local/lib/python3.10/dist-packages/triton/language/standard.py:237
========= and Write access at compute_min_distance_coord+0x5ad20 in /usr/local/lib/python3.10/dist-packages/triton/language/standard.py:237 [6136 hazards]
The error seems to stem from standard.py, at a line that appears to be in the min function.
I am not facing a correctness issue with this code at the moment, but I have faced correctness issues with other kernels that use a similar combination of tl.sum and tl.min.
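For readers without the original script, the computation described above can be sketched in NumPy roughly as follows. This is a minimal sketch under stated assumptions, not the reporter's code: the function name `nearest_coord_reference`, the 2-D shapes, and the use of squared Euclidean distance are all hypothetical choices. The sum over the feature axis and the reduction over the 32 coordinates are only meant to mirror where a kernel might use tl.sum and tl.min.

```python
import numpy as np


def nearest_coord_reference(inputs: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Return, for each row of `inputs`, the row of `coords` closest to it.

    Hypothetical reference: `inputs` has shape (N, D), `coords` has shape (K, D).
    """
    # Pairwise differences via broadcasting: (N, 1, D) - (1, K, D) -> (N, K, D)
    diff = inputs[:, None, :] - coords[None, :, :]
    # Squared Euclidean distance: a sum reduction over the feature axis
    dist = np.sum(diff * diff, axis=-1)  # shape (N, K)
    # Index of the closest coordinate: a min reduction over the K coordinates
    nearest = np.argmin(dist, axis=1)  # shape (N,)
    return coords[nearest]


# Example with 32 coordinates, as in the description (values are random).
rng = np.random.default_rng(0)
inputs = rng.standard_normal((8, 2))
coords = rng.standard_normal((32, 2))
result = nearest_coord_reference(inputs, coords)
```

A reference like this is mainly useful for comparing kernel output against a known-good result, which is how a race that does affect correctness would show up.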