Wilcoxon-Rank-Sum test in rank_genes_groups: suspicious code & ties not corrected #698
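I wrote this code for tie correction; it is inspired by scipy's implementation:

```python
import numpy as np

def matrix_tiecorrect(rankvals):
    """Row-wise tie-correction factor, a vectorized analogue of scipy.stats.tiecorrect."""
    size = np.float64(rankvals.shape[1])
    if size < 2:
        # No ties are possible, so the factor is 1 for every row.
        return np.repeat(1.0, rankvals.shape[0])
    arr = np.sort(rankvals, axis=1)
    # True where a new run of equal values starts, plus sentinels at both ends.
    tf = np.insert(arr[:, 1:] != arr[:, :-1], (0, arr.shape[1] - 1), True, axis=1)
    # Keep the run-boundary positions; all other entries become 0 and sort to the front.
    idx = np.where(tf, np.arange(tf.shape[1]), 0)
    idx = np.sort(idx, axis=1)
    # Differences between consecutive boundaries are the tie-group sizes (padding gives 0).
    cnt = np.diff(idx, axis=1).astype(np.float64)
    return 1.0 - (cnt**3 - cnt).sum(axis=1) / (size**3 - size)
```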
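For reference, the quantity computed per row is scipy's tie-correction factor, 1 - Σ(t³ - t) / (n³ - n), where t runs over the sizes of the groups of tied values in a row of length n. The sketch below computes it directly from `np.unique` counts and checks it against `scipy.stats.tiecorrect` applied row by row; that slow loop is also a convenient reference for testing a vectorized version. The helper names `tie_factor` and `rowwise_reference` are hypothetical and not part of scanpy or scipy.

```python
import numpy as np
from scipy.stats import rankdata, tiecorrect

def tie_factor(values):
    """Tie-correction factor 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group."""
    values = np.asarray(values)
    n = values.size
    if n < 2:
        return 1.0
    # np.unique with return_counts gives the size t of every group of equal values.
    _, t = np.unique(values, return_counts=True)
    t = t.astype(np.float64)
    return 1.0 - (t**3 - t).sum() / (float(n)**3 - n)

def rowwise_reference(matrix):
    """Apply scipy.stats.tiecorrect to the ranks of each row (slow loop, for checking only)."""
    return np.array([tiecorrect(rankdata(row)) for row in matrix])

rng = np.random.default_rng(0)
# Small integer counts with many zeros, so most rows contain ties.
X = rng.poisson(0.5, size=(5, 20))
ours = np.array([tie_factor(row) for row in X])
print(np.allclose(ours, rowwise_reference(X)))  # True
```

Ranking does not change which values are tied, so the factor is the same whether it is computed on raw values or on their ranks.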
Hi @ivirshup, it took us some time to come up with a reproducible example, but then we realized that this behavior is only present in scanpy==1.4 (and perhaps earlier). In 1.4.1 and 1.4.3 the crazy Z-scores are gone, and the results seem to be correct. Regarding ties: quickly scanning through the thread, I didn't find any mention of tie correction (maybe I missed something). In any case, please consider the crazy Z-score issue resolved. Tie correction could still be discussed, I think.
Great, I just wanted to make sure that we had that out of the way first. About the tie correction, I'm not the most knowledgeable person about our differential expression testing. Maybe @falexwolf or @a-munoz-rojas would be able to comment on this? @idavydov, what do you think our results should be? Is there a gold standard in scipy.stats whose results we should match?
Hi! Sorry for the very long delay in replying. Indeed, the version in the code doesn't do tie correction. A while ago, when originally implementing some changes to this function, we tried using the …
Hi @a-munoz-rojas, I don't think there is a way to speed up … Regarding ties, this is a simple multiplier, so it should be easy to implement or to use from … I have a matrix version of …
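Because the correction only rescales the variance of the rank-sum statistic, adding it to a z-score computation is a one-line change. A minimal sketch, assuming the normal approximation used by `scipy.stats.ranksums` (the function `ranksum_z` and the example data are hypothetical): without the correction it reproduces `ranksums`, and with it the variance is multiplied by `tiecorrect` of the pooled ranks.

```python
import numpy as np
from scipy.stats import rankdata, ranksums, tiecorrect

def ranksum_z(x, y, tie_correct=True):
    """Normal-approximation z-score of the Wilcoxon rank-sum statistic for sample x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    n = n1 + n2
    ranks = rankdata(np.concatenate([x, y]))  # tied values get their average rank
    rank_sum = ranks[:n1].sum()
    expected = n1 * (n + 1) / 2.0
    variance = n1 * n2 * (n + 1) / 12.0
    if tie_correct:
        # The tie correction is a single multiplier on the variance.
        variance *= tiecorrect(ranks)
    return (rank_sum - expected) / np.sqrt(variance)

x = [0, 0, 0, 1, 3, 5]
y = [0, 0, 0, 0, 0, 2]
print(ranksum_z(x, y, tie_correct=False), ranksums(x, y).statistic)  # the two agree
print(ranksum_z(x, y))  # larger in magnitude: the corrected variance is smaller
```

If every value is tied, the factor is 0 and the z-score is 0/0, so a real implementation would need to guard that case.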
Just bringing this thread back up: I think it would be useful to have tie correction in the code. @falexwolf, what do you think? If we agree, @idavydov, would you be able to submit a pull request to implement it?
Hi,
Thanks a lot for the library.
We're having some issues with the Wilcoxon rank-sum test in `rank_genes_groups`, and I noticed some suspicious code in the implementation: … Shouldn't it be `.iloc`?
Additionally, it seems there is no tie correction in the code. I think this could be an issue for sparse data, where many values are tied at zero.
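To give a sense of scale, consider a hypothetical gene measured in 100 cells, 90 of which have zero counts. The sketch below uses `scipy.stats.tiecorrect`. Since the uncorrected z-score equals the corrected one multiplied by the square root of the factor, that square root shows how much an uncorrected test shrinks |z| in this case.

```python
import numpy as np
from scipy.stats import rankdata, tiecorrect

# Hypothetical expression values for one gene across 100 cells:
# 90 cells with zero counts and 10 cells with distinct non-zero values.
values = np.concatenate([np.zeros(90), np.arange(1, 11)])
factor = tiecorrect(rankdata(values))
# factor = 1 - (90**3 - 90) / (100**3 - 100); uncorrected |z| is sqrt(factor) times the corrected |z|.
print(factor, np.sqrt(factor))  # about 0.27 and 0.52
```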
There is an implementation of `tiecorrect` in scipy.
Thanks,
Iakov