Wilcoxon-Rank-Sum test in rank_genes_groups: suspicious code & ties not corrected #698

idavydov · 2019-06-19T15:21:17Z

Hi,
Thanks a lot for the library.

We're having some issues with the Wilcoxon-Rank-Sum test in rank_genes_groups, and I noticed some suspicious code in the implementation:

scores[left:right] = np.sum(ranks.loc[0:n_active, :])

Shouldn't it be .iloc?
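
For reference, the two indexers slice differently even when the index is a default, sorted integer range: label-based .loc includes the end label, while position-based .iloc excludes the end position. A minimal sketch with a made-up ranks frame (the column names and the n_active value are illustrative assumptions, not taken from scanpy):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the ranks frame: 6 rows, default integer index 0..5.
ranks = pd.DataFrame(np.arange(12).reshape(6, 2), columns=["gene_a", "gene_b"])
n_active = 3  # hypothetical group size

by_label = ranks.loc[0:n_active, :]      # labels 0, 1, 2, 3 (end label included)
by_position = ranks.iloc[0:n_active, :]  # positions 0, 1, 2 (end position excluded)

print(len(by_label), len(by_position))  # 4 3

# The column sums therefore differ by the values in the extra row.
extra_row = np.sum(by_label) - np.sum(by_position)
print(extra_row.tolist())
```

Whether the extra row matters in rank_genes_groups depends on how ranks is built there, which this sketch does not reproduce.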

Additionally, it seems there is no tie correction in the code. I think the missing correction could be an issue for sparse data, where many values are tied.

There is an implementation of tiecorrect in scipy.

Thanks,
Iakov


idavydov · 2019-06-19T15:34:16Z

I wrote this code for tie correction; it is inspired by scipy's implementation.

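import numpy as np

def matrix_tiecorrect(rankvals):
    # Row-wise tie correction factor; each row of rankvals holds the ranks for one test.
    size = np.float64(rankvals.shape[1])
    if size < 2:
        # Fewer than two observations per row: no ties are possible, so the factor is 1.
        return np.ones(rankvals.shape[0])

    arr = np.sort(rankvals, axis=1)
    # Mark the start of every run of equal ranks, plus a sentinel after the last column.
    tf = np.insert(arr[:, 1:] != arr[:, :-1], (0, arr.shape[1]-1), True, axis=1)
    # Keep the column positions of run boundaries; other entries become 0 and sort to the front.
    idx = np.where(tf, np.arange(tf.shape[1]), 0)
    idx = np.sort(idx, axis=1)
    # Differences between boundaries are the run lengths (the padding contributes zeros).
    cnt = np.diff(idx, axis=1).astype(np.float64)

    return 1.0 - (cnt**3 - cnt).sum(axis=1) / (size**3 - size)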
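
One way to sanity-check a matrix version like this is to compare it with scipy.stats.tiecorrect applied row by row. The sketch below builds only that slow reference; tiecorrect_rows_reference is a hypothetical helper name, and the rank matrix is made up for illustration:

```python
import numpy as np
from scipy import stats

def tiecorrect_rows_reference(rankvals):
    """Hypothetical helper: call scipy.stats.tiecorrect on each row in turn."""
    return np.array([stats.tiecorrect(row) for row in np.asarray(rankvals)])

# Each row is one set of ranks (ties already given average ranks).
ranks = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],  # no ties
    [2.0, 2.0, 2.0, 4.0, 5.0],  # three values tied
    [3.0, 3.0, 3.0, 3.0, 3.0],  # everything tied
])

reference = tiecorrect_rows_reference(ranks)
print(reference)
```

The reference factors here are 1.0, about 0.8, and 0.0; matrix_tiecorrect(ranks) should agree with them if it mirrors scipy's behaviour.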

ivirshup · 2019-07-05T07:22:07Z

@idavydov, I think .loc and .iloc would be equivalent in this case, since the index should just be sorted integers. Do you have a case where changing it makes a difference?

I'm not too sure about why we don't do tie correction. I think there was some discussion of that here: #460

idavydov · 2019-07-17T14:06:55Z

Hi @ivirshup,

It took us some time to come up with a reproducible example. But then we realized that this behavior is only present in scanpy==1.4 (and perhaps earlier). In 1.4.1 and 1.4.3 the crazy Z-scores are gone, and the results seem to be correct.

Regarding ties: quickly scanning through that thread, I didn't find any mention of tie correction (maybe I missed something).

In any case, please consider the crazy Z-score issue resolved. Tie correction could still be discussed, I think.

ivirshup · 2019-07-20T07:41:11Z

Great, just wanted to make sure that we had that out of the way first.

About the tie correction, I'm not the most knowledgeable person about our differential expression testing. Maybe @falexwolf or @a-munoz-rojas would be able to comment on this?

@idavydov, what do you think our results should be? Is there a gold standard in scipy.stats whose results we should match?

idavydov · 2019-07-22T08:18:49Z

scipy.stats.mannwhitneyu uses tie correction (and there is no way to disable it). I think the better option would be to enable it by default and provide an option to disable it.

a-munoz-rojas · 2019-09-05T18:22:08Z

Hi! Sorry for the very long delay in replying. Indeed, the version in the code doesn't do tie correction. A while ago, when originally implementing some changes to this function, we tried using the scipy.stats.mannwhitneyu method, but it was significantly slower, so we kept the current version instead. If there is a way to improve the performance of the scipy version, it might be worth trying.

idavydov · 2019-09-06T09:42:33Z

Hi @a-munoz-rojas,

I don't think there is a way to speed up scipy.stats.mannwhitneyu, as it expects 1-D vectors, not a matrix.

Regarding ties, the correction is a simple multiplier, so it should be easy to implement or to take from scipy.stats.

I have a matrix version of scipy.stats.mannwhitneyu and scipy.stats.tiecorrect which is almost a 1-to-1 rewrite. I can share it in case you are interested.
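
To make the "simple multiplier" concrete: in the normal approximation, the variance of the rank sum is multiplied by the factor returned by scipy.stats.tiecorrect. The following is a minimal sketch under that assumption, not scanpy's implementation; ranksum_z and the toy count vectors are hypothetical names and data chosen for illustration:

```python
import numpy as np
from scipy import stats

def ranksum_z(x, y, tie_correct=True):
    """Normal-approximation z-score for the rank sum of x against y (sketch)."""
    n1, n2 = len(x), len(y)
    ranks = stats.rankdata(np.concatenate([x, y]))  # tied values get average ranks
    rank_sum = ranks[:n1].sum()
    expected = n1 * (n1 + n2 + 1) / 2.0
    variance = n1 * n2 * (n1 + n2 + 1) / 12.0
    if tie_correct:
        # The multiplier: 1 - sum(t**3 - t) / (n**3 - n) over groups of t tied values.
        variance *= stats.tiecorrect(ranks)
    return (rank_sum - expected) / np.sqrt(variance)

# Toy sparse-like data with many tied zeros.
x = np.array([0, 0, 0, 0, 1, 2, 0, 3, 0, 0], dtype=float)
y = np.array([0, 1, 2, 2, 3, 0, 4, 1, 5, 2], dtype=float)

z_plain = ranksum_z(x, y, tie_correct=False)
z_tied = ranksum_z(x, y, tie_correct=True)
p_tied = 2 * stats.norm.sf(abs(z_tied))

# scipy's asymptotic test without continuity correction, for comparison.
p_scipy = stats.mannwhitneyu(x, y, use_continuity=False, alternative="two-sided").pvalue
print(z_plain, z_tied, p_tied, p_scipy)
```

With ties present the factor is below 1, so the corrected z-score is larger in magnitude than the uncorrected one, and the corrected p-value should agree with the one from scipy.stats.mannwhitneyu.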

a-munoz-rojas · 2019-12-04T23:58:59Z

Just bringing this thread back up - I think it would be useful to have tie correction in the code. @falexwolf what do you think? If we agree, @idavydov would you be able to submit a pull request to implement it?

idavydov changed the title ~~Wilcoxon-Rank-Sum test in rank_genes_groups: suspicious code & tie not corrected~~ Wilcoxon-Rank-Sum test in rank_genes_groups: suspicious code & ties not corrected Jun 19, 2019

zhangguy mentioned this issue Jul 29, 2019

different pvals when doing wilcoxon rank_genes_groups for treatment vs. ctrl and ctrl vs. treatment #754

Closed

ivirshup added the Area – Differential Expression label Aug 2, 2019

Koncopd mentioned this issue Jul 23, 2020

Add tie correction to wilcoxon method in rank_genes_groups #1330

Merged

Koncopd closed this as completed Mar 1, 2021
