Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default 'ties.method' differs between rowRanks() and rank() #142

Open
pneuvial opened this issue Mar 26, 2019 · 18 comments
Open

Default 'ties.method' differs between rowRanks() and rank() #142

pneuvial opened this issue Mar 26, 2019 · 18 comments
Milestone

Comments

@pneuvial
Copy link

By default rank() uses the 'average' method to break ties, while rowRanks() uses the 'max' method:

> args(rank)
function (x, na.last = TRUE, ties.method = c("average", "first",
    "last", "random", "max", "min"))
> args(matrixStats::rowRanks)
function (x, rows = NULL, cols = NULL, ties.method = c("max",
    "average", "min"), dim. = dim(x), ...)

Shouldn't the default argument be the same in both functions?

@HenrikBengtsson
Copy link
Owner

Good point. Hmm... could be a mistake or it could be that rank() has been updated since rowRanks() was introduced.

@HenrikBengtsson
Copy link
Owner

HenrikBengtsson commented Mar 26, 2019

Nah... the reason is probably that it was only ties.method = "max" that was supported in the early days:

Version: 0.7.0 [2013-01-14]

NEW FEATURES:

  • Now (col|row)Ranks() support "max" (default), "min" and "average" for argument 'ties.method'. Added system tests validation these cases. Thanks Peter Langfelder (UCLA) for contributing this.

Version: 0.6.4 [2013-01-13]

NEW FEATURES:

  • Added argument 'ties.method' to rowRanks() and colRanks(), but still only support for "max" (as before).

Version: 0.4.0 [2011-11-11]

[...]

NEW FEATURES:

  • Added rowRanks() and colRanks(). Thanks Hector Corrada Bravo (University of Maryland) and Harris Jaffee (John Hopkins).

@pneuvial
Copy link
Author

In any case I'm glad that ties.method = "average" is also supported: it's the one used in wilcox.test(), so I've been able to make a rowWilcoxonTests() function using rowRanks(ties.method = "average").

@HenrikBengtsson
Copy link
Owner

Cool. Is your rowWilcoxonTests() available online?

@karoliskoncevicius
Copy link

Subscribed to notifications of matrixStats and found this. Just wanted to add that I also used rowRanks a few times when implementing very similar things - row_kruskalwallis()

So this is mainly a praise of matrixStats - it is extremely useful for speeding up multiple statistical tests on genomic-like datasets.

And by coincidence "row_wilcoxon" is next on the list of things that will get included in that package.

@pneuvial
Copy link
Author

Interesting! I did not know about the matrixTests package, it's very nice!

My rowWilcoxonTests() is available in the sansSouci package: https://github.com/pneuvial/sanssouci/blob/develop/R/rowWilcoxonTests.R

Of course it would make more sense to have this or a version of it in a package specifically dedicated to tests on the rows of a matrix!

NB: as noted in the Details of the man page:

The p-values are computed using the normal approximation as
described in the \code{\link{wilcox.test}} function. The exact p-values
(which can be useful for small samples with no ties) are not implemented
(yet).

FYI Karolis: I also have rowWelchTests() and rowBinomialTests() in the same package.

@karoliskoncevicius
Copy link

Thank you a lot for the sharing the reference to your package Pierre! Will definitely take a look.

From time to time I search for packages that implement similar things and everything I find I add to the references in the README under "See Also" section. Added yours now too. I probably missed it because it wasn't on CRAN nor BioConductor.

Also think yours is the first time I see Wilcoxon Test implemented in such a way. Most others typically implement fast versions of t-tests and are done.

The p-values are computed using the normal approximation as
described in the \code{\link{wilcox.test}} function. The exact p-values
(which can be useful for small samples with no ties) are not implemented
(yet).

Yes, this is the main thing that held me back from implementing Wilcoxon test in matrixTests. I try to maintain high fidelity with tests that are already in R, so all these approximations and when they are used sometimes takes time to get "right". Especially when they depend on sample size and there are different number of missing NA values in different rows.

@bkmontgom
Copy link
Contributor

I submitted a PR that addresses this and adds "first", "last", "random", and "dense" ties.methods.

@HenrikBengtsson
Copy link
Owner

Thank again for adding this (PR #146).

We are going to need several release cycles to change the default for argument ties.methods without (silently) messing up the results for users/pipelines that rely on rowRanks(). This will probably involve something like:

  1. Release 1: If missing(ties.method) == TRUE, then produce an informative warning saying this argument needs to be set to ties.method = "max" to avoid the warning. Here we could possibly hijack .Deprecated() warnings, which with some luck will be reported by R CMD check (some updates have been made toward this direction).

  2. Release 2: If missing(ties.method) == TRUE, turn above warning into an error. Here we can also change the default to be ties.method = "average".

  3. Release 3: Allow for missing(ties.method) == TRUE again.

To speed up the above, I could run reverse dependency checks with missing(ties.method) == TRUE producing an error to catch obvious candidates and report to their maintainers.

@HenrikBengtsson
Copy link
Owner

HenrikBengtsson commented Jun 23, 2021

I've scanned all 347 reverse dependencies on CRAN and Bioconductor 3.13 (R 4.1.0) for those that use colRanks() or rowRanks() and I identified the following packages:

  • rowRanks(): [6] EventPointer, fishpond, MatrixGenerics, matrixTests, nparMD, STROMA4
  • colRanks(): [7] clustifyr, HDSpatialScan, MatrixGenerics, MetaNeighbor, reconsi, singscore, WGCNA

Thus, as of 2021-06-22, there are 12 packages in total that rely on these matrixStats functions:

Bioconductor:

  1. clustifyr

  2. EventPointer

  3. fishpond

  4. MatrixGenerics

  5. MetaNeighbor

    • Code: ???
    • Result: ???
    • R_MATRIXSTATS_TIES_METHOD_MISSING=defunct => 'OK'
  6. reconsi

  7. singscore

  8. STROMA4

    • Code: R/random.ranks.R: matrixStats::rowRanks(datrank.rand.col)
    • Result: FAIL (does not specify ties.method),
    • R_MATRIXSTATS_TIES_METHOD_MISSING=defunct => R CMD check ERROR
  9. DelayedMatrixStats

  10. sparseMatrixStats

CRAN:

  1. HDSpatialScan
  2. matrixTests
  3. nparMD
  4. WGCNA

These are the packages that we must make sure don't make assumptions about the default ties.method value.

UPDATE: I guess my "scan" script produces false negatives, because I'd expect DelayMatrixStats and sparseMatrixStats to also show up above, but they don't.

UPDATE 2: It might be that DelayMatrixStats and sparseMatrixStats only imports from matrixStats in their package tests, which I did not scan.

HenrikBengtsson added a commit that referenced this issue Apr 10, 2024
… rowRanks() will become deprecated in R (>= 4.4.0) [#142]
@HenrikBengtsson
Copy link
Owner

HenrikBengtsson commented Apr 10, 2024

I ran another scan for packages not specifying the ties.method argument and found the following.

CRAN:

Bioconductor:

  • MatrixGenerics (cc/ @const-ae, @PeteHaitch)
  • STROMA4 1.24.0 (emailed maintainer on 2024-04-10) Package no longer on Bioconductor (2024-12-29)

This won't become defunct anytime soon, but I will make it deprecated (i.e. produce warnings) starting with R (>= 4.4.0) [2024-04-24] and therefore also Bioconductor (>= 3.19) [2024-05-01].

HenrikBengtsson added a commit that referenced this issue Apr 10, 2024
… rowRanks() will become deprecated in R (>= 4.4.0) - take 2 [#142]
HenrikBengtsson added a commit that referenced this issue Apr 10, 2024
…0-9000 R_MATRIXSTATS_TIES_METHOD_MISSING=defunct R_MATRIXSTATS_TIES_METHOD_FREQ=1 [#142] [ci skip]
@HenrikBengtsson
Copy link
Owner

This won't become defunct anytime soon, but I will make it deprecated (i.e. produce warnings) starting with R (>= 4.4.0) [2024-04-24] and therefore also Bioconductor (>= 3.19) [2024-05-01].

This is implemented in matrixStats 1.3.0, which is now on CRAN.

@HenrikBengtsson
Copy link
Owner

NEW SCAN on 2024-12-29: I ran another scan for CRAN and Bioconductor packages not specifying the ties.method argument.

CRAN:

Bioconductor:

So, nothing changed since 2024-04-11.

@PeteHaitch
Copy link
Contributor

PeteHaitch commented Jan 3, 2025

@HenrikBengtsson the two cases I can find in MatrixGenerics are tickled in the unit tests where we are testing if the MatrixGenerics API matches that of matrixStats1:

  1. https://code.bioconductor.org/browse/MatrixGenerics/blob/devel/tests/testthat/test-api_compatibility.R#L513
  2. https://code.bioconductor.org/browse/MatrixGenerics/blob/devel/tests/testthat/test-api_compatibility.R#L1205

I could change those calls2 to the form matrixStats::colRanks(x = mat, ties.method = 'max') (i.e. preserving the 'original' default ties.method value), but since the point of the test is to ensure that matrixStats::colRanks(x = mat) gives the same result as MatrixGenerics::colRanks(x = mat), that would seem to remove us from the test's intention.
It does, however, remove the ERROR when running R_MATRIXSTATS_TIES_METHOD_MISSING=defunct R CMD check MatrixGenerics_1.19.0.tar.gz on my machine.

I also tried to wrap those 2 lines in the tests in a suppressWarnings() but that didn't help

Footnotes

  1. There's a separate issue of args(matrixStats::rowRanks) and args(MatrixGenerics::rowRanks) not matching, which I need to look into; see https://github.com/Bioconductor/MatrixGenerics/issues/9.

  2. Note to self that change actually has to be made to https://code.bioconductor.org/browse/MatrixGenerics/blob/devel/tests/testthat/generate_tests_helper_script.R rather than directly to https://code.bioconductor.org/browse/MatrixGenerics/blob/devel/tests/testthat/test-api_compatibility.R

HenrikBengtsson added a commit that referenced this issue Jan 3, 2025
… warnings/errors for the time being [#142] [ci skip]
@HenrikBengtsson HenrikBengtsson modified the milestones: 1.5.0, Next release Jan 7, 2025
@HenrikBengtsson
Copy link
Owner

UPDATE: matrixStats 1.5.0 [2025-01-07] is now on CRAN, and it now produces a warning every 10:th call that does not explicitly specify ties.method. The plan is to escalate this to a defunct error in the next release, matrixStats 1.6.0.

@PeteHaitch
Copy link
Contributor

@HenrikBengtsson I should've said explicitly, I haven't actually made any changes to MatrixGenerics, so this will continue to be an 'error' when you run your reverse dependencies check with R_MATRIXSTATS_TIES_METHOD_MISSING=defunct.
Any thoughts on a neat way to keep the intended behaviour of our tests while not causing false positives(?) in your own revdep testing?

@HenrikBengtsson
Copy link
Owner

Any thoughts on a neat way to keep the intended behaviour of our tests while not causing false positives(?) in your own revdep testing?

No need to do anything special; a false positive is not a biggie. We just need to coordinate when this becomes defunct, which I guess should be done well in advance for the next Bioc release.

@PeteHaitch
Copy link
Contributor

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants