Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HEADS UP: New default for ties.method of {col,row}Ranks() #20

Open
HenrikBengtsson opened this issue Jun 23, 2021 · 2 comments
Open

HEADS UP: New default for ties.method of {col,row}Ranks() #20

HenrikBengtsson opened this issue Jun 23, 2021 · 2 comments

Comments

@HenrikBengtsson
Copy link

HenrikBengtsson commented Jun 23, 2021

Background

matrixStats uses ties.method = "max" as the default for colRanks() and rowRanks() for legacy reasons, but we want eventually update to ties.method = "average" to align it with base::rank(), cf. HenrikBengtsson/matrixStats#142.

The process for this migration with be:

  1. Give a deprecation warning if ties.method is not explicitly specified (long time; several releases)
  2. Give a defunct error if ties.method is not explicitly specified (long time; several releases)
  3. Switch the new default to ties.method = "average"

This will have to take a long time in order to make sure end-users out there will notice this and update their code. I hope this will minimize the risk for existing code all of a sudden start producing different results.

Issue

sparseMatrixStats gives an ERROR when I revdep check asserting !isTRUE(missing(ties.method)), cf. https://github.com/HenrikBengtsson/matrixStats/blob/feature/default-rank-ties.method/revdep/R_MATRIXSTATS_TIES_METHOD_MISSING%3Ddefunct/problems.md#sparsematrixstats

@const-ae
Copy link
Owner

Hi Henrik, thanks for the heads-up.

Could you say a bit more what is the advantage of using ties.method = "average" as the default and from now on forcing people to explicitly specify ties.method? I often use the ranks function without explicitly setting ties.method, because I don't really care what happens with the ties or know that I don't have any. And only if I realize that the ties are important, I specifically choose theties.method that matches my needs.

And I assume, that I will also need to duplicate the efforts to warn people about not specifying ties.method to stay aligned with matrixStats and avoid suddenly breaking code when you do make the switch in step 3, is that correct?

@HenrikBengtsson
Copy link
Author

Could you say a bit more what is the advantage of using ties.method = "average" ...

Only to align with base::rank(). The existing discrepancy in matrixStats may result in incorrect results due to assumptions about the default.

... from now on forcing people to explicitly specify ties.method?

Only until we get to Step 3, when ties.method = "average" (= formals(base::rank)$ties.method) becomes the new default. After that, we'll drop the requirement of having to specify ties.method.

A more abrupt approach would be to just switch the default, but I'm too conservative for such a silent change.

And I assume, that I will also need to duplicate the efforts to warn people about not specifying ties.method to stay aligned with matrixStats and avoid suddenly breaking code when you do make the switch in step 3, is that correct?

Yes, I think so. You could keep your current default, if you'd like, or take the same deprecate-defunct-change_default approach as I do above.

Note that this is an extremely rare event, but I think it's worth fixing it, especially for future generations. FWIW, in hindsight, we should have made ties.method = "average" the default when we started out many years ago and just have given an error "ties.method = "average' is not yet supported". Then we wouldn't have to do this complicated update process.

Clear as mud?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants