Opinion: survey_prop should default to proportion = TRUE #141

Open
szimmer opened this issue Mar 11, 2022 · 3 comments

Comments

@szimmer
Contributor

szimmer commented Mar 11, 2022

survey_mean and survey_prop are very similar. Based on the function name, I feel survey_prop should default to proportion = TRUE. Thoughts?

@gergness
Owner

If I had a time machine and could set it this way from the start, I think I would agree. I'm less sure that I should do it now, since it would change the behavior of existing code, but maybe it's not so bad.

Using GitHub search, I don't think anyone has specified the proportion argument explicitly, so the change would affect existing code, though possibly for the better.
https://github.com/search?l=R&q=survey_prop&type=Code

@bschneidr (or anyone else following), do you have an opinion?

Maybe I could make the change but borrow the warning tools from the tidyverse, as they do for summarize when no .groups is specified.

library(dplyr)
mtcars %>% group_by(cyl, am) %>% summarize(n = n())
#> `summarise()` has grouped output by 'cyl'. You can override using the `.groups`
#> argument.

@bschneidr
Contributor

I think this is a good suggestion, @szimmer.

My sense is that when someone chooses to use survey_prop() rather than survey_mean(), it's because they're trying to (a) write code whose intent is easier for readers to understand, and (b) use a function that's presumably more statistically appropriate for proportions. Changing the default value to proportion = TRUE would make survey_prop() more helpful for (b).

Making this update would change code, but I think it's generally for the better. The default "logit" method used by svyciprop() may not be the best default method, but it should be generally better than the simple Wald method used by svymean().
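
For anyone who needs results to stay reproducible across the change, setting the argument explicitly sidesteps the default entirely. A minimal sketch, assuming the survey, srvyr, and dplyr packages are installed and using the api example data that ships with survey; the object names (dstrata, wald, logit) are only for illustration:

```r
library(survey)  # provides the `api` example data
library(srvyr)
library(dplyr)

data(api, package = "survey")

# Stratified design from the example data
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Wald-type interval: what `proportion = FALSE` requests
wald <- dstrata %>%
  group_by(awards) %>%
  summarize(p = survey_prop(vartype = "ci", proportion = FALSE))

# svyciprop()-based interval: what `proportion = TRUE` requests
logit <- dstrata %>%
  group_by(awards) %>%
  summarize(p = survey_prop(vartype = "ci", proportion = TRUE))

print(wald)
print(logit)
```

The point estimates should agree between the two calls; only the interval limits (p_low, p_upp) are expected to differ.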

The Wald interval method has long been known to have coverage issues with complex surveys, and a recent simulation study had some pretty strong recommendations against its use:

We have seen that the Wald CI is badly flawed for estimating proportions in complex surveys due to its severe undercoverage in a variety of situations. Improving the estimation of sampling variance does not salvage the Wald interval, which performs poorly even when the true sampling variance is known...
Even when our method cannot be used, a strong recommendation still emerges from our simulations: the Wald interval is not to be used and should be replaced by the preferred non-Wald method...

Carolina Franco et al. 2019 "Comparative Study of Confidence Intervals for Proportions in Complex Sample Surveys", Journal of Survey Statistics and Methodology
https://doi.org/10.1093/jssam/smy019
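
To make the concern concrete, here is a rough base R illustration of how the two kinds of interval behave for a small proportion. It assumes a simple random sample with made-up values for the estimate and sample size, so it ignores the design effects that matter in complex surveys, and it is not the implementation used by svyciprop(); it only mirrors the idea of building the interval on the log-odds scale and transforming back.

```r
# Illustration only: simple random sample, not a complex survey design.
p_hat <- 0.02   # estimated proportion (made-up value)
n     <- 100    # sample size (made-up value)
z     <- qnorm(0.975)
se    <- sqrt(p_hat * (1 - p_hat) / n)

# Wald interval: symmetric around p_hat on the probability scale
wald_ci <- c(p_hat - z * se, p_hat + z * se)

# Logit-scale interval: build it on the log-odds scale, then back-transform
logit_hat <- qlogis(p_hat)
logit_se  <- se / (p_hat * (1 - p_hat))  # delta-method standard error
logit_ci  <- plogis(c(logit_hat - z * logit_se, logit_hat + z * logit_se))

print(round(wald_ci, 4))
print(round(logit_ci, 4))
```

With these inputs the Wald lower limit falls below zero, which is impossible for a proportion, while the back-transformed interval stays inside (0, 1) by construction.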

I guess the only concern here is that users might be surprised if their old analysis results become harder to reproduce. I think a temporary warning in the next release would help. Something like:

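survey_prop <- function(..., proportion = TRUE) {
  # Warn only when the caller did not set `proportion` explicitly
  if (missing(proportion)) {
    warning("When `proportion` is unspecified, `survey_prop()` now defaults to `proportion = TRUE`. This should improve confidence interval coverage.")
  }
  # ... rest of survey_prop() unchanged ...
}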

But it would use the tidyverse warning tools so that the warning is shown only once per session.
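
One way the once-per-session behavior could look, sketched with rlang::warn() and its .frequency argument. This assumes rlang is installed; survey_prop_sketch() and the .frequency_id string are hypothetical stand-ins rather than srvyr's actual code, and only the warning logic is shown.

```r
library(rlang)

# Hypothetical stand-in for the real function; only the warning logic is shown.
survey_prop_sketch <- function(..., proportion = TRUE) {
  if (missing(proportion)) {
    warn(
      paste(
        "When `proportion` is unspecified, `survey_prop()` now defaults to",
        "`proportion = TRUE`. This should improve confidence interval coverage."
      ),
      .frequency = "once",                         # signal at most once per session
      .frequency_id = "srvyr_survey_prop_default"  # hypothetical id for this warning
    )
  }
  invisible(proportion)
}
```

Calling survey_prop_sketch() repeatedly should warn only on the first call, and passing proportion explicitly (TRUE or FALSE) should never warn.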

@szimmer
Contributor Author

szimmer commented Mar 21, 2022

I make an issue and start a discussion and then go on vacation! I agree with the warning and can implement later this week if no one else jumps on it first.

The type of interval to use as a default is a good question. FWIW, SUDAAN and SAS use xlogit as their default.
