Implement slice(), *_join(), and other dplyr methods for tbl_svy. #65
Comments
The |
I agree that joins can mess with the design, but the package already handles indexing that duplicates rows (e.g., |
Another approach to joins would be to check whether the join adds or duplicates any rows and simply disallow the join in such cases. |
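A minimal sketch of that check, using plain data frames rather than `tbl_svy` objects. The helper name `checked_left_join` and the toy tables are hypothetical and are not part of srvyr or dplyr; the sketch only relies on the fact that a left join never drops rows of `x`, so any change in row count means rows were duplicated.

```r
library(dplyr)

# Hypothetical helper (not srvyr/dplyr API): left-join two data frames and
# refuse the result if the join changed the number of rows in `x`.
checked_left_join <- function(x, y, by) {
  out <- left_join(x, y, by = by)
  if (nrow(out) != nrow(x)) {
    stop("join changed the number of rows from ", nrow(x), " to ", nrow(out))
  }
  out
}

# Toy data: one row per respondent, and two lookup tables.
respondents <- data.frame(id = c(1, 2, 3), weight = c(1.5, 2.0, 0.5))
regions     <- data.frame(id = c(1, 2, 3), region = c("N", "S", "N"))
alters      <- data.frame(id = c(1, 1, 2), alter_age = c(30, 41, 25))

# Keys in `regions` are unique, so the row count is preserved.
ok <- checked_left_join(respondents, regions, by = "id")

# Respondent 1 has two alters, so the join would duplicate that row.
dup <- tryCatch(
  checked_left_join(respondents, alters, by = "id"),
  error = function(e) conditionMessage(e)
)
```

Whether such a check should stop, warn, or silently re-cluster is exactly the design question discussed in this thread; the sketch only shows the strictest option.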
I think it would make sense to add filtering joins ( I'm ambivalent about whether it's worth adding |
Yeah, agree on filtering joins, I just don't see them as super useful without the other joins. For mutating joins, I meant to investigate this comment from krivit, but never did (and likely won't have time for a while):
If this is usually the right thing to do (my ignorance of the math behind surveys is really coming out here), I can imagine a warning instead of an error when a join creates duplicates. I think we also would need a warning for mutating joins when both Anyone have real world examples where they wanted to do this (preferably with sharable data so they're full reprexes, but I'm also just trying to wrap my head around it, so it's okay if not)? |
Well, it may make sense to create clusters on duplicated rows, but whether it actually makes sense depends heavily on one's workflow; there is no way the package can check this. Personally, I'm rather devoted to the idea of being explicit about the survey design and not modifying it on the fly (in an operation that doesn't look like it modifies the design), but that's a matter of personal preference, and if srvyr already handles this when selecting rows, it makes sense for it to behave analogously when performing joins. Nevertheless, I think there should be a warning, or at least a note, in such a situation: in my experience, duplication of rows in joins often comes from mistakenly assuming that (combinations of) values of the key variable(s) are unique when they have in fact been unintentionally duplicated at an earlier stage of a complex data transformation. |
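The non-unique-key situation described above can be caught before the join is performed. The following is a rough sketch in base R; `warn_if_keys_duplicated` is a hypothetical name, not an existing srvyr or dplyr function, and it only inspects the right-hand table.

```r
# Hypothetical helper (not srvyr/dplyr API): warn when the join keys in `y`
# are not unique, since a mutating join could then duplicate rows of `x`.
warn_if_keys_duplicated <- function(y, by) {
  # Count rows whose key combination has already appeared earlier in `y`.
  n_dup <- sum(duplicated(y[by]))
  if (n_dup > 0) {
    warning(
      n_dup, " row(s) of `y` repeat an earlier key combination; ",
      "a mutating join could duplicate rows of `x`",
      call. = FALSE
    )
  }
  invisible(n_dup)
}

alters  <- data.frame(id = c(1, 1, 2), alter_age = c(30, 41, 25))
regions <- data.frame(id = c(1, 2, 3), region = c("N", "S", "N"))

# `alters` repeats id 1 once; `regions` has unique keys.
n_alters  <- suppressWarnings(warn_if_keys_duplicated(alters, "id"))
n_regions <- warn_if_keys_duplicated(regions, "id")
```

A warning keeps the join usable for workflows where duplication is intended, while still flagging the accidental case.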
@gergness, The specific case I am dealing with is something called egocentric network data. For example, I might ask each survey respondent about their own demographic information (age, sex, race/ethnicity, etc.) and put them in Table Since I selected my respondents using some kind of a sampling design, I might create a You can see those examples in the @tzoltak, my preference would be to emulate the behaviour of
library(survey)
data(mtcars)
(carsvy <- svydesign(~1, data=mtcars))
#> Warning in svydesign.default(~1, data = mtcars): No weights or probabilities
#> supplied, assuming equal probability
#> Independent Sampling design (with replacement)
#> svydesign(~1, data = mtcars)
carsvy[rep(1:2, each=2)]
#> 1 - level Cluster Sampling design (with replacement)
#> With (2) clusters.
#> svydesign(~1, data = mtcars)
Created on 2021-05-12 by the reprex package (v2.0.0) |
Oops, didn't mean to close. Filtering joins are available now though. |
There is a filter() method for tbl_svy, but there isn't a slice() method, or any of the *_join() methods, as far as I can tell. Would it be possible to implement them? Thanks in advance!