Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add information about the different degrees of freedom default to the srvyr vs survey vignette #119

Open
florisvdh opened this issue May 7, 2021 · 3 comments

Comments

@florisvdh
Copy link
Contributor

It appears that the 95% confidence limits from survey_mean() are slightly different from those obtained with survey::svymean() - is this a bug?

suppressPackageStartupMessages({
    library(survey)
    library(srvyr)
})
data(api)

srs_design <- apisrs %>% svydesign(id = ~1, fpc = ~fpc, data = .)
api_mean <- svymean(~enroll, srs_design)
(svy_ci <- confint(api_mean))
#>          2.5 %  97.5 %
#> enroll 530.969 638.251

srs_design_srvyr <- apisrs %>% as_survey_design(ids = 1, fpc = fpc)
(srvyr_ci <- 
    srs_design_srvyr %>% 
    summarise(enroll = survey_mean(enroll, vartype = "ci")))
#>   enroll enroll_low enroll_upp
#> 1 584.61   530.6408   638.5792

svy_ci[1,1] == srvyr_ci$enroll_low
#> [1] FALSE
svy_ci[1,2] == srvyr_ci$enroll_upp
#> [1] FALSE

svy_ci[1,1] == coef(api_mean) - qnorm(0.975) * SE(api_mean)
#>        enroll
#> enroll   TRUE
svy_ci[1,2] == coef(api_mean) + qnorm(0.975) * SE(api_mean)
#>        enroll
#> enroll   TRUE

Created on 2021-05-07 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.5 (2021-03-31)
#>  os       Linux Mint 20               
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language nl_BE:nl                    
#>  collate  nl_BE.UTF-8                 
#>  ctype    nl_BE.UTF-8                 
#>  tz       Europe/Brussels             
#>  date     2021-05-07                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.2)
#>  cli           2.4.0   2021-04-05 [1] CRAN (R 4.0.5)
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.3)
#>  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.0.3)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.3)
#>  dplyr         1.0.5   2021-03-05 [1] CRAN (R 4.0.5)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.2)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.2)
#>  fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.3)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.3)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.2)
#>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
#>  knitr         1.31    2021-01-27 [1] CRAN (R 4.0.3)
#>  lattice       0.20-41 2020-04-02 [4] CRAN (R 4.0.0)
#>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.4)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.3)
#>  Matrix      * 1.3-2   2021-01-06 [4] CRAN (R 4.0.3)
#>  mitools       2.4     2019-04-26 [1] CRAN (R 4.0.5)
#>  pillar        1.5.1   2021-03-05 [1] CRAN (R 4.0.5)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.2)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.2)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.3)
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.0.5)
#>  rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.3)
#>  rmarkdown     2.7     2021-02-19 [1] CRAN (R 4.0.4)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.3)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.2)
#>  srvyr       * 1.0.1   2021-03-28 [1] CRAN (R 4.0.5)
#>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.2)
#>  survey      * 4.0     2020-04-03 [1] CRAN (R 4.0.5)
#>  survival    * 3.2-10  2021-03-16 [4] CRAN (R 4.0.4)
#>  tibble        3.1.0   2021-02-25 [1] CRAN (R 4.0.5)
#>  tidyr         1.1.3   2021-03-03 [1] CRAN (R 4.0.5)
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.2)
#>  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.5)
#>  vctrs         0.3.7   2021-03-29 [1] CRAN (R 4.0.5)
#>  withr         2.4.1   2021-01-26 [1] CRAN (R 4.0.3)
#>  xfun          0.22    2021-03-11 [1] CRAN (R 4.0.4)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.2)
#> 
#> [1] /home/floris/lib/R/library
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
@gergness
Copy link
Owner

gergness commented May 7, 2021

Not a bug (though a little confusing, I admit). srvyr tries to be helpful by setting the df on confint for you.

If you prefer the results you got using the survey package, you can do this in srvyr:

(srvyr_ci_2 <- 
        srs_design_srvyr %>% 
        summarise(enroll = survey_mean(enroll, vartype = "ci", df = Inf)))
#>   enroll enroll_low enroll_upp
#> 1 584.61    530.969    638.251

Or to get the srvyr results in survey:

api_mean_2 <- svymean(~enroll, srs_design)
(svy_ci_2 <- confint(api_mean_2, df = degf(srs_design)))
#>           2.5 %   97.5 %
#> enroll 530.6408 638.5792

@gergness gergness closed this as completed May 7, 2021
@florisvdh
Copy link
Contributor Author

Thank you for explaining @gergness. And indeed the df argument is present in the documentation of survey_mean() 👍.

I think it would be good to have this aspect added in the srvyr-vs-survey vignette. The difference can be seen under heading Summary statistics, but it's not discussed.

> svy_ci_2[1,1] == srvyr_ci$enroll_low
[1] TRUE
> svy_ci_2[1,2] == srvyr_ci$enroll_upp
[1] TRUE
> 
> srvyr_ci$enroll_low == coef(api_mean) - qt(0.975, df = degf(srs_design)) * SE(api_mean)
       enroll
enroll   TRUE
> srvyr_ci$enroll_upp == coef(api_mean) + qt(0.975, df = degf(srs_design)) * SE(api_mean)
       enroll
enroll   TRUE
> 
> qt(0.975, df=Inf) == qnorm(0.975)
[1] TRUE

@gergness gergness reopened this May 10, 2021
@gergness gergness changed the title survey_mean(vartype = "ci") gives different CI limits than confint(svymean()) Add degrees of freedom default to the srvyr vs survey vignette May 10, 2021
@gergness gergness changed the title Add degrees of freedom default to the srvyr vs survey vignette Add information about the different degrees of freedom default to the srvyr vs survey vignette May 10, 2021
@gergness
Copy link
Owner

yeah, makes sense

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants