Allow visualize() to give density curve on non theoretical distribution #539
Comments
Thanks for the issue! Hope you don't mind that I edited your issue description to adjust code formatting. I'll first say that
library(infer)
library(tidyverse)
set.seed(123)
# Simulated data: two binary factors, 100 rows each
faux_data <- tibble(
  dead = sample(
    x = factor(c("yes", "no"), levels = c("yes", "no")),
    size = 100,
    replace = TRUE
  ),
  gender = sample(
    x = factor(c("male", "female"), levels = c("male", "female")),
    size = 100,
    replace = TRUE
  )
)
# Per-gender counts and proportions
faux_summary <- faux_data %>%
  summarize(
    N = n(),
    prop_dead = mean(dead == "yes"),
    count_dead = sum(dead == "yes"),
    .by = gender
  )
# Observed z statistic
faux_diff <- faux_data %>%
  specify(dead ~ gender, success = "yes") %>%
  calculate(stat = "z", order = c("male", "female")) %>%
  pull()
faux_test <- faux_data %>%
  prop_test(dead ~ gender, order = c("male", "female"))
# Permutation null distribution of the z statistic
faux_model <- faux_data %>%
  specify(dead ~ gender, success = "yes") %>%
  hypothesise(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "z", order = c("male", "female"))
faux_model %>% visualize()
faux_model %>% visualize(method = "both")
#> Warning: Check to make sure the conditions have been met for the theoretical method.
#> infer currently does not check these for you.
Created on 2024-08-19 with reprex v2.1.1
In general, I think we feel that histograms best represent the empirical distributions that can be created with infer pipelines, and likely won't extend that functionality in future releases of the package. That said, the outputs of
library(infer)
library(tidyverse)
set.seed(123)
# Simulated data: two binary factors, 100 rows each
faux_data <- tibble(
  dead = sample(
    x = factor(c("yes", "no"), levels = c("yes", "no")),
    size = 100,
    replace = TRUE
  ),
  gender = sample(
    x = factor(c("male", "female"), levels = c("male", "female")),
    size = 100,
    replace = TRUE
  )
)
# Per-gender counts and proportions
faux_summary <- faux_data %>%
  summarize(
    N = n(),
    prop_dead = mean(dead == "yes"),
    count_dead = sum(dead == "yes"),
    .by = gender
  )
# Observed difference in proportions
faux_diff <- faux_data %>%
  specify(dead ~ gender, success = "yes") %>%
  calculate(stat = "diff in props", order = c("male", "female")) %>%
  pull()
faux_test <- faux_data %>%
  prop_test(dead ~ gender, order = c("male", "female"))
# Permutation null distribution of the difference in proportions
faux_model <- faux_data %>%
  specify(dead ~ gender, success = "yes") %>%
  hypothesise(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props", order = c("male", "female"))
# Add a ggplot2 density layer on top of the visualize() output
faux_model %>%
  visualize() +
  geom_density(aes(x = stat))
Created on 2024-08-19 with reprex v2.1.1
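The issue also mentions the histogram shading. As a rough sketch (not part of the original reply), the snippet below shows how that shading is usually added with infer's `shade_p_value()` and how the matching p-value can be computed with `get_p_value()`. The names `toy_data`, `obs_diff`, `null_dist`, `shaded_plot`, and `p_val` are made up for illustration, and the data are simulated.

```r
library(infer)
library(dplyr)
library(ggplot2)

set.seed(123)

# Hypothetical data, shaped like the reprex in this thread
toy_data <- tibble(
  dead = factor(sample(c("yes", "no"), 100, replace = TRUE),
                levels = c("yes", "no")),
  gender = factor(sample(c("male", "female"), 100, replace = TRUE),
                  levels = c("male", "female"))
)

# Observed difference in proportions, kept as a one-row tibble
obs_diff <- toy_data %>%
  specify(dead ~ gender, success = "yes") %>%
  calculate(stat = "diff in props", order = c("male", "female"))

# Permutation null distribution
null_dist <- toy_data %>%
  specify(dead ~ gender, success = "yes") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props", order = c("male", "female"))

# Histogram of the null distribution with the p-value region shaded
shaded_plot <- visualize(null_dist) +
  shade_p_value(obs_stat = obs_diff, direction = "two-sided")

# Empirical two-sided p-value for the same observed statistic
p_val <- get_p_value(null_dist, obs_stat = obs_diff, direction = "two-sided")

shaded_plot
p_val
```

Because the result of `visualize()` accepts further ggplot2 layers, the shading layer and any extra geoms can be combined in the same `+` chain.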
@simonpcouch Thanks so much for the guidance. It looks like I can get what I was looking for already! I had not seen stat = "z" and I will make sure to dig deeper next time. Have a great week, it was a pleasure learning with you at posit::conf.
You're very welcome! You as well.
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Intro
Hello! First, thank you for taking the time to review this issue. This is my first time posting an issue on GitHub, so I might be making some mistakes. Please let me know how to improve this so I can make the material more helpful or save you time.
Issue
In cases where you may want to overlay a density curve on the histogram produced by visualize(method = "simulation"), you cannot run visualize(method = "both") or you will get an error. It might be a helpful option to see shading under a density curve instead of the histogram shading. See below for a reprex and an example of what I propose be made available for the empirical distribution as well via method = "simulation". This would be quite helpful when you are looking at categorical data, as the error you get (see below) mentions that.
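One way the idea of shading under a density curve could be sketched with plain ggplot2 is shown here. This is only an illustration under assumptions, not an infer feature. It estimates a kernel density from the simulated statistics with `stats::density()` and shades the two tails at least as extreme as the observed statistic. The names `toy_data`, `obs_diff`, `null_dist`, `dens_df`, `tails_df`, and `density_plot` are hypothetical.

```r
library(infer)
library(dplyr)
library(ggplot2)

set.seed(123)

# Hypothetical data, shaped like the reprex in this thread
toy_data <- tibble(
  dead = factor(sample(c("yes", "no"), 100, replace = TRUE),
                levels = c("yes", "no")),
  gender = factor(sample(c("male", "female"), 100, replace = TRUE),
                  levels = c("male", "female"))
)

# Observed difference in proportions (male - female) as a plain number
obs_diff <- toy_data %>%
  specify(dead ~ gender, success = "yes") %>%
  calculate(stat = "diff in props", order = c("male", "female")) %>%
  pull(stat)

# Permutation null distribution
null_dist <- toy_data %>%
  specify(dead ~ gender, success = "yes") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props", order = c("male", "female"))

# Kernel density estimate of the simulated statistics
dens <- density(null_dist$stat)
dens_df <- tibble(x = dens$x, y = dens$y)

# Grid points at least as extreme as the observed statistic,
# labelled by tail so each tail is drawn as its own ribbon
tails_df <- dens_df %>%
  filter(abs(x) >= abs(obs_diff)) %>%
  mutate(tail = if_else(x < 0, "lower", "upper"))

density_plot <- ggplot(dens_df, aes(x = x, y = y)) +
  geom_line() +
  geom_ribbon(
    data = tails_df,
    aes(x = x, ymin = 0, ymax = y, group = tail),
    inherit.aes = FALSE,
    fill = "pink",
    alpha = 0.6
  ) +
  geom_vline(xintercept = obs_diff, colour = "red") +
  labs(x = "stat", y = "density")

density_plot
```

The shaded area here comes from a smoothed estimate, so it will not exactly match a simulation-based p-value computed from the raw permuted statistics.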
I will admit...
I do realize there is documentation showing how to use visualize() with both the theoretical and empirical distributions, but my use case here did not apply to the documentation / vignette.
It would be great for visualize() to produce the density curve for non-theoretical distributions like the one produced above. Here is a home-grown example.
Thanks!
Thanks again for your time and I look forward to interacting with you.