diff --git a/docs/lecs_html/04_confidence_intervals.slides.html b/docs/lecs_html/04_confidence_intervals.slides.html
index d815420..ccb95cb 100644
--- a/docs/lecs_html/04_confidence_intervals.slides.html
+++ b/docs/lecs_html/04_confidence_intervals.slides.html
@@ -1,15276 +1,15408 @@
- - -
- - - - - - -image source: Modern Dive by Kim & McConville
- -image source: Data Science: A First Introduction by Timbers, Campbell & Lee
- -image source: Data Science: A First Introduction by Timbers, Campbell & Lee
- -True population mean = 154.51. The mean of our sample is 155.8.
- -image source: Data Science: A First Introduction by Timbers, Campbell & Lee
- -We can use our bootstrap distribution to calculate the plausible range of values for the population parameter:
-We can report both our sample point estimate and the plausible range where we expect our true population quantity to fall.
-image source: Data Science: A First Introduction by Timbers, Campbell & Lee
- -A. 2.5th percentile and 97.5th percentile
-B. 5th percentile and 95th percentile
-C. 10th percentile and 90th percentile
- -infer package workflow for bootstrapping
-The infer package is an R package used for statistical inference.
-specify(): choose which variables will be the focus of the statistical inference
-generate(): here our main argument is reps (how many different repetitions we would like)
-calculate(): return the statistic specified with the stat argument
-set.seed(201)
-library(infer)
-library(tidyverse)
-#install.packages("vctrs")
-#install.packages("readr")
-library(vctrs)
-library(readr)
-
-set.seed(1)
-student_population <- tibble(student = 1:10000, grade = rnorm(n = 10000, mean = 70, sd = 5))
-
-student_sample <- rep_sample_n(student_population, size = 50, replace = FALSE, reps = 1) %>%
- ungroup() %>%
- select(-replicate)
-
-head(student_sample)
-
## Bootstrapping with old method
-bootstrap_dist <- rep_sample_n(student_sample, size = 50, replace = T, reps = 10000) %>%
- group_by(replicate) %>%
- summarize(sample_mean = mean(grade))
-
-# bootstrap_dist
-# 90% confidence interval
-ci <- bootstrap_dist %>%
- summarize(ci_lower = quantile(sample_mean, 0.05),
- ci_upper = quantile(sample_mean, 0.95))
-
-
-bootstrap_dist %>%
- ggplot(aes(sample_mean)) +
- geom_histogram() +
- geom_vline(xintercept = ci$ci_lower) +
- geom_vline(xintercept = ci$ci_upper)
-
# bootstrapping using infer package
-bootstrap_dist2 <- student_sample %>%
- specify(response = grade) %>%
- generate(type = "bootstrap", reps = 10000) %>%
- calculate(stat = "mean")
-
-#bootstrap_dist; bootstrap_dist2
-
-ci2 <- bootstrap_dist2 %>%
- get_confidence_interval(level = 0.90, type = "percentile")
-
-ci2
-
bootstrap_dist2 %>%
- visualize() +
- shade_confidence_interval(endpoints = ci2)
-
Suppose we take a sample of size 100 instead of 50. We then construct a 90% percentile-based confidence interval for our sample. Which interval would you expect to be wider?
-A. the interval with $n = 50$
-B. the interval with $n = 100$
-C. the two intervals will be the same
- -Suppose we keep the sample size at 50. We then construct a 90% percentile-based confidence interval and a 95% percentile-based confidence interval. Which interval would you expect to be wider?
-A. the 95% confidence interval
-B. the 90% confidence interval
-C. the two intervals will be the same
- -A group contract is a document to help you formalize the expectations you have for your group members and what they can expect of you. It will help you think about what you need from each other to work effectively as a team! You will create and agree on this contract as a team. Each member should “sign” (you can just type out your name) at the bottom of the submission. At a minimum, your group contract must address the following:
-What do we expect of one another regarding attendance at meetings, participation, frequency of communication, quality of work, etc.? What are our internal deadlines? (Warning: if working on separate parts, do not aim to put all the parts together on the last day – it takes time to integrate multiple parts.)
-What rules can we agree on to help us meet our goals and expectations?
-How will we address non-performance regarding these goals, expectations, policies and procedures?
- -Teamwork contracts are due Jul 16
-worksheet_04
+Complete a Data Science project from the beginning (downloading data from the web) to the end (communicating methods and conclusions in a report).
+| Deliverable | Weight |
+|---|---|
+| Team Work Contract | 1% |
+| Proposal | 4% |
+| Peer review | 3% |
+| Final Report | 9% |
+| Teammate Evaluation | 3% |
+| Total | 20% |
+Can you guess the average number of hours that UBC students spend on social media each day?
+Let's call that unknown population parameter $\mu$ and your guess $\bar{x}$.
+You have 3 options
+Which option do you choose to play and what is your guess?
+ +Can you guess the average number of hours that UBC students spend on social media each day?
+Let's call that unknown population parameter $\mu$ and your guess $\bar{x}$.
+You have 3 options
+Which option do you choose to play and what is your guess?
+image source: Modern Dive by Kim & McConville
+ ++
+image source: Data Science: A First Introduction by Timbers, Campbell & Lee
+ +image source: Data Science: A First Introduction by Timbers, Campbell & Lee
+ +True population mean = 154.51. The mean of our sample is 155.8.
+ +image source: Data Science: A First Introduction by Timbers, Campbell & Lee
+ +We can use our bootstrap distribution to calculate the plausible range of values for the population parameter:
+We can report both our sample point estimate and the plausible range where we expect our true population quantity to fall.
+image source: Data Science: A First Introduction by Timbers, Campbell & Lee
+ +A. 2.5th percentile and 97.5th percentile
+B. 5th percentile and 95th percentile
+C. 10th percentile and 90th percentile
+ +infer package workflow for bootstrapping
+The infer package is an R package used for statistical inference.
+specify(): choose which variables will be the focus of the statistical inference
+generate(): here our main argument is reps (how many different repetitions we would like)
+calculate(): return the statistic specified with the stat argument
+set.seed(201)
+library(infer)
+library(tidyverse)
+#install.packages("vctrs")
+#install.packages("readr")
+library(vctrs)
+library(readr)
+
+set.seed(1)
+student_population <- tibble(student = 1:10000, grade = rnorm(n = 10000, mean = 70, sd = 5))
+
+student_sample <- rep_sample_n(student_population, size = 50, replace = FALSE, reps = 1) %>%
+ ungroup() %>%
+ select(-replicate)
+
+head(student_sample)
+
## Bootstrapping with old method
+bootstrap_dist <- rep_sample_n(student_sample, size = 50, replace = TRUE, reps = 10000) %>%
+ group_by(replicate) %>%
+ summarize(sample_mean = mean(grade))
+
+# bootstrap_dist
+# 90% confidence interval
+ci <- bootstrap_dist %>%
+ summarize(ci_lower = quantile(sample_mean, 0.05),
+ ci_upper = quantile(sample_mean, 0.95))
+
+
+bootstrap_dist %>%
+ ggplot(aes(sample_mean)) +
+ geom_histogram() +
+ geom_vline(xintercept = ci$ci_lower) +
+ geom_vline(xintercept = ci$ci_upper)
+
# bootstrapping using infer package
+bootstrap_dist2 <- student_sample %>%
+ specify(response = grade) %>%
+ generate(type = "bootstrap", reps = 10000) %>%
+ calculate(stat = "mean")
+
+#bootstrap_dist; bootstrap_dist2
+
+ci2 <- bootstrap_dist2 %>%
+ get_confidence_interval(level = 0.90, type = "percentile")
+
+ci2
+
bootstrap_dist2 %>%
+ visualize() +
+ shade_confidence_interval(endpoints = ci2)
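The percentile method used in both versions above can also be sketched in base R, without `infer` or the tidyverse. This is an illustration only and not how `infer` is implemented: the helper name `percentile_ci` is hypothetical, and the grades are simulated here rather than taken from `student_sample`.

```r
# Hypothetical helper (not part of infer): percentile bootstrap CI for the mean.
percentile_ci <- function(x, level = 0.90, reps = 10000) {
  # Resample x with replacement `reps` times and record each resample's mean.
  boot_means <- replicate(reps, mean(sample(x, size = length(x), replace = TRUE)))
  # Cut off (1 - level) / 2 of the bootstrap distribution in each tail.
  alpha <- 1 - level
  quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
# Simulated sample of 50 grades (assumed values that mirror the slides' setup).
grades <- rnorm(n = 50, mean = 70, sd = 5)
ci_90 <- percentile_ci(grades, level = 0.90)
ci_90
```

The returned vector holds the lower and upper bounds, in the same order as `ci_lower` and `ci_upper` in the manual version.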
+
Suppose we take a sample of size 100 instead of 50. We then construct a 90% percentile-based confidence interval for our sample. Which interval would you expect to be wider?
+A. the interval with $n = 50$
+B. the interval with $n = 100$
+C. the two intervals will be the same
+ +Suppose we keep the sample size at 50. We then construct a 90% percentile-based confidence interval and a 95% percentile-based confidence interval. Which interval would you expect to be wider?
+A. the 95% confidence interval
+B. the 90% confidence interval
+C. the two intervals will be the same
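One way to check an answer to the two questions above is a small simulation. The sketch below is an assumption-laden illustration: the helper `ci_width` is hypothetical, the population is simulated with the same parameters as `student_population`, and the exact widths will vary with the seed and the samples drawn.

```r
# Hypothetical helper: width of a percentile bootstrap CI for the mean.
ci_width <- function(x, level, reps = 5000) {
  # Bootstrap distribution of the sample mean.
  boot_means <- replicate(reps, mean(sample(x, size = length(x), replace = TRUE)))
  alpha <- 1 - level
  bounds <- quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  bounds[2] - bounds[1]
}

set.seed(1)
# Simulated population (assumed: same mean and sd as the slides' example).
population <- rnorm(n = 10000, mean = 70, sd = 5)
sample_50  <- sample(population, size = 50)
sample_100 <- sample(population, size = 100)

# Compare widths across sample size and confidence level.
widths <- c(
  n50_level90  = ci_width(sample_50,  level = 0.90),
  n100_level90 = ci_width(sample_100, level = 0.90),
  n50_level95  = ci_width(sample_50,  level = 0.95)
)
widths
```

Comparing the first two entries addresses the sample-size question, and comparing the first and third addresses the confidence-level question.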
+ +A group contract is a document to help you formalize the expectations you have for your group members and what they can expect of you. It will help you think about what you need from each other to work effectively as a team! You will create and agree on this contract as a team. Each member should “sign” (you can just type out your name) at the bottom of the submission. At a minimum, your group contract must address the following:
+What do we expect of one another regarding attendance at meetings, participation, frequency of communication, quality of work, etc.? What are our internal deadlines? (Warning: if working on separate parts, do not aim to put all the parts together on the last day – it takes time to integrate multiple parts.)
+What rules can we agree on to help us meet our goals and expectations?
+How will we address non-performance regarding these goals, expectations, policies and procedures?
+ +Teamwork contracts are due October 7
+worksheet_04