diff --git a/_freeze/schedule/slides/bootstrap/execute-results/html.json b/_freeze/schedule/slides/bootstrap/execute-results/html.json index 68defea..0a3c1eb 100644 --- a/_freeze/schedule/slides/bootstrap/execute-results/html.json +++ b/_freeze/schedule/slides/bootstrap/execute-results/html.json @@ -1,7 +1,7 @@ { "hash": "d2869649cdb959b50f0e8ab08e8f9e05", "result": { - "markdown": "---\nlecture: \"The bootstrap\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n---\n---\n\n## {background-image=\"img/consult.jpeg\" background-opacity=\"0.3\"}\n\n[{{< meta lecture >}}]{.slide-title}\n\n\n[Stat 550]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 30 January 2024\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\mid}\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n$$\n\n\n\n\n\n\n\n## {background-image=\"https://www.azquotes.com/picture-quotes/quote-i-believe-in-pulling-yourself-up-by-your-own-bootstraps-i-believe-it-is-possible-i-saw-stephen-colbert-62-38-03.jpg\" background-size=\"contain\"}\n\n\n## {background-image=\"http://rackjite.com/wp-content/uploads/rr11014aa.jpg\" background-size=\"contain\"}\n\n\n## In statistics...\n\nThe \"bootstrap\" works. And well.\n\nIt's good for \"second-level\" analysis.\n\n* \"First-level\" analyses are things like $\\hat\\beta$, $\\hat y$, an estimator of the center (a median), etc.\n\n* \"Second-level\" are things like $\\Var{\\hat\\beta}$, a confidence interval for $\\hat y$, or a median, etc.\n\nYou usually get these \"second-level\" properties from \"the sampling distribution of an estimator\"\n\n. . .\n\nBut what if you don't know the sampling distribution? Or you're skeptical of the CLT argument?\n\n\n## Refresher on sampling distributions\n\n1. If $X_i$ are iid Normal $(0,\\sigma^2)$, then $\\Var{\\bar{X}} = \\sigma^2 / n$.\n1. If $X_i$ are iid and $n$ is big, then $\\Var{\\bar{X}} \\approx \\Var{X_1} / n$.\n1. If $X_i$ are iid Binomial $(m, p)$, then $\\Var{\\bar{X}} = mp(1-p) / n$\n\n\n\n## Example of unknown sampling distribution\n\nI estimate a LDA on some data.\n\nI get a new $x_0$ and produce $\\hat{Pr}(y_0 =1 \\given x_0)$.\n\nCan I get a 95% confidence interval for $Pr(y_0=1 \\given x_0)$?\n\n. . .\n\nThe bootstrap gives this to you.\n\n\n\n\n## Procedure\n\n1. Resample your training data w/ replacement.\n2. Calculate a LDA on this sample.\n3. Produce a new prediction, call it $\\widehat{Pr}_b(y_0 =1 \\given x_0)$.\n4. Repeat 1-3 $b = 1,\\ldots,B$ times.\n5. CI: $\\left[2\\widehat{Pr}(y_0 =1 \\given x_0) - \\widehat{F}_{boot}(1-\\alpha/2),\\ 2\\widehat{Pr}(y_0 =1 \\given x_0) - \\widehat{F}_{boot}(\\alpha/2)\\right]$\n\n\n\n$\\hat{F}$ is the \"empirical\" distribution of the bootstraps. \n\n\n## Very basic example\n\n* Let $X_i\\sim Exponential(1/5)$. The pdf is $f(x) = \\frac{1}{5}e^{-x/5}$\n\n\n* I know if I estimate the mean with $\\bar{X}$, then by the CLT (if $n$ is big), \n\n$$\\frac{\\sqrt{n}(\\bar{X}-E[X])}{s} \\approx N(0, 1).$$\n\n\n* This gives me a 95% confidence interval like\n$$\\bar{X} \\pm 2 \\frac{s}{\\sqrt{n}}$$\n\n\n* But I don't want to estimate the mean, I want to estimate the median.\n\n\n---\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](bootstrap_files/figure-revealjs/unnamed-chunk-1-1.svg){fig-align='center'}\n:::\n:::\n\n\n## Now what\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n\n* I give you a sample of size 500, you give me the sample median.\n\n* How do you get a CI?\n\n* You can use the bootstrap!\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nset.seed(2022-11-01)\nx <- rexp(n, 1 / 5)\n(med <- median(x)) # sample median\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3.669627\n```\n:::\n\n```{.r .cell-code}\nB <- 100\nalpha <- 0.05\nbootMed <- function() median(sample(x, replace = TRUE)) # resample, and get the median\nFhat <- replicate(B, bootMed()) # repeat B times, \"empirical distribution\"\nCI <- 2 * med - quantile(Fhat, probs = c(1 - alpha / 2, alpha / 2))\n```\n:::\n\n\n---\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](bootstrap_files/figure-revealjs/unnamed-chunk-4-1.svg){fig-align='center'}\n:::\n:::\n\n\n## {background-image=\"gfx/boot1.png\" background-size=\"contain\"}\n\n## {background-image=\"gfx/boot2.png\" background-size=\"contain\"}\n\n## Slightly harder example\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n\n:::: {.columns}\n::: {.column width=\"50%\"}\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](bootstrap_files/figure-revealjs/unnamed-chunk-6-1.svg){fig-align='center'}\n:::\n:::\n\n\n:::\n\n::: {.column width=\"50%\"}\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output .cell-output-stdout}\n```\n\nCall:\nlm(formula = Hwt ~ 0 + Bwt, data = fatcats)\n\nResiduals:\n Min 1Q Median 3Q Max \n-6.9293 -1.0460 -0.1407 0.8298 16.2536 \n\nCoefficients:\n Estimate Std. Error t value Pr(>|t|) \nBwt 3.81895 0.07678 49.74 <2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nResidual standard error: 2.549 on 143 degrees of freedom\nMultiple R-squared: 0.9454,\tAdjusted R-squared: 0.945 \nF-statistic: 2474 on 1 and 143 DF, p-value: < 2.2e-16\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n 2.5 % 97.5 %\nBwt 3.667178 3.97073\n```\n:::\n:::\n\n:::\n::::\n\n\n## When we fit models, we examine diagnostics\n\n\n:::: {.columns}\n::: {.column width=\"50%\"}\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](bootstrap_files/figure-revealjs/unnamed-chunk-8-1.svg){fig-align='center'}\n:::\n:::\n\n\n\nThe tails are too fat, I don't believe that CI...\n:::\n\n::: {.column width=\"50%\"}\n\n\nWe bootstrap\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nB <- 500\nbhats <- double(B)\nalpha <- .05\nfor (b in 1:B) {\n samp <- sample(1:nrow(fatcats), replace = TRUE)\n newcats <- fatcats[samp, ] # new data\n bhats[b] <- coef(lm(Hwt ~ 0 + Bwt, data = newcats)) \n}\n\n2 * coef(cats.lm) - # Bootstrap CI\n quantile(bhats, probs = c(1 - alpha / 2, alpha / 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 97.5% 2.5% \n3.654977 3.955927 \n```\n:::\n\n```{.r .cell-code}\nconfint(cats.lm) # Original CI\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 2.5 % 97.5 %\nBwt 3.667178 3.97073\n```\n:::\n:::\n\n:::\n::::\n\n\n## An alternative\n\n* So far, I didn't use any information about the data-generating process. \n\n* We've done the [non-parametric bootstrap]{.secondary}\n\n* This is easiest, and most common for most cases.\n\n. . .\n\n[But there's another version]{.secondary}\n\n* You could try a \"parametric bootstrap\"\n\n* This assumes knowledge about the DGP\n\n## Same data\n\n:::: {.columns}\n::: {.column width=\"50%\"}\n\n[Non-parametric bootstrap]{.secondary}\n\nSame as before\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nB <- 500\nbhats <- double(B)\nalpha <- .05\nfor (b in 1:B) {\n samp <- sample(1:nrow(fatcats), replace = TRUE)\n newcats <- fatcats[samp, ] # new data\n bhats[b] <- coef(lm(Hwt ~ 0 + Bwt, data = newcats)) \n}\n\n2 * coef(cats.lm) - # NP Bootstrap CI\n quantile(bhats, probs = c(1-alpha/2, alpha/2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 97.5% 2.5% \n3.673559 3.970251 \n```\n:::\n\n```{.r .cell-code}\nconfint(cats.lm) # Original CI\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 2.5 % 97.5 %\nBwt 3.667178 3.97073\n```\n:::\n:::\n\n:::\n\n::: {.column width=\"50%\"}\n[Parametric bootstrap]{.secondary}\n\n1. Assume that the linear model is TRUE.\n2. Then, $\\texttt{Hwt}_i = \\widehat{\\beta}\\times \\texttt{Bwt}_i + \\widehat{e}_i$, $\\widehat{e}_i \\approx \\epsilon_i$\n3. The $\\epsilon_i$ is random $\\longrightarrow$ just resample $\\widehat{e}_i$.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nB <- 500\nbhats <- double(B)\nalpha <- .05\ncats.lm <- lm(Hwt ~ 0 + Bwt, data = fatcats)\nnewcats <- fatcats\nfor (b in 1:B) {\n samp <- sample(residuals(cats.lm), replace = TRUE)\n newcats$Hwt <- predict(cats.lm) + samp # new data\n bhats[b] <- coef(lm(Hwt ~ 0 + Bwt, data = newcats)) \n}\n\n2 * coef(cats.lm) - # Parametric Bootstrap CI\n quantile(bhats, probs = c(1 - alpha/2, alpha/2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 97.5% 2.5% \n3.665531 3.961896 \n```\n:::\n:::\n\n\n:::\n::::\n\n## Bootstrap error sources\n\n\n[Simulation error]{.secondary}:\n\nusing only $B$ samples to estimate $F$ with $\\hat{F}$.\n\n[Statistical error]{.secondary}:\n\nour data depended on a sample from the population. We don't have the whole population so we make an error by using a sample \n\n(Note: this part is what __always__ happens with data, and what the science of statistics analyzes.)\n\n[Specification error]{.secondary}:\n\nIf we use the parametric bootstrap, and our model is wrong, then we are overconfident.\n\n\n\n", + "markdown": "---\nlecture: \"The bootstrap\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n## {{< meta lecture >}} {background-image=\"img/consult.jpeg\" background-opacity=\"0.3\"}\n\n[Stat 550]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 30 January 2024\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\mid}\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n$$\n\n\n\n\n\n\n\n## {background-image=\"https://www.azquotes.com/picture-quotes/quote-i-believe-in-pulling-yourself-up-by-your-own-bootstraps-i-believe-it-is-possible-i-saw-stephen-colbert-62-38-03.jpg\" background-size=\"contain\"}\n\n\n## {background-image=\"http://rackjite.com/wp-content/uploads/rr11014aa.jpg\" background-size=\"contain\"}\n\n\n## In statistics...\n\nThe \"bootstrap\" works. And well.\n\nIt's good for \"second-level\" analysis.\n\n* \"First-level\" analyses are things like $\\hat\\beta$, $\\hat y$, an estimator of the center (a median), etc.\n\n* \"Second-level\" are things like $\\Var{\\hat\\beta}$, a confidence interval for $\\hat y$, or a median, etc.\n\nYou usually get these \"second-level\" properties from \"the sampling distribution of an estimator\"\n\n. . .\n\nBut what if you don't know the sampling distribution? Or you're skeptical of the CLT argument?\n\n\n## Refresher on sampling distributions\n\n1. If $X_i$ are iid Normal $(0,\\sigma^2)$, then $\\Var{\\bar{X}} = \\sigma^2 / n$.\n1. If $X_i$ are iid and $n$ is big, then $\\Var{\\bar{X}} \\approx \\Var{X_1} / n$.\n1. If $X_i$ are iid Binomial $(m, p)$, then $\\Var{\\bar{X}} = mp(1-p) / n$\n\n\n\n## Example of unknown sampling distribution\n\nI estimate a LDA on some data.\n\nI get a new $x_0$ and produce $\\hat{Pr}(y_0 =1 \\given x_0)$.\n\nCan I get a 95% confidence interval for $Pr(y_0=1 \\given x_0)$?\n\n. . .\n\nThe bootstrap gives this to you.\n\n\n\n\n## Procedure\n\n1. Resample your training data w/ replacement.\n2. Calculate a LDA on this sample.\n3. Produce a new prediction, call it $\\widehat{Pr}_b(y_0 =1 \\given x_0)$.\n4. Repeat 1-3 $b = 1,\\ldots,B$ times.\n5. CI: $\\left[2\\widehat{Pr}(y_0 =1 \\given x_0) - \\widehat{F}_{boot}(1-\\alpha/2),\\ 2\\widehat{Pr}(y_0 =1 \\given x_0) - \\widehat{F}_{boot}(\\alpha/2)\\right]$\n\n\n\n$\\hat{F}$ is the \"empirical\" distribution of the bootstraps. \n\n\n## Very basic example\n\n* Let $X_i\\sim Exponential(1/5)$. The pdf is $f(x) = \\frac{1}{5}e^{-x/5}$\n\n\n* I know if I estimate the mean with $\\bar{X}$, then by the CLT (if $n$ is big), \n\n$$\\frac{\\sqrt{n}(\\bar{X}-E[X])}{s} \\approx N(0, 1).$$\n\n\n* This gives me a 95% confidence interval like\n$$\\bar{X} \\pm 2 \\frac{s}{\\sqrt{n}}$$\n\n\n* But I don't want to estimate the mean, I want to estimate the median.\n\n\n---\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](bootstrap_files/figure-revealjs/unnamed-chunk-1-1.svg){fig-align='center'}\n:::\n:::\n\n\n## Now what\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n\n* I give you a sample of size 500, you give me the sample median.\n\n* How do you get a CI?\n\n* You can use the bootstrap!\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nset.seed(2022-11-01)\nx <- rexp(n, 1 / 5)\n(med <- median(x)) # sample median\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3.669627\n```\n:::\n\n```{.r .cell-code}\nB <- 100\nalpha <- 0.05\nbootMed <- function() median(sample(x, replace = TRUE)) # resample, and get the median\nFhat <- replicate(B, bootMed()) # repeat B times, \"empirical distribution\"\nCI <- 2 * med - quantile(Fhat, probs = c(1 - alpha / 2, alpha / 2))\n```\n:::\n\n\n---\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](bootstrap_files/figure-revealjs/unnamed-chunk-4-1.svg){fig-align='center'}\n:::\n:::\n\n\n## {background-image=\"gfx/boot1.png\" background-size=\"contain\"}\n\n## {background-image=\"gfx/boot2.png\" background-size=\"contain\"}\n\n## Slightly harder example\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n\n:::: {.columns}\n::: {.column width=\"50%\"}\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](bootstrap_files/figure-revealjs/unnamed-chunk-6-1.svg){fig-align='center'}\n:::\n:::\n\n\n:::\n\n::: {.column width=\"50%\"}\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output .cell-output-stdout}\n```\n\nCall:\nlm(formula = Hwt ~ 0 + Bwt, data = fatcats)\n\nResiduals:\n Min 1Q Median 3Q Max \n-6.9293 -1.0460 -0.1407 0.8298 16.2536 \n\nCoefficients:\n Estimate Std. Error t value Pr(>|t|) \nBwt 3.81895 0.07678 49.74 <2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nResidual standard error: 2.549 on 143 degrees of freedom\nMultiple R-squared: 0.9454,\tAdjusted R-squared: 0.945 \nF-statistic: 2474 on 1 and 143 DF, p-value: < 2.2e-16\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n 2.5 % 97.5 %\nBwt 3.667178 3.97073\n```\n:::\n:::\n\n:::\n::::\n\n\n## When we fit models, we examine diagnostics\n\n\n:::: {.columns}\n::: {.column width=\"50%\"}\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](bootstrap_files/figure-revealjs/unnamed-chunk-8-1.svg){fig-align='center'}\n:::\n:::\n\n\n\nThe tails are too fat, I don't believe that CI...\n:::\n\n::: {.column width=\"50%\"}\n\n\nWe bootstrap\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nB <- 500\nbhats <- double(B)\nalpha <- .05\nfor (b in 1:B) {\n samp <- sample(1:nrow(fatcats), replace = TRUE)\n newcats <- fatcats[samp, ] # new data\n bhats[b] <- coef(lm(Hwt ~ 0 + Bwt, data = newcats)) \n}\n\n2 * coef(cats.lm) - # Bootstrap CI\n quantile(bhats, probs = c(1 - alpha / 2, alpha / 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 97.5% 2.5% \n3.654977 3.955927 \n```\n:::\n\n```{.r .cell-code}\nconfint(cats.lm) # Original CI\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 2.5 % 97.5 %\nBwt 3.667178 3.97073\n```\n:::\n:::\n\n:::\n::::\n\n\n## An alternative\n\n* So far, I didn't use any information about the data-generating process. \n\n* We've done the [non-parametric bootstrap]{.secondary}\n\n* This is easiest, and most common for most cases.\n\n. . .\n\n[But there's another version]{.secondary}\n\n* You could try a \"parametric bootstrap\"\n\n* This assumes knowledge about the DGP\n\n## Same data\n\n:::: {.columns}\n::: {.column width=\"50%\"}\n\n[Non-parametric bootstrap]{.secondary}\n\nSame as before\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nB <- 500\nbhats <- double(B)\nalpha <- .05\nfor (b in 1:B) {\n samp <- sample(1:nrow(fatcats), replace = TRUE)\n newcats <- fatcats[samp, ] # new data\n bhats[b] <- coef(lm(Hwt ~ 0 + Bwt, data = newcats)) \n}\n\n2 * coef(cats.lm) - # NP Bootstrap CI\n quantile(bhats, probs = c(1-alpha/2, alpha/2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 97.5% 2.5% \n3.673559 3.970251 \n```\n:::\n\n```{.r .cell-code}\nconfint(cats.lm) # Original CI\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 2.5 % 97.5 %\nBwt 3.667178 3.97073\n```\n:::\n:::\n\n:::\n\n::: {.column width=\"50%\"}\n[Parametric bootstrap]{.secondary}\n\n1. Assume that the linear model is TRUE.\n2. Then, $\\texttt{Hwt}_i = \\widehat{\\beta}\\times \\texttt{Bwt}_i + \\widehat{e}_i$, $\\widehat{e}_i \\approx \\epsilon_i$\n3. The $\\epsilon_i$ is random $\\longrightarrow$ just resample $\\widehat{e}_i$.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nB <- 500\nbhats <- double(B)\nalpha <- .05\ncats.lm <- lm(Hwt ~ 0 + Bwt, data = fatcats)\nnewcats <- fatcats\nfor (b in 1:B) {\n samp <- sample(residuals(cats.lm), replace = TRUE)\n newcats$Hwt <- predict(cats.lm) + samp # new data\n bhats[b] <- coef(lm(Hwt ~ 0 + Bwt, data = newcats)) \n}\n\n2 * coef(cats.lm) - # Parametric Bootstrap CI\n quantile(bhats, probs = c(1 - alpha/2, alpha/2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 97.5% 2.5% \n3.665531 3.961896 \n```\n:::\n:::\n\n\n:::\n::::\n\n## Bootstrap error sources\n\n\n[Simulation error]{.secondary}:\n\nusing only $B$ samples to estimate $F$ with $\\hat{F}$.\n\n[Statistical error]{.secondary}:\n\nour data depended on a sample from the population. We don't have the whole population so we make an error by using a sample \n\n(Note: this part is what __always__ happens with data, and what the science of statistics analyzes.)\n\n[Specification error]{.secondary}:\n\nIf we use the parametric bootstrap, and our model is wrong, then we are overconfident.\n\n\n\n", "supporting": [ "bootstrap_files" ], diff --git a/schedule/slides/_titleslide.qmd b/schedule/slides/_titleslide.qmd index b0dde92..dc037c0 100644 --- a/schedule/slides/_titleslide.qmd +++ b/schedule/slides/_titleslide.qmd @@ -1,7 +1,4 @@ ---- ---- - -## {{< meta lecture >}} +## {{< meta lecture >}} {background-image="img/consult.jpeg" background-opacity="0.3"} [Stat 550]{.secondary}