-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1 from UBC-STAT/2024-site
2024 site
- Loading branch information
Showing
175 changed files
with
20,513 additions
and
1,648 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"hash": "c2b921b8b26d6d86e1b90497439df844", | ||
"result": { | ||
"markdown": "---\ntitle: \"<i class='bi bi-calendar3'></i> Lecture slides and handouts\"\n---\n\n\n\n::: .callout-note\nTerm 2 runs 8 January to 13 April. \n:::\n\n::: {.callout-warning}\nSchedule is on Canvas. All dates subject to change.\n:::\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n\n\n\n## Lecture slides\n\n* [Bootstrap](slides/bootstrap.qmd)\n* [Cluster computing](slides/cluster-computing.qmd)\n* [Git and GitHub](slides/git.qmd)\n* [Skills for graduate students](slides/grad-school.qmd)\n* [Tips for organization](slides/organization.qmd)\n* [Giving presentations](slides/presentations.qmd)\n* [Time series](slides/time-series.qmd)\n* [Unit tests](slides/unit-tests.qmd)\n\n## Handouts\n\n* [Recommended books and sources](handouts/reference-books.qmd)\n* [Formatting consulting reports](handouts/report-formatting.qmd)\n\n\n<!--\n\n\n## Guest lecturers\n\n-->\n", | ||
"supporting": [ | ||
"index_files" | ||
], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": {}, | ||
"engineDependencies": {}, | ||
"preserve": {}, | ||
"postProcess": true | ||
} | ||
} |
20 changes: 20 additions & 0 deletions
20
_freeze/schedule/slides/bootstrap/execute-results/html.json
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
{ | ||
"hash": "d2869649cdb959b50f0e8ab08e8f9e05", | ||
"result": { | ||
"markdown": "---\nlecture: \"The bootstrap\"\nformat: revealjs\nmetadata-files: \n - _metadata.yml\n---\n---\n---\n\n## {{< meta lecture >}} {.large background-image=\"img/consult.jpeg\" background-opacity=\"0.3\"}\n\n[Stat 550]{.secondary}\n\n[{{< meta author >}}]{.secondary}\n\nLast modified -- 09 January 2024\n\n\n\n$$\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\mid}\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n$$\n\n\n\n\n\n\n\n## {background-image=\"https://www.azquotes.com/picture-quotes/quote-i-believe-in-pulling-yourself-up-by-your-own-bootstraps-i-believe-it-is-possible-i-saw-stephen-colbert-62-38-03.jpg\" background-size=\"contain\"}\n\n\n## {background-image=\"http://rackjite.com/wp-content/uploads/rr11014aa.jpg\" background-size=\"contain\"}\n\n\n## In statistics...\n\nThe \"bootstrap\" works. And well.\n\nIt's good for \"second-level\" analysis.\n\n* \"First-level\" analyses are things like $\\hat\\beta$, $\\hat y$, an estimator of the center (a median), etc.\n\n* \"Second-level\" are things like $\\Var{\\hat\\beta}$, a confidence interval for $\\hat y$, or a median, etc.\n\nYou usually get these \"second-level\" properties from \"the sampling distribution of an estimator\"\n\n. . .\n\nBut what if you don't know the sampling distribution? Or you're skeptical of the CLT argument?\n\n\n## Refresher on sampling distributions\n\n1. If $X_i$ are iid Normal $(0,\\sigma^2)$, then $\\Var{\\bar{X}} = \\sigma^2 / n$.\n1. If $X_i$ are iid and $n$ is big, then $\\Var{\\bar{X}} \\approx \\Var{X_1} / n$.\n1. If $X_i$ are iid Binomial $(m, p)$, then $\\Var{\\bar{X}} = mp(1-p) / n$\n\n\n\n## Example of unknown sampling distribution\n\nI estimate a LDA on some data.\n\nI get a new $x_0$ and produce $\\hat{Pr}(y_0 =1 \\given x_0)$.\n\nCan I get a 95% confidence interval for $Pr(y_0=1 \\given x_0)$?\n\n. . .\n\nThe bootstrap gives this to you.\n\n\n\n\n## Procedure\n\n1. Resample your training data w/ replacement.\n2. Calculate a LDA on this sample.\n3. Produce a new prediction, call it $\\widehat{Pr}_b(y_0 =1 \\given x_0)$.\n4. Repeat 1-3 $b = 1,\\ldots,B$ times.\n5. CI: $\\left[2\\widehat{Pr}(y_0 =1 \\given x_0) - \\widehat{F}_{boot}(1-\\alpha/2),\\ 2\\widehat{Pr}(y_0 =1 \\given x_0) - \\widehat{F}_{boot}(\\alpha/2)\\right]$\n\n\n\n$\\hat{F}$ is the \"empirical\" distribution of the bootstraps. \n\n\n## Very basic example\n\n* Let $X_i\\sim Exponential(1/5)$. The pdf is $f(x) = \\frac{1}{5}e^{-x/5}$\n\n\n* I know if I estimate the mean with $\\bar{X}$, then by the CLT (if $n$ is big), \n\n$$\\frac{\\sqrt{n}(\\bar{X}-E[X])}{s} \\approx N(0, 1).$$\n\n\n* This gives me a 95% confidence interval like\n$$\\bar{X} \\pm 2 \\frac{s}{\\sqrt{n}}$$\n\n\n* But I don't want to estimate the mean, I want to estimate the median.\n\n\n---\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nggplot(data.frame(x = c(0, 12)), aes(x)) + \n stat_function(fun = function(x) dexp(x, 1 / 5), \n color = \"orange\", linewidth = 2) +\n theme_bw(base_size = 24) +\n geom_vline(xintercept = 5, color = \"darkblue\", linewidth = 2) + # mean\n geom_vline(xintercept = qexp(.5,1/5), color = \"red\", linewidth = 2) + # median\n ylab(\"f(x)\") +\n annotate(\"label\", x = c(2.5, 5.5, 10), y = c(.15, .15, .05), \n label = c(\"median\", \"bar(x)\", \"pdf\"), parse = TRUE,\n color = c(\"red\", \"darkblue\", \"orange\"), size = 10)\n```\n\n::: {.cell-output-display}\n![](bootstrap_files/figure-revealjs/unnamed-chunk-1-1.svg){fig-align='center'}\n:::\n:::\n\n\n## Now what\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n\n* I give you a sample of size 500, you give me the sample median.\n\n* How do you get a CI?\n\n* You can use the bootstrap!\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nset.seed(2022-11-01)\nx <- rexp(n, 1 / 5)\n(med <- median(x)) # sample median\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 3.669627\n```\n:::\n\n```{.r .cell-code}\nB <- 100\nalpha <- 0.05\nbootMed <- function() median(sample(x, replace = TRUE)) # resample, and get the median\nFhat <- replicate(B, bootMed()) # repeat B times, \"empirical distribution\"\nCI <- 2 * med - quantile(Fhat, probs = c(1 - alpha / 2, alpha / 2))\n```\n:::\n\n\n---\n\n\n::: {.cell layout-align=\"center\"}\n::: {.cell-output-display}\n![](bootstrap_files/figure-revealjs/unnamed-chunk-4-1.svg){fig-align='center'}\n:::\n:::\n\n\n## {background-image=\"gfx/boot1.png\" background-size=\"contain\"}\n\n## {background-image=\"gfx/boot2.png\" background-size=\"contain\"}\n\n## Slightly harder example\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n\n\n:::: {.columns}\n::: {.column width=\"50%\"}\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nggplot(fatcats, aes(Bwt, Hwt)) + \n geom_point(color = \"darkblue\") + \n xlab(\"Cat body weight (Kg)\") + \n theme_bw(base_size = 18) +\n ylab(\"Cat heart weight (g)\")\n```\n\n::: {.cell-output-display}\n![](bootstrap_files/figure-revealjs/unnamed-chunk-6-1.svg){fig-align='center'}\n:::\n:::\n\n\n:::\n\n::: {.column width=\"50%\"}\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ncats.lm <- lm(Hwt ~ 0 + Bwt, data = fatcats)\nsummary(cats.lm)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n\nCall:\nlm(formula = Hwt ~ 0 + Bwt, data = fatcats)\n\nResiduals:\n Min 1Q Median 3Q Max \n-6.9293 -1.0460 -0.1407 0.8298 16.2536 \n\nCoefficients:\n Estimate Std. Error t value Pr(>|t|) \nBwt 3.81895 0.07678 49.74 <2e-16 ***\n---\nSignif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n\nResidual standard error: 2.549 on 143 degrees of freedom\nMultiple R-squared: 0.9454,\tAdjusted R-squared: 0.945 \nF-statistic: 2474 on 1 and 143 DF, p-value: < 2.2e-16\n```\n:::\n\n```{.r .cell-code}\nconfint(cats.lm)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 2.5 % 97.5 %\nBwt 3.667178 3.97073\n```\n:::\n:::\n\n:::\n::::\n\n\n## When we fit models, we examine diagnostics\n\n\n:::: {.columns}\n::: {.column width=\"50%\"}\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nqqnorm(residuals(cats.lm))\nqqline(residuals(cats.lm))\n```\n\n::: {.cell-output-display}\n![](bootstrap_files/figure-revealjs/unnamed-chunk-8-1.svg){fig-align='center'}\n:::\n:::\n\n\n\nThe tails are too fat, I don't believe that CI...\n:::\n\n::: {.column width=\"50%\"}\n\n\nWe bootstrap\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nB <- 500\nbhats <- double(B)\nalpha <- .05\nfor (b in 1:B) {\n samp <- sample(1:nrow(fatcats), replace = TRUE)\n newcats <- fatcats[samp, ] # new data\n bhats[b] <- coef(lm(Hwt ~ 0 + Bwt, data = newcats)) \n}\n\n2 * coef(cats.lm) - # Bootstrap CI\n quantile(bhats, probs = c(1 - alpha / 2, alpha / 2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 97.5% 2.5% \n3.654977 3.955927 \n```\n:::\n\n```{.r .cell-code}\nconfint(cats.lm) # Original CI\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 2.5 % 97.5 %\nBwt 3.667178 3.97073\n```\n:::\n:::\n\n:::\n::::\n\n\n## An alternative\n\n* So far, I didn't use any information about the data-generating process. \n\n* We've done the [non-parametric bootstrap]{.secondary}\n\n* This is easiest, and most common for most cases.\n\n. . .\n\n[But there's another version]{.secondary}\n\n* You could try a \"parametric bootstrap\"\n\n* This assumes knowledge about the DGP\n\n## Same data\n\n:::: {.columns}\n::: {.column width=\"50%\"}\n\n[Non-parametric bootstrap]{.secondary}\n\nSame as before\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nB <- 500\nbhats <- double(B)\nalpha <- .05\nfor (b in 1:B) {\n samp <- sample(1:nrow(fatcats), replace = TRUE)\n newcats <- fatcats[samp, ] # new data\n bhats[b] <- coef(lm(Hwt ~ 0 + Bwt, data = newcats)) \n}\n\n2 * coef(cats.lm) - # NP Bootstrap CI\n quantile(bhats, probs = c(1-alpha/2, alpha/2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 97.5% 2.5% \n3.673559 3.970251 \n```\n:::\n\n```{.r .cell-code}\nconfint(cats.lm) # Original CI\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 2.5 % 97.5 %\nBwt 3.667178 3.97073\n```\n:::\n:::\n\n:::\n\n::: {.column width=\"50%\"}\n[Parametric bootstrap]{.secondary}\n\n1. Assume that the linear model is TRUE.\n2. Then, $\\texttt{Hwt}_i = \\widehat{\\beta}\\times \\texttt{Bwt}_i + \\widehat{e}_i$, $\\widehat{e}_i \\approx \\epsilon_i$\n3. The $\\epsilon_i$ is random $\\longrightarrow$ just resample $\\widehat{e}_i$.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nB <- 500\nbhats <- double(B)\nalpha <- .05\ncats.lm <- lm(Hwt ~ 0 + Bwt, data = fatcats)\nnewcats <- fatcats\nfor (b in 1:B) {\n samp <- sample(residuals(cats.lm), replace = TRUE)\n newcats$Hwt <- predict(cats.lm) + samp # new data\n bhats[b] <- coef(lm(Hwt ~ 0 + Bwt, data = newcats)) \n}\n\n2 * coef(cats.lm) - # Parametric Bootstrap CI\n quantile(bhats, probs = c(1 - alpha/2, alpha/2))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n 97.5% 2.5% \n3.665531 3.961896 \n```\n:::\n:::\n\n\n:::\n::::\n\n## Bootstrap error sources\n\n\n[Simulation error]{.secondary}:\n\nusing only $B$ samples to estimate $F$ with $\\hat{F}$.\n\n[Statistical error]{.secondary}:\n\nour data depended on a sample from the population. We don't have the whole population so we make an error by using a sample \n\n(Note: this part is what __always__ happens with data, and what the science of statistics analyzes.)\n\n[Specification error]{.secondary}:\n\nIf we use the parametric bootstrap, and our model is wrong, then we are overconfident.\n\n\n\n", | ||
"supporting": [ | ||
"bootstrap_files" | ||
], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": { | ||
"include-after-body": [ | ||
"\n<script>\n // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n // slide changes (different for each slide format).\n (function () {\n // dispatch for htmlwidgets\n function fireSlideEnter() {\n const event = window.document.createEvent(\"Event\");\n event.initEvent(\"slideenter\", true, true);\n window.document.dispatchEvent(event);\n }\n\n function fireSlideChanged(previousSlide, currentSlide) {\n fireSlideEnter();\n\n // dispatch for shiny\n if (window.jQuery) {\n if (previousSlide) {\n window.jQuery(previousSlide).trigger(\"hidden\");\n }\n if (currentSlide) {\n window.jQuery(currentSlide).trigger(\"shown\");\n }\n }\n }\n\n // hookup for slidy\n if (window.w3c_slidy) {\n window.w3c_slidy.add_observer(function (slide_num) {\n // slide_num starts at position 1\n fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n });\n }\n\n })();\n</script>\n\n" | ||
] | ||
}, | ||
"engineDependencies": {}, | ||
"preserve": {}, | ||
"postProcess": true | ||
} | ||
} |
Oops, something went wrong.