From 2a5466c5c0f1e04e15df3588ca6e673bad448677 Mon Sep 17 00:00:00 2001 From: Grace Lawley Date: Fri, 27 Sep 2019 21:58:24 -0700 Subject: [PATCH 1/3] first pass cleaning up links --- 01_install.Rmd | 26 +- 02_r-basics.Rmd | 8 +- 05_data-care-feeding.Rmd | 12 +- 06_dplyr-intro.Rmd | 24 +- 07_dplyr-single-table.Rmd | 24 +- 08_tidy-data.Rmd | 2 +- 09_import-export.Rmd | 26 +- 10_factors.Rmd | 8 +- 11_character-vectors.Rmd | 46 +-- 12_character-encoding.Rmd | 14 +- 13_date-times.Rmd | 17 +- 14_multiple-tibbles.Rmd | 7 +- 15_join-tibbles.Rmd | 6 +- 17_r-objects-indexing.Rmd | 36 ++- 18_functions-part1.Rmd | 31 +- 19_functions-part2.Rmd | 14 +- 20_functions-part3.Rmd | 22 +- 21_functions-practicum.Rmd | 8 +- 22_r-graphics.Rmd | 4 +- 23_ggplot2.Rmd | 2 +- 24_effective-graphs.Rmd | 45 ++- 25_colors.Rmd | 53 ++-- 26_qualitative-colors.Rmd | 12 +- 27_secrets-happy-graphics.Rmd | 9 +- 28_saving-figures.Rmd | 33 +- 29_multiple-plots.Rmd | 17 +- 30_package-overview.Rmd | 6 +- 32_system-prep-packages.Rmd | 26 +- 33_create-package.Rmd | 2 +- 34_workflows.Rmd | 55 ++-- 36_api-wrappers.Rmd | 33 +- 37_diy-web-data.Rmd | 31 +- 38_shiny.Rmd | 58 ++-- 39_appendix.Rmd | 23 +- links.md | 426 ++++++-------------------- supporting-docs/foofactors-README.Rmd | 2 +- 36 files changed, 497 insertions(+), 671 deletions(-) diff --git a/01_install.Rmd b/01_install.Rmd index 3a4f890..109a32c 100644 --- a/01_install.Rmd +++ b/01_install.Rmd @@ -11,14 +11,14 @@ source("common.R") ## R and RStudio -* Install [R, a free software environment for statistical computing and graphics][r-proj] from [CRAN][cran], the Comprehensive R Archive Network. I __highly recommend__ you install a precompiled binary distribution for your operating system -- use the links up at the top of the CRAN page linked above! +* Install [R, a free software environment for statistical computing and graphics](https://www.r-project.org) from [CRAN](https://cloud.r-project.org), the Comprehensive R Archive Network. 
I __highly recommend__ you install a precompiled binary distribution for your operating system -- use the links up at the top of the CRAN page linked above! * Install RStudio's IDE (stands for _integrated development environment_), a powerful user interface for R. Get the Open Source Edition of RStudio Desktop. - - I __highly recommend__ you run the [Preview version][rstudio-preview]. I find these quite stable and you'll get the cool new features! Update to new Preview versions often. - - Of course, there are also official releases available [here][rstudio-official]. + - I __highly recommend__ you run the [Preview version](https://www.rstudio.com/products/rstudio/download/preview/). I find these quite stable and you'll get the cool new features! Update to new Preview versions often. + - Of course, there are also official releases available [here](https://www.rstudio.com/products/rstudio/#Desktop). - RStudio comes with a __text editor__, so there is no immediate need to install a separate stand-alone editor. - - RStudio can __interface with Git(Hub)__. However, you must do all the Git(Hub) set up [described elsewhere][happy-git] before you can take advantage of this. + - RStudio can __interface with Git(Hub)__. However, you must do all the Git(Hub) set up described elsewhere (see [Happy Git and GitHub for the useR]) before you can take advantage of this. If you have a pre-existing installation of R and/or RStudio, we __highly recommend__ that you reinstall both and get as current as possible. It can be considerably harder to run old software than new. @@ -32,13 +32,13 @@ If you have a pre-existing installation of R and/or RStudio, we __highly recomme ## Testing testing -* Do whatever is appropriate for your OS to launch RStudio. You should get a window similar to the screenshot you see [here][rstudio-workbench], but yours will be more boring because you haven't written any code or made any figures yet! +* Do whatever is appropriate for your OS to launch RStudio. 
You should get a window similar to the screenshot you see [here](https://www.rstudio.com/wp-content/uploads/2014/04/rstudio-workbench.png), but yours will be more boring because you haven't written any code or made any figures yet! * Put your cursor in the pane labelled Console, which is where you interact with the live R process. Create a simple object with code like `x <- 2 * 4` (followed by enter or return). Then inspect the `x` object by typing `x` followed by enter or return. You should see the value 8 print to screen. If yes, you've succeeded in installing R and RStudio. ## Add-on packages -R is an extensible system and many people share useful code they have developed as a _package_ via CRAN and GitHub. To install a package from CRAN, for example the [dplyr][dplyr-cran] package for data manipulation, here is one way to do it in the R console (there are others). +R is an extensible system and many people share useful code they have developed as a _package_ via CRAN and GitHub. To install a package from CRAN, for example the [dplyr] package for data manipulation, here is one way to do it in the R console (there are others). ```r install.packages("dplyr", dependencies = TRUE) @@ -48,19 +48,19 @@ By including `dependencies = TRUE`, we are being explicit and extra-careful to i You could use the above method to install the following packages, all of which we will use: -* tidyr, [package webpage][tidyr-web] -* ggplot2, [package webpage][ggplot2-web] +* [tidyr] +* [ggplot2] ## Further resources The above will get your basic setup ready but here are some links if you are interested in reading a bit further. 
-* [How to Use RStudio][rstudio-support] -* [RStudio's leads for learning R][rstudio-R-help] -* [R FAQ][cran-faq] -* [R Installation and Administration][cran-R-admin] -* [More about add-on packages in the R Installation and Administration Manual][cran-add-ons] +* [How to Use RStudio](https://support.rstudio.com/hc/en-us) +* [RStudio's leads for learning R](https://support.rstudio.com/hc/en-us/articles/200552336-Getting-Help-with-R) +* [R FAQ](https://cloud.r-project.org/faqs.html) +* [R Installation and Administration](http://cloud.r-project.org/doc/manuals/R-admin.html) +* [More about add-on packages in the R Installation and Administration Manual](https://cloud.r-project.org/doc/manuals/R-admin.html#Add_002don-packages) ```{r links, child="links.md"} diff --git a/02_r-basics.Rmd b/02_r-basics.Rmd index 06b0dee..3177243 100644 --- a/02_r-basics.Rmd +++ b/02_r-basics.Rmd @@ -16,7 +16,7 @@ Notice the default panes: * Environment/History (tabbed in upper right) * Files/Plots/Packages/Help (tabbed in lower right) -FYI: you can change the default location of the panes, among many other things: [Customizing RStudio][rstudio-customizing]. +FYI: you can change the default location of the panes, among many other things: [Customizing RStudio](https://support.rstudio.com/hc/en-us/articles/200549016-Customizing-RStudio). Go into the Console, where we interact with the live R process. @@ -37,7 +37,7 @@ You will make lots of assignments and the operator `<-` is a pain to type. Don't Notice that RStudio automagically surrounds `<-` with spaces, which demonstrates a useful code formatting practice. Code is miserable to read on a good day. Give your eyes a break and use spaces. -RStudio offers many handy [keyboard shortcuts][rstudio-key-shortcuts]. Also, Alt+Shift+K brings up a keyboard shortcut reference card. +RStudio offers many handy [keyboard shortcuts](https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts). 
Also, Alt+Shift+K brings up a keyboard shortcut reference card. Object names cannot start with a digit and cannot contain certain other characters such as a comma or a space. You will be wise to adopt a [convention for demarcating words][wiki-snake-case] in names. @@ -147,7 +147,7 @@ To handle these real life situations, you need to make two decisions: As a beginning R user, it's OK to consider your workspace "real". _Very soon_, I urge you to evolve to the next level, where you consider your saved R scripts as "real". (In either case, of course the input data is very much real and requires preservation!) With the input data and the R code you used, you can reproduce _everything_. You can make your analysis fancier. You can get to the bottom of puzzling results and discover and fix bugs in your code. You can reuse the code to conduct similar analyses in new projects. You can remake a figure with different aspect ratio or save is as TIFF instead of PDF. You are ready to take questions. You are ready for the future. -If you regard your workspace as "real" (saving and reloading all the time), if you need to redo analysis ... you're going to either redo a lot of typing (making mistakes all the way) or will have to mine your R history for the commands you used. Rather than [becoming an expert on managing the R history][rstudio-command-history], a better use of your time and psychic energy is to keep your "good" R code in a script for future reuse. +If you regard your workspace as "real" (saving and reloading all the time), if you need to redo analysis ... you're going to either redo a lot of typing (making mistakes all the way) or will have to mine your R history for the commands you used. Rather than [becoming an expert on managing the R history](https://support.rstudio.com/hc/en-us/articles/200526217-Command-History), a better use of your time and psychic energy is to keep your "good" R code in a script for future reuse. 
Because it can be useful sometimes, note the commands you've recently run appear in the History pane. @@ -197,7 +197,7 @@ But there's a better way. A way that also puts you on the path to managing your ## RStudio projects {#rprojs} -Keeping all the files associated with a project organized together -- input data, R scripts, analytical results, figures -- is such a wise and common practice that RStudio has built-in support for this via its [_projects_][rstudio-using-projects]. +Keeping all the files associated with a project organized together -- input data, R scripts, analytical results, figures -- is such a wise and common practice that RStudio has built-in support for this via its [_projects_](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects). Let's make one to use for the rest of this workshop/class. Do this: *File > New Project...*. The directory name you choose here will be the project name. Call it whatever you want (or follow me for convenience). diff --git a/05_data-care-feeding.Rmd b/05_data-care-feeding.Rmd index 84c4786..de0a1cc 100644 --- a/05_data-care-feeding.Rmd +++ b/05_data-care-feeding.Rmd @@ -36,13 +36,13 @@ Whenever you have rectangular, spreadsheet-y data, your default data receptacle - keeping them in sync vis-a-vis row order - applying any filtering of observations uniformly * Most functions for inference, modelling, and graphing are happy to be passed a data frame via a `data =` argument. This has been true in base R for a long time. -* The set of packages known as the [tidyverse][tidyverse-main-page] takes this one step further and explicitly prioritizes the processing of data frames. This includes popular packages like dplyr and ggplot2. In fact the tidyverse prioritizes a special flavor of data frame, called a "tibble". +* The set of packages known as the [tidyverse] takes this one step further and explicitly prioritizes the processing of data frames. This includes popular packages like [dplyr] and [ggplot2]. 
In fact the tidyverse prioritizes a special flavor of data frame, called a "tibble". Data frames -- unlike general arrays or, specifically, matrices in R -- can hold variables of different flavors, such as character data (subject ID or name), quantitative data (white blood cell count), and categorical information (treated vs. untreated). If you use homogeneous structures, like matrices, for data analysis, you are likely to make the terrible mistake of spreading a dataset out over multiple, unlinked objects. Why? Because you can't put character data, such as subject name, into the numeric matrix that holds white blood cell count. This fragmentation is a Bad Idea. ## Get the Gapminder data -We will work with some of the data from the [Gapminder project][gapminder-web]. I've released this as an [R package][gapminder-cran], so we can install it from CRAN like so: +We will work with some of the data from the [Gapminder project](https://www.gapminder.org). I've released this as an R package called [gapminder], so we can install it from CRAN like so: ```{r eval = FALSE} install.packages("gapminder") @@ -66,7 +66,7 @@ str(gapminder) We could print the `gapminder` object itself to screen. However, if you've used R before, you might be reluctant to do this, because large datasets just fill up your Console and provide very little insight. -This is the first big win for **tibbles**. The [tidyverse][tidyverse-web] offers a special case of R's default data frame: the "tibble", which is a nod to the actual class of these objects, `tbl_df`. +This is the first big win for **tibbles**. The [tidyverse] offers a special case of R's default data frame: the "tibble", which is a nod to the actual class of these objects, `tbl_df`. If you have not already done so, install the tidyverse meta-package now: @@ -164,7 +164,7 @@ The __levels__ of the factor `continent` are "Africa", "Americas", etc. 
and this str(gapminder$continent) ``` -This [Janus][wiki-janus]-like nature of factors means they are rich with booby traps for the unsuspecting but they are a necessary evil. I recommend you resolve to learn how to [properly care and feed for factors](#factors-boss). The pros far outweigh the cons. Specifically in modelling and figure-making, factors are anticipated and accommodated by the functions and packages you will want to exploit. +This [Janus]-like nature of factors means they are rich with booby traps for the unsuspecting but they are a necessary evil. I recommend you resolve to learn how to [properly care and feed for factors](#factors-boss). The pros far outweigh the cons. Specifically in modelling and figure-making, factors are anticipated and accommodated by the functions and packages you will want to exploit. Here we count how many observations are associated with each continent and, as usual, try to portray that info visually. This makes it much easier to quickly see that African countries are well represented in this dataset. ```{r tabulate-continent} @@ -172,7 +172,7 @@ table(gapminder$continent) barplot(table(gapminder$continent)) ``` -In the figures below, we see how factors can be put to work in figures. The `continent` factor is easily mapped into "facets" or colors and a legend by the ggplot2 package. *Making figures with ggplot2 is covered in Chapter \@ref(ggplot2-tutorial) so feel free to just sit back and enjoy these plots or blindly copy/paste.* +In the figures below, we see how factors can be put to work in figures. The `continent` factor is easily mapped into "facets" or colors and a legend by the [ggplot2] package. 
*Making figures with ggplot2 is covered in Chapter \@ref(ggplot2-tutorial) so feel free to just sit back and enjoy these plots or blindly copy/paste.* ```{r factors-nice-for-plots, fig.show = 'hold', out.width = '49%'} ## we exploit the fact that ggplot2 was installed and loaded via the tidyverse @@ -221,7 +221,7 @@ plot(lifeExp ~ log(gdpPercap), gapminder, subset = year == 2007) * Use data frames!!! -* Use the [tidyverse][tidyverse-web]!!! This will provide a special type of data frame called a "tibble" that has nice default printing behavior, among other benefits. +* Use the [tidyverse]!!! This will provide a special type of data frame called a "tibble" that has nice default printing behavior, among other benefits. * When in doubt, `str()` something or print something. diff --git a/06_dplyr-intro.Rmd b/06_dplyr-intro.Rmd index 0986359..7238a47 100644 --- a/06_dplyr-intro.Rmd +++ b/06_dplyr-intro.Rmd @@ -8,9 +8,9 @@ source("common.R") ## Intro -[dplyr][dplyr-web] is a package for data manipulation, developed by Hadley Wickham and Romain Francois. It is built to be fast, highly expressive, and open-minded about how your data is stored. It is installed as part of the [tidyverse][tidyverse-web] meta-package and, as a core package, it is among those loaded via `library(tidyverse)`. +[dplyr] is a package for data manipulation, developed by Hadley Wickham and Romain Francois. It is built to be fast, highly expressive, and open-minded about how your data is stored. It is installed as part of the [tidyverse] meta-package and, as a core package, it is among those loaded via `library(tidyverse)`. -dplyr's roots are in an earlier package called [plyr][plyr-web], which implements the ["split-apply-combine" strategy for data analysis][split-apply-combine] [@wickham2011a]. Where plyr covers a diverse set of inputs and outputs (e.g., arrays, data frames, lists), dplyr has a laser-like focus on data frames or, in the tidyverse, "tibbles". 
dplyr is a package-level treatment of the `ddply()` function from plyr, because "data frame in, data frame out" proved to be so incredibly important. +dplyr's roots are in an earlier package called [plyr], which implements the ["split-apply-combine" strategy for data analysis](https://www.jstatsoft.org/article/view/v040i01) [@wickham2011a]. Where plyr covers a diverse set of inputs and outputs (e.g., arrays, data frames, lists), dplyr has a laser-like focus on data frames or, in the tidyverse, "tibbles". dplyr is a package-level treatment of the `ddply()` function from plyr, because "data frame in, data frame out" proved to be so incredibly important. Have no idea what I'm talking about? Not sure if you care? If you use these base R functions: `subset()`, `apply()`, `[sl]apply()`, `tapply()`, `aggregate()`, `split()`, `do.call()`, `with()`, `within()`, then you should keep reading. Also, if you use `for()` loops a lot, you might enjoy learning other ways to iterate over rows or groups of rows or variables in a data frame. @@ -107,7 +107,7 @@ This call explains itself and is fairly robust. ## Meet the new pipe operator -Before we go any further, we should exploit the new pipe operator that the tidyverse imports from the [magrittr][magrittr-web] package by Stefan Bache. This is going to change your data analytical life. You no longer need to enact multi-operation commands by nesting them inside each other, like so many [Russian nesting dolls][wiki-nesting-dolls]. This new syntax leads to code that is much easier to write and to read. +Before we go any further, we should exploit the new pipe operator that the tidyverse imports from the [magrittr] package by Stefan Bache. This is going to change your data analytical life. You no longer need to enact multi-operation commands by nesting them inside each other, like so many [Russian nesting dolls](https://en.wikipedia.org/wiki/Matryoshka_doll). This new syntax leads to code that is much easier to write and to read. 
Here's what it looks like: `%>%`. The RStudio keyboard shortcut: Ctrl+Shift+M (Windows), Cmd+Shift+M (Mac). @@ -168,7 +168,7 @@ gapminder[gapminder$country == "Cambodia", c("year", "lifeExp")] We've barely scratched the surface of dplyr but I want to point out key principles you may start to appreciate. If you're new to R or "programming with data", feel free skip this section and [move on](#dplyr-single). -dplyr's verbs, such as `filter()` and `select()`, are what's called [pure functions][wiki-pure-fxns]. To quote from Wickham's [Advanced R Programming book][adv-r-fxns] [-@wickham2015a]: +dplyr's verbs, such as `filter()` and `select()`, are what's called [pure functions](https://en.wikipedia.org/wiki/Pure_function). To quote from the [Functions chapter](http://adv-r.had.co.nz/Functions.html) of Wickham's [Advanced R] book [-@wickham2015a]: > The functions that are the easiest to understand and reason about are pure functions: functions that always map the same input to the same output and have no other impact on the workspace. In other words, pure functions have no side effects: they don’t affect the state of the world in any way apart from the value they return. @@ -176,29 +176,29 @@ In fact, these verbs are a special case of pure functions: they take the same fl And finally, the data is __always__ the very first argument of the verb functions. -This set of deliberate design choices, together with the new pipe operator, produces a highly effective, low friction [domain-specific language][adv-r-dsl] for data analysis. +This set of deliberate design choices, together with the new pipe operator, produces a highly effective, low friction [domain-specific language](http://adv-r.had.co.nz/dsl.html) for data analysis. Go to the next Chapter, [dplyr functions for a single dataset](#dplyr-single), for more dplyr! ## Resources -dplyr official stuff: +[dplyr] official stuff: * Package home [on CRAN][dplyr-cran]. 
- - Note there are several vignettes, with the [introduction][dplyr-vignette-intro] being the most relevant right now. - - The [one on window functions][dplyr-vignette-window-fxns] will also be interesting to you now. + - Note there are several vignettes, with the [Introduction to dplyr] being the most relevant right now. + - The [Window functions] one will also be interesting to you now. * Development home [on GitHub][dplyr-github]. * [Tutorial HW delivered][useR-2014-dropbox] (note this links to a DropBox folder) at useR! 2014 conference. -[RStudio Data Transformation Cheat Sheet][rstudio-dplyr-cheatsheet-download], covering dplyr. Remember you can get to these via *Help > Cheatsheets.* +[RStudio Data Transformation Cheat Sheet], covering dplyr. Remember you can get to these via *Help > Cheatsheets.* -[Data transformation][r4ds-transform] chapter of [R for Data Science][r4ds] [@wickham2016]. +[Data transformation][r4ds-transform] chapter of [R for Data Science] [@wickham2016]. -[Excellent slides][tj-mahr-slides] on pipelines and dplyr by TJ Mahr, talk given to the Madison R Users Group. +["Let the Data Flow: Pipelines in R with dplyr and magrittr"] - Excellent slides on pipelines and dplyr by TJ Mahr, talk given to the Madison R Users Group. -Blog post [Hands-on dplyr tutorial for faster data manipulation in R][dataschool-dplyr] by Data School, that includes a link to an R Markdown document and links to videos. +Blog post ["Hands-on dplyr tutorial for faster data manipulation in R"] by Data School, that includes a link to an R Markdown document and links to videos. Chapter \@ref(join-cheatsheet): cheatsheet I made for dplyr join functions (not relevant yet but soon). 
diff --git a/07_dplyr-single-table.Rmd b/07_dplyr-single-table.Rmd index e1410b7..3d767e4 100644 --- a/07_dplyr-single-table.Rmd +++ b/07_dplyr-single-table.Rmd @@ -16,7 +16,7 @@ In Chapter \@ref(dplyr-intro), [Introduction to dplyr](#dplyr-intro), we used tw We also discussed dplyr's role inside the tidyverse and tibbles: -* dplyr is a core package in the [tidyverse][tidyverse-github] meta-package. Since we often make incidental usage of the others, we will load dplyr and the others via `library(tidyverse)`. +* dplyr is a core package in the [tidyverse] meta-package. Since we often make incidental usage of the others, we will load dplyr and the others via `library(tidyverse)`. * The tidyverse embraces a special flavor of data frame, called a tibble. The `gapminder` dataset is stored as a tibble. ## Load dplyr and gapminder @@ -66,7 +66,7 @@ my_gap %>% mutate(gdp = pop * gdpPercap) ``` -Hmmmm ... those GDP numbers are almost uselessly large and abstract. Consider the [advice of Randall Munroe of xkcd][xckd-randall-munroe]: +Hmmmm ... those GDP numbers are almost uselessly large and abstract. Consider the [advice of Randall Munroe of xkcd](https://fivethirtyeight.com/features/xkcd-randall-munroe-qanda-what-if/): >One thing that bothers me is large numbers presented without context... 'If I added a zero to this number, would the sentence containing it mean something different to me?' If the answer is 'no,' maybe the number has no business being in the sentence in the first place." @@ -143,7 +143,7 @@ I advise that your analyses NEVER rely on rows or variables being in a specific ## Use `rename()` to rename variables -When I first cleaned this Gapminder excerpt, I was a [`camelCase`][wiki-camel-case] person, but now I'm all about [`snake_case`][wiki-snake-case]. So I am vexed by the variable names I chose when I cleaned this data years ago. Let's rename some variables! 
+When I first cleaned this Gapminder excerpt, I was a [`camelCase`](https://en.wikipedia.org/wiki/Camel_case) person, but now I'm all about [`snake_case`][wiki-snake-case]. So I am vexed by the variable names I chose when I cleaned this data years ago. Let's rename some variables! ```{r} my_gap %>% @@ -181,7 +181,7 @@ dplyr offers powerful tools to solve this class of problem: * `summarize()` takes a dataset with $n$ observations, computes requested summaries, and returns a dataset with 1 observation. * Window functions take a dataset with $n$ observations and return a dataset with $n$ observations. * `mutate()` and `summarize()` will honor groups. -* You can also do very general computations on your groups with `do()`, though elsewhere in this course, I advocate for other approaches that I find more intuitive, using the [purrr package][purrr-web]. +* You can also do very general computations on your groups with `do()`, though elsewhere in this course, I advocate for other approaches that I find more intuitive, using the [purrr] package. Combined with the verbs you already know, these new tools allow you to solve an extremely diverse set of problems with relative ease. @@ -362,29 +362,29 @@ my_gap %>% Ponder that for a while. The subject matter and the code. Mostly you're seeing what genocide looks like in dry statistics on average life expectancy. -Break the code into pieces, starting at the top, and inspect the intermediate results. That's certainly how I was able to *write* such a thing. These commands do not [leap fully formed out of anyone's forehead][athena-zeus-forehead] -- they are built up gradually, with lots of errors and refinements along the way. I'm not even sure it's a great idea to do so much manipulation in one fell swoop. Is the statement above really hard for you to read? If yes, then by all means break it into pieces and make some intermediate objects. Your code should be easy to write and read when you're done. 
+Break the code into pieces, starting at the top, and inspect the intermediate results. That's certainly how I was able to *write* such a thing. These commands do not [leap fully formed out of anyone's forehead](https://tinyurl.com/athenaforehead) -- they are built up gradually, with lots of errors and refinements along the way. I'm not even sure it's a great idea to do so much manipulation in one fell swoop. Is the statement above really hard for you to read? If yes, then by all means break it into pieces and make some intermediate objects. Your code should be easy to write and read when you're done. In later tutorials, we'll explore more of dplyr, such as operations based on two datasets. ## Resources -dplyr official stuff: +[dplyr] official stuff: * Package home [on CRAN][dplyr-cran]. - - Note there are several vignettes, with the [introduction][dplyr-vignette-intro] being the most relevant right now. - - The [one on window functions][dplyr-vignette-window-fxns] will also be interesting to you now. + - Note there are several vignettes, with the [Introduction to dplyr] being the most relevant right now. + - The [Window functions] one will also be interesting to you now. * Development home [on GitHub][dplyr-github]. * [Tutorial HW delivered][useR-2014-dropbox] (note this links to a DropBox folder) at useR! 2014 conference. -[RStudio Data Transformation Cheat Sheet][rstudio-dplyr-cheatsheet-download], covering dplyr. Remember you can get to these via *Help > Cheatsheets.* +[RStudio Data Transformation Cheat Sheet], covering dplyr. Remember you can get to these via *Help > Cheatsheets.* -[Data transformation][r4ds-transform] chapter of [R for Data Science][r4ds] [@wickham2016]. +[Data transformation][r4ds-transform] chapter of [R for Data Science] [@wickham2016]. -[Excellent slides][tj-mahr-slides] on pipelines and dplyr by TJ Mahr, talk given to the Madison R Users Group. 
+["Let the Data Flow: Pipelines in R with dplyr and magrittr"] - Excellent slides on pipelines and dplyr by TJ Mahr, talk given to the Madison R Users Group. -Blog post [Hands-on dplyr tutorial for faster data manipulation in R][dataschool-dplyr] by Data School, that includes a link to an R Markdown document and links to videos. +Blog post ["Hands-on dplyr tutorial for faster data manipulation in R"] by Data School, that includes a link to an R Markdown document and links to videos. Chapter \@ref(join-cheatsheet): cheatsheet I made for dplyr join functions (not relevant yet but soon). diff --git a/08_tidy-data.Rmd b/08_tidy-data.Rmd index 12f3ed1..031723d 100644 --- a/08_tidy-data.Rmd +++ b/08_tidy-data.Rmd @@ -4,7 +4,7 @@ source("common.R") ``` -[Tidy data using Lord of the Rings][tidydata-lotr]: tidy data, tidyr. +[Tidy data using Lord of the Rings]: tidy data, [tidyr]. ```{r links, child="links.md"} diff --git a/09_import-export.Rmd b/09_import-export.Rmd index b82f3b5..67c7582 100644 --- a/09_import-export.Rmd +++ b/09_import-export.Rmd @@ -8,7 +8,7 @@ source("common.R") ## File I/O overview -We've been loading the Gapminder data as a data frame from the gapminder data package. We haven't been explicitly writing any data or derived results to file. In real life, you'll bring rectangular data into and out of R all the time. Sometimes you'll need to do same for non-rectangular objects. +We've been loading the Gapminder data as a data frame from the [gapminder] data package. We haven't been explicitly writing any data or derived results to file. In real life, you'll bring rectangular data into and out of R all the time. Sometimes you'll need to do same for non-rectangular objects. How do you do this? What issues should you think about? @@ -32,11 +32,11 @@ First tip: __today's outputs are tomorrow's inputs__. Think back on all the pain Second tip: don't be too cute or clever. 
A plain text file that is readable by a human being in a text editor should be your default until you have __actual proof__ that this will not work. Reading and writing to exotic or proprietary formats will be the first thing to break in the future or on a different computer. It also creates barriers for anyone who has a different toolkit than you do. Be software-agnostic. Aim for future-proof and moron-proof. -How does this fit with our emphasis on dynamic reporting via R Markdown? There is a time and place for everything. There are projects and documents where the scope and personnel will allow you to geek out with knitr and R Markdown. But there are lots of good reasons why (parts of) an analysis should not (only) be embedded in a dynamic report. Maybe you are just doing data cleaning to produce a valid input dataset. Maybe you are making a small but crucial contribution to a giant multi-author paper. Etc. Also remember there are other tools and workflows for making something reproducible. I'm looking at you, [make][minimal-make]. +How does this fit with our emphasis on dynamic reporting via R Markdown? There is a time and place for everything. There are projects and documents where the scope and personnel will allow you to geek out with knitr and R Markdown. But there are lots of good reasons why (parts of) an analysis should not (only) be embedded in a dynamic report. Maybe you are just doing data cleaning to produce a valid input dataset. Maybe you are making a small but crucial contribution to a giant multi-author paper. Etc. Also remember there are other tools and workflows for making something reproducible. I'm looking at you, [make](https://kbroman.org/minimal_make/). ## Load the tidyverse -The main package we will be using is [readr][readr-web], which provides drop-in substitute functions for `read.table()` and friends. However, to make some points about data export and import, it is nice to reorder factor levels. 
For that, we will use the [forcats][forcats-web] package, which is also included in the tidyverse package. +The main package we will be using is [readr], which provides drop-in substitute functions for `read.table()` and friends. However, to make some points about data export and import, it is nice to reorder factor levels. For that, we will use the [forcats] package, which is also included in the [tidyverse] package. ```{r start_import_export} library(tidyverse) @@ -44,7 +44,7 @@ library(tidyverse) ## Locate the Gapminder data -We could load the data from the package as usual, but instead we will load it from tab delimited file. The gapminder package includes the data normally found in the `gapminder` data frame as a `.tsv`. So let's get the path to that file on *your* system using the [`fs` package][fs-web]. +We could load the data from the package as usual, but instead we will load it from tab delimited file. The gapminder package includes the data normally found in the `gapminder` data frame as a `.tsv`. So let's get the path to that file on *your* system using the [fs] package. ```{r} library(fs) @@ -77,7 +77,7 @@ str(gapminder) Default to `readr::read_delim()` and friends. Use the arguments! -The Gapminder data is too clean and simple to show off the great features of readr, so I encourage you to check out the part of the introduction vignette on [column types][readr-vignette-intro]. There are many variable types that you will be able to parse correctly upon import, thereby eliminating a great deal of post-import fussing. +The Gapminder data is too clean and simple to show off the great features of readr, so I encourage you to check out the part of the introduction vignette on [column types](https://cloud.r-project.org/web/packages/readr/vignettes/readr.html). There are many variable types that you will be able to parse correctly upon import, thereby eliminating a great deal of post-import fussing. 
## Compute something worthy of export @@ -111,7 +111,7 @@ Let's look at the first few lines of `gap_life_exp.csv`. If you're following alo This is pretty decent looking, though there is no visible alignment or separation into columns. Had we used the base function `read.csv()`, we would be seeing rownames and lots of quotes, unless we had explicitly shut that down. Nicer default behavior is the main reason we are using `readr::write_csv()` over `write.csv()`. -* It's not really fair to complain about the lack of visible alignment. Remember we are ["writing data for computers"][write-data-tweet]. If you really want to browse around the file, use `View()` in RStudio or open it in Microsoft Excel (!) but don't succumb to the temptation to start doing artisanal data manipulations there ... get back to R and construct commands that you can re-run the next 15 times you import/clean/aggregate/export the same dataset. Trust me, it will happen. +* It's not really fair to complain about the lack of visible alignment. Remember we are ["writing data for computers"]. If you really want to browse around the file, use `View()` in RStudio or open it in Microsoft Excel (!) but don't succumb to the temptation to start doing artisanal data manipulations there ... get back to R and construct commands that you can re-run the next 15 times you import/clean/aggregate/export the same dataset. Trust me, it will happen. ## Invertibility @@ -124,9 +124,9 @@ It turns out these self-imposed rules are often in conflict with one another: __Example:__ after performing the country-level summarization, we reorder the levels of the country factor, based on life expectancy. This reordering operation is conceptually important and must be embodied in R commands stored in a script. However, as soon as we write `gap_life_exp` to a plain text file, that meta-information about the countries is lost. Upon re-import with `read_delim()` and friends, we are back to alphabetically ordered factor levels. 
Any measure we take to avoid this loss immediately breaks another one of our rules. -So what do I do? I must admit I save (and re-load) R-specific binary files. Right after I save the plain text file. [Belt and suspenders][belt-and-suspenders]. +So what do I do? I must admit I save (and re-load) R-specific binary files. Right after I save the plain text file. [Belt and suspenders](https://www.wisegeek.com/what-does-it-mean-to-wear-belt-and-suspenders.htm). -I have toyed with the idea of writing import helper functions for a specific project, that would re-order factor levels in principled ways. They could be defined in one file and called from many. This would also have a very natural implementation within [a workflow where each analytical project is an R package][research-workflow]. But so far it has seemed too much like [yak shaving][yak-shaving]. I'm intrigued by a recent discussion of putting such information in YAML frontmatter (see Martin Fenner blog post [Using YAML frontmatter with CSV][yaml-with-csv]). +I have toyed with the idea of writing import helper functions for a specific project, that would re-order factor levels in principled ways. They could be defined in one file and called from many. This would also have a very natural implementation within [a workflow where each analytical project is an R package](https://www.carlboettiger.info/2012/05/06/research-workflow.html). But so far it has seemed too much like [yak shaving](https://seths.blog/2005/03/dont_shave_that/). I'm intrigued by a recent discussion of putting such information in YAML frontmatter (see Martin Fenner blog post [Using YAML frontmatter with CSV](https://blog.datacite.org/using-yaml-frontmatter-with-csv/)). ## Reordering the levels of the country factor @@ -204,7 +204,7 @@ Now let's look at the first few lines of the file `gap_life_exp-dput.txt`. cat(sep = "\n") ``` -Huh? Don't worry about it. Remember we are ["writing data for computers"][write-data-tweet]. 
The partner function `dget()` reads this representation back in. +Huh? Don't worry about it. Remember we are ["writing data for computers"]. The partner function `dget()` reads this representation back in. ```{r} gap_life_exp_dget <- dget("gap_life_exp-dput.txt") @@ -217,7 +217,7 @@ Note how the original, post-reordering country factor levels are restored using But why on earth would you ever do this? -The main application of this is [the creation of highly portable, self-contained minimal examples][reproducible-examples]. For example, if you want to pose a question on a forum or directly to an expert, it might be required or just plain courteous to NOT attach any data files. You will need a monolithic, plain text blob that defines any necessary objects and has the necessary code. `dput()` can be helpful for producing the piece of code that defines the object. If you `dput()` without specifying a file, you can copy the return value from Console and paste into a script. Or you can write to file and copy from there or add R commands below. +The main application of this is [the creation of highly portable, self-contained minimal examples](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). For example, if you want to pose a question on a forum or directly to an expert, it might be required or just plain courteous to NOT attach any data files. You will need a monolithic, plain text blob that defines any necessary objects and has the necessary code. `dput()` can be helpful for producing the piece of code that defines the object. If you `dput()` without specifying a file, you can copy the return value from Console and paste into a script. Or you can write to file and copy from there or add R commands below. ## Other types of objects to use `dput()` or `saveRDS()` on @@ -241,7 +241,7 @@ Sometimes, instead of rigid tab-delimiting, whitespace is used as the delimiter. 
## Resources -[Data import](http://r4ds.had.co.nz/data-import.html) chapter of [R for Data Science][r4ds] by Hadley Wickham and Garrett Grolemund [-@wickham2016]. +[Data import](http://r4ds.had.co.nz/data-import.html) chapter of [R for Data Science] by Hadley Wickham and Garrett Grolemund [-@wickham2016]. White et al.'s "Nine simple ways to make it easier to (re)use your data" [-@white2013]. @@ -261,5 +261,9 @@ Data Manipulation in R by Phil Spector [-@spector2008]. * See Chapter 2 ("Reading and Writing Data") + +["writing data for computers"]: https://twitter.com/vsbuffalo/statuses/358699162679787521 + + ```{r links, child="links.md"} ``` \ No newline at end of file diff --git a/10_factors.Rmd b/10_factors.Rmd index 814d4e3..95af3d0 100644 --- a/10_factors.Rmd +++ b/10_factors.Rmd @@ -14,7 +14,7 @@ We've spent a lot of time working with big, beautiful data frames, like the Gapm Factors are the variable type that useRs love to hate. It is how we store truly categorical information in R. The values a factor can take on are called the **levels**. For example, the levels of the factor `continent` in Gapminder are are "Africa", "Americas", etc. and this is what's usually presented to your eyeballs by R. In general, the levels are friendly human-readable character strings, like "male/female" and "control/treated". But *never ever ever* forget that, under the hood, R is really storing integer codes 1, 2, 3, etc. -This [Janus][wiki-janus]-like nature of factors means they are rich with booby traps for the unsuspecting but they are a necessary evil. I recommend you learn how to be the boss of your factors. The pros far outweigh the cons. Specifically in modelling and figure-making, factors are anticipated and accommodated by the functions and packages you will want to exploit. +This [Janus]-like nature of factors means they are rich with booby traps for the unsuspecting but they are a necessary evil. I recommend you learn how to be the boss of your factors. 
The pros far outweigh the cons. Specifically in modelling and figure-making, factors are anticipated and accommodated by the functions and packages you will want to exploit. **The worst kind of factor is the stealth factor.** The variable that you think of as character, but that is actually a factor (numeric!!). This is a classic R gotcha. Check your variable types explicitly when things seem weird. It happens to the best of us. @@ -22,12 +22,12 @@ Where do stealth factors come from? Base R has a burning desire to turn characte Good articles about how the factor fiasco came to be: -* [stringsAsFactors: An unauthorized biography][bio-strings-as-factors] by Roger Peng -* [stringsAsFactors = \<sigh\>][blog-strings-as-factors] by Thomas Lumley +* [stringsAsFactors: An unauthorized biography](https://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography) by Roger Peng +* [stringsAsFactors = \<sigh\>](https://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh) by Thomas Lumley ## The forcats package -[forcats][forcats-web] is a core package in the tidyverse. It is installed via `install.packages("tidyverse")`, and loaded with `library(tidyverse)`. You can also install via `install.packages("forcats")`and load it yourself separately as needed via `library(forcats)`. Main functions start with `fct_`. There really is no coherent family of base functions that forcats replaces -- that's why it's such a welcome addition. +[forcats] is a core package in the [tidyverse]. It is installed via `install.packages("tidyverse")`, and loaded with `library(tidyverse)`. You can also install via `install.packages("forcats")` and load it yourself separately as needed via `library(forcats)`. Main functions start with `fct_`. There really is no coherent family of base functions that forcats replaces -- that's why it's such a welcome addition. Currently this lesson will be mostly code vs prose. See the previous lesson for more discussion during the transition.
diff --git a/11_character-vectors.Rmd b/11_character-vectors.Rmd index b97fdb8..535931d 100644 --- a/11_character-vectors.Rmd +++ b/11_character-vectors.Rmd @@ -12,9 +12,9 @@ We've spent a lot of time working with big, beautiful data frames. That are clea But real life will be much nastier. You will bring data into R from the outside world and discover there are problems. You might think: how hard can it be to deal with character data? And the answer is: it can be very hard! -* [Stack Exchange outage][stackexchange-outage] -* [Regexes to validate/match email addresses][email-regex] -* [Fixing an Atom bug][fix-atom-bug] +* [Stack Exchange outage](https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016) +* [Regexes to validate/match email addresses](https://emailregex.com) +* [Fixing an Atom bug](https://davidvgalbraith.com/how-i-fixed-atom/) Here we discuss common remedial tasks for cleaning and transforming character data, also known as "strings". A data frame or tibble will consist of one or more *atomic vectors* of a certain class. This lesson deals with things you can do with vectors of class `character`. @@ -24,46 +24,46 @@ I start with this because we cannot possibly do this topic justice in a short am ### Manipulating character vectors -* [stringr package][stringr-web]. - - A core package in the `tidyverse.` It is installed via `install.packages("tidyverse")` and also loaded via `library(tidyverse)`. Of course, you can also install or load it individually. +* [stringr] package. + - A core package in the [tidyverse]. It is installed via `install.packages("tidyverse")` and also loaded via `library(tidyverse)`. Of course, you can also install or load it individually. - Main functions start with `str_`. Auto-complete is your friend. - Replacements for base functions re: string manipulation and regular expressions (see below). - Main advantages over base functions: greater consistency about inputs and outputs. 
Outputs are more ready for your next analytical task. -* [tidyr package][tidyr-web]. +* [tidyr] package. - Especially useful for functions that split one character vector into many and *vice versa*: `separate()`, `unite()`, `extract()`. * Base functions: `nchar()`, `strsplit()`, `substr()`, `paste()`, `paste0()`. -* The [glue package][glue-web] is fantastic for string interpolation. If `stringr::str_interp()` doesn't get your job done, check out the glue package. +* The [glue] package is fantastic for string interpolation. If `stringr::str_interp()` doesn't get your job done, check out the glue package. ### Regular expressions resources A God-awful and powerful language for expressing patterns to match in text or for search-and-replace. Frequently described as "write only", because regular expressions are easier to write than to read/understand. And they are not particularly easy to write. -* We again prefer the [stringr package][stringr-cran] over base functions. Why? - - Wraps [stringi][stringi-cran], which is a great place to look if stringr isn't powerful enough. - - Standardized on [ICU regular expressions][icu-regex], so you can stop toggling `perl = TRUE/FALSE` at random. +* We again prefer the [stringr] package over base functions. Why? + - Wraps [stringi], which is a great place to look if stringr isn't powerful enough. + - Standardized on [ICU regular expressions](https://userguide.icu-project.org/strings/regexp), so you can stop toggling `perl = TRUE/FALSE` at random. - Results come back in a form that is much friendlier for downstream work. -* The [Strings chapter][r4ds-strings] of [R for Data Science][r4ds] [@wickham2016] is a great resource. +* The [Strings chapter] of [R for Data Science] [@wickham2016] is a great resource. * Older STAT 545 lessons on regular expressions have some excellent content. This lesson draws on them, but makes more rigorous use of stringr and uses example data that is easier to support long-term. 
- [2014 Intro to regular expressions](#oldies) by TA Gloria Li (Appendix \@ref(oldies)). - [2015 Regular expressions and character data in R](#oldies) by TA Kieran Samuk (Appendix \@ref(oldies)). -* RStudio Cheat Sheet on [Regular Expressions in R][rstudio-regex-cheatsheet]. +* [Regular Expressions in R Cheat Sheet] by Ian Kopacka. * Regex testers: - - [regex101.com][regex101] - - [regexr.com][regexr] -* [`rex` R package][rex-github]: make regular expression from human readable expressions. + - [regex101.com](https://regex101.com) + - [regexr.com](https://regexr.com) +* [rex] R package: make regular expression from human readable expressions. * Base functions: `grep()` and friends. ### Character encoding resources -* [Strings subsection of data import chapter][r4ds-readr-strings] in [R for Data Science][r4ds] [@wickham2016]. +* [Strings subsection of data import chapter][r4ds-readr-strings] in [R for Data Science] [@wickham2016]. * Screeds on the Minimum Everyone Needs to Know about encoding: - - [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)][unicode-no-excuses] - - [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text][programmers-encoding] -* Chapter \@ref(character-encoding) - I've translated this blog post [Guide to fixing encoding problems in Ruby][encoding-probs-ruby] into R as the first step to developing a lesson. + - ["The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"] + - ["What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text"] +* Chapter \@ref(character-encoding) - I've translated this blog post [Guide to fixing encoding problems in Ruby] into R as the first step to developing a lesson. ### Character vectors that live in a data frame -* Certain operations are facilitated by tidyr. 
These are described below. +* Certain operations are facilitated by [tidyr]. These are described below. * For a general discussion of how to work on variables that live in a data frame, see [Vectors versus tibbles](#oldies) (Appendix \@ref(oldies)). ## Load the tidyverse, which includes stringr @@ -391,7 +391,11 @@ You can use parentheses inside regexes to define *groups* and you can refer to t For now, this lesson will refer you to other place to read up on this: * STAT 545 [2014 Intro to regular expressions](#oldies) by TA Gloria Li (Appendix \@ref(oldies)). -* The [Strings chapter][r4ds-strings] of [R for Data Science][r4ds] [@wickham2016]. +* The [Strings chapter] of [R for Data Science] [@wickham2016]. + + +[Strings chapter]: https://r4ds.had.co.nz/strings.html + ```{r links, child="links.md"} ``` \ No newline at end of file diff --git a/12_character-encoding.Rmd b/12_character-encoding.Rmd index b02e9d9..50861e2 100644 --- a/12_character-encoding.Rmd +++ b/12_character-encoding.Rmd @@ -10,10 +10,10 @@ source("common.R") ## Resources -* [Strings subsection of data import chapter][r4ds-readr-strings] in R for Data Science [@wickham2016]. +* [Strings subsection of data import chapter][r4ds-readr-strings] in [R for Data Science] [@wickham2016]. 
* Screeds on the Minimum Everyone Needs to Know about encoding: - - [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)][unicode-no-excuses] - - [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text][programmers-encoding] + - ["The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"] + - ["What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text"] * Debugging charts: - [Windows-1252 Characters to UTF-8 Bytes to Latin-1 Characters][utf8-debug] * Character inspection: @@ -23,8 +23,8 @@ source("common.R") For now, this page walks through these two mini-tutorials (written for Ruby), but translated to R: -* [Guide to fixing encoding problems in Ruby][encoding-probs-ruby] -* [How to Get From They’re to They’re][theyre-to-theyre] +* ["Guide to fixing encoding problems in Ruby"] +* ["How to Get From They’re to They’re"](https://www.justinweiss.com/articles/how-to-get-from-theyre-to-theyre/) Don't expect much creativity from me here. My goal is faithful translation. @@ -141,7 +141,7 @@ stringi::stri_enc_detect(string) Advice given in post is to sleuth it out based on where the data came from. With larger amounts of text, each language's guessing facilities presumably do better than they do here. In real life, all of this advice can prove to be ... overly optimistic? -I find it helpful to scrutinize debugging charts and look for the weird stuff showing up in my text. Here's [one that shows what UTF-8 bytes look like][utf8-debug] when erroneously interpreted under Windows-1252 encoding. This phenomenon is known as [*mojibake*][wiki-mojibake], which is a delightful word for a super-annoying phenomenon. 
If it helps, know that the most common encodings are UTF-8, ISO-8859-1 (or Latin1), and Windows-1252, so that really narrows things down. +I find it helpful to scrutinize debugging charts and look for the weird stuff showing up in my text. Here's [one that shows what UTF-8 bytes look like][utf8-debug] when erroneously interpreted under Windows-1252 encoding. This phenomenon is known as [*mojibake*](https://en.wikipedia.org/wiki/Mojibake), which is a delightful word for a super-annoying phenomenon. If it helps, know that the most common encodings are UTF-8, ISO-8859-1 (or Latin1), and Windows-1252, so that really narrows things down. ### Decide which encoding you want the string to be @@ -237,6 +237,8 @@ as.integer(charToRaw(backwards_one)) as.integer(charToRaw(string_curly)) ``` + +[utf8-debug]: http://www.i18nqa.com/debug/utf8-debug.html ```{r links, child="links.md"} ``` \ No newline at end of file diff --git a/13_date-times.Rmd b/13_date-times.Rmd index 21e65fa..0c59ab3 100644 --- a/13_date-times.Rmd +++ b/13_date-times.Rmd @@ -20,13 +20,18 @@ Here we discuss common remedial tasks for dealing with date-times. I start with this because we cannot possibly do this topic justice in a short amount of time. Our goal is to make you aware of specific problems and solutions. Once you have a character problem in real life, these resources will be extremely helpful as you delve deeper. -[Dates and times][r4ds-dates-times] chapter from [R for Data Science][r4ds] by Hadley Wickham and Garrett Grolemund [-@wickham2016]. See also the subsection on dates and times in the [Data import chapter][r4ds-data-import]. +* [Dates and times](https://r4ds.had.co.nz/dates-and-times.html) chapter from [R for Data Science] by Hadley Wickham and Garrett Grolemund [-@wickham2016]. + + See also the subsection on dates and times in the [Data import chapter](http://r4ds.had.co.nz/data-import.html). +* The [lubridate] package. + + On [CRAN](https://cloud.R-project.org/package=lubridate). 
+ + On [GitHub](https://github.com/tidyverse/lubridate). + + Main vignette: [Do more with dates and times in R]. +* Grolemund and Wickham's paper on lubridate in the Journal of Statistical Software: ["Dates and Times Made Easy with lubridate"] [-@grolemund2011]. +* Exercises to push you to learn lubridate (*posts include links to answers!*) + + [Part 1](https://www.r-exercises.com/2016/08/15/dates-and-times-simple-and-easy-with-lubridate-part-1/) + + [Part 2](https://www.r-exercises.com/2016/08/29/dates-and-times-simple-and-easy-with-lubridate-exercises-part-2/) + + [Part 3](https://www.r-exercises.com/2016/10/04/dates-and-times-simple-and-easy-with-lubridate-exercises-part-3/) -The [lubridate][lubridate-web] package ([CRAN][lubridate-cran]; [GitHub][lubridate-github]; [main vignette][lubridate-vignette]). - -Grolemund and Wickham's paper on lubridate in the Journal of Statistical Software [-@grolemund2011]. - -Exercises to push you to learn lubridate: [part 1][lubridate-ex1], [part 2][lubridate-ex2], and [part 3][lubridate-ex3] *posts include links to answers!* ## Load the tidyverse and lubridate diff --git a/14_multiple-tibbles.Rmd b/14_multiple-tibbles.Rmd index 586840b..784b3df 100644 --- a/14_multiple-tibbles.Rmd +++ b/14_multiple-tibbles.Rmd @@ -12,12 +12,12 @@ We've covered many topics on how to manipulate and reshape a single data frame: * Chapter \@ref(basic-data-care) - Basic care and feeding of data in R + Data frames (and tibbles) are awesome. -* Chapter \@ref(dplyr-intro) - Introduction to dplyr +* Chapter \@ref(dplyr-intro) - Introduction to [dplyr] + Filter, select, the pipe. * Chapter \@ref(dplyr-single) - dplyr functions for a single dataset + Single table verbs. * Chapter \@ref(tidy-data) - Tidy data using Lord of the Rings - + Tidy data, tidyr. + + Tidy data, [tidyr]. + *This actually kicks off with a row bind operation, discussed below.* But what if your data arrives in many pieces? There are many good (and bad) reasons why this might happen.
How do you get it into one big beautiful tibble? These tasks break down into 3 main classes: @@ -55,7 +55,7 @@ library(tidyverse) ### Row binding -We used word count data from the Lord of the Rings trilogy to explore the concept of tidy data. That kicked off with a quiet, successful row bind. Let's revisit that. +We used word count data from the Lord of the Rings trilogy to explore the concept of tidy data in Chapter \@ref(tidy-data). That kicked off with a quiet, successful row bind. Let's revisit that. Here's what a perfect row bind of three (untidy!) data frames looks like. @@ -153,7 +153,6 @@ Bottom line: Row bind when you need to, but inspect the results re: coercion. Co Visit Chapter \@ref(join-cheatsheet) to see concrete examples of all the joins implemented in dplyr, based on comic characters and publishers. - The most recent release of gapminder includes a new data frame, `country_codes`, with country names and ISO codes. Therefore you can also use it to practice joins. ```{r end_multi_tibbles} diff --git a/15_join-tibbles.Rmd b/15_join-tibbles.Rmd index 648cd19..99954d2 100644 --- a/15_join-tibbles.Rmd +++ b/15_join-tibbles.Rmd @@ -64,13 +64,13 @@ make_three_gt <- function(gt_left, gt_mid, gt_right, ...) { ## Why the cheatsheet -Examples for those of us who don't speak SQL so good. There are lots of [Venn diagrams re: SQL joins on the internet][google-sql-join], but I wanted R examples. Those diagrams also utterly fail to show what's really going on vis-a-vis rows AND columns. +Examples for those of us who don't speak SQL so good. There are lots of [Venn diagrams re: SQL joins on the internet](https://www.google.com/search?q=sql+join&tbm=isch), but I wanted R examples. Those diagrams also utterly fail to show what's really going on vis-a-vis rows AND columns. Other great places to read about joins: -* The dplyr vignette on [Two-table verbs][dplyr-vignette-two-table]. 
-* The [Relational data chapter][r4ds-relational-data] in [R for Data Science][r4ds] [@wickham2016]. Excellent diagrams. +* The dplyr vignette on [Two-table verbs](https://dplyr.tidyverse.org/articles/two-table.html). +* The [Relational data chapter](https://r4ds.had.co.nz/relational-data.html) in [R for Data Science] [@wickham2016]. Excellent diagrams. ## The data diff --git a/17_r-objects-indexing.Rmd b/17_r-objects-indexing.Rmd index 8c6a117..4c65973 100644 --- a/17_r-objects-indexing.Rmd +++ b/17_r-objects-indexing.Rmd @@ -32,6 +32,7 @@ x[0] ``` R is built to work with vectors. Many operations are *vectorized*, i.e. by default they will happen component-wise when given a vector as input. Novices often don't internalize or exploit this and they write lots of unnecessary `for` loops. + ```{r} x <- 1:4 ## which would you rather write and read? @@ -46,11 +47,13 @@ identical(y, z) ``` When reading function documentation, keep your eyes peeled for arguments that can be vectors. You'll be surprised how common they are. For example, the mean and standard deviation of random normal variates can be provided as vectors. + ```{r} set.seed(1999) rnorm(5, mean = 10^(1:5)) round(rnorm(5, sd = 10^(0:4)), 2) ``` + This could be awesome in some settings, but dangerous in others, i.e. if you exploit this by mistake and get no warning. This is one of the reasons it's so important to keep close tabs on your R objects: are they what you expect in terms of their flavor and length or dimensions? Check early and check often. Notice that R also recycles vectors, if they are not the necessary length. You will get a warning if R suspects recycling is unintended, i.e. when one length is not an integer multiple of another, but recycling is silent if it seems like you know what you're doing. Can be a beautiful thing when you're doing this deliberately, but devastating when you don't. @@ -116,10 +119,10 @@ We've said, and even seen, that square brackets are used to index a vector. 
Ther Most common, useful ways to index a vector: -* __logical vector__: keep elements associated with TRUE's, ditch the FALSE's -* __vector of positive integers__: specifying the keepers -* __vector of negative integers__: specifying the losers -* __character vector__: naming the keepers +* __Logical vector__: keep elements associated with TRUE's, ditch the FALSE's +* __Vector of positive integers__: specifying the keepers +* __Vector of negative integers__: specifying the losers +* __Character vector__: naming the keepers ```{r} w @@ -134,14 +137,15 @@ w[-c(2, 5)] w[c('c', 'a', 'f')] ``` -## lists hold just about anything +## Lists hold just about anything Lists are basically über-vectors in R. It's like a vector, but with no requirement that the elements be of the same flavor. In data analysis, you won't make lists very often, at least not consciously, but you should still know about them. Why? * data.frames are lists! They are a special case where each element is an atomic vector, all having the same length. -* many functions will return lists to you and you will want to extract goodies from them, such as the p-value for a hypothesis test or the estimated error variance in a regression model +* Many functions will return lists to you and you will want to extract goodies from them, such as the p-value for a hypothesis test or the estimated error variance in a regression model Here we repeat an assignment from above, using `list()` instead of `c()` to combine things and you'll notice that the different flavors of the constituent parts are retained this time. + ```{r} ## earlier: a <- c("cabbage", pi, TRUE, 4.3) (a <- list("cabbage", pi, TRUE, 4.3)) @@ -152,6 +156,7 @@ class(a) ``` List components can also have names. You can create or change names after a list already exists or this can be integrated into the initial assignment. 
+ ```{r} names(a) names(a) <- c("veg", "dessert", "myAim", "number") @@ -162,7 +167,7 @@ names(a) Indexing a list is similar to indexing a vector but it is necessarily more complex. The fundamental issue is this: if you request a single element from the list, do you want a list of length 1 containing only that element or do you want the element itself? For the former (desired return value is a list), we use single square brackets, `[` and `]`, just like indexing a vector. For the latter (desired return value is a single element), we use a dollar sign `$`, which you've already used to get one variable from a data.frame, or double square brackets, `[[` and `]]`. -The ["pepper shaker photos" in R for Data Science][r4ds-pepper-shaker] are a splendid visual explanation of the different ways to get stuff out of a list. Highly recommended. +The ["pepper shaker photos" in R for Data Science](https://r4ds.had.co.nz/vectors.html#lists-of-condiments) are a splendid visual explanation of the different ways to get stuff out of a list. Highly recommended. > Warning: the rest of this section might make your eyes glaze over. Skip to the next section if you need to; come back later when some list is ruining your day. @@ -180,6 +185,7 @@ mode(a) ``` Here's are ways to get a single list element: + ```{r error = TRUE} a[[2]] # index with a positive integer a$myAim # use dollar sign and element name @@ -190,6 +196,7 @@ iWantThis <- "joeNum" # indexing with length 1 character object a[[iWantThis]] # we get joeNum itself, a length 5 integer vector a[[c("joeNum", "veg")]] # does not work! can't get > 1 elements! see below ``` + A case when one must use the double bracket approach, as opposed to the dollar sign, is when the indexing object itself is an R object; we show that above. What if you want more than one element? You must index vector-style with single square brackets. 
Note that the return value will always be a list, unlike the return value from double square brackets, even if you only request 1 element. @@ -288,19 +295,23 @@ jMat[c("row1", "row4"), c("col2", "col3")] jMat[-c(2, 3), c(TRUE, TRUE, FALSE, FALSE)] # wacky but possible ``` -Under the hood, of course, matrices are just vectors with some extra facilities for indexing. R is a [column-major order][wiki-row-col-major-order] language, in contrast to C and Python which use row-major order. What this means is that in the underlying vector storage of a matrix, the columns are stacked up one after the other. Matrices can be indexed *exactly* like a vector, i.e. with no comma $i,j$ business, like so: +Under the hood, of course, matrices are just vectors with some extra facilities for indexing. R is a [column-major order](https://en.wikipedia.org/wiki/Row-_and_column-major_order) language, in contrast to C and Python which use row-major order. What this means is that in the underlying vector storage of a matrix, the columns are stacked up one after the other. Matrices can be indexed *exactly* like a vector, i.e. with no comma $i,j$ business, like so: + ```{r} jMat[7] jMat ``` + How to understand this: start counting in the upper left corner, move down the column, continue from the top of column 2 and you'll land on the element "x32" when you get to 7. If you have meaningful, systematic row or column names, there are many possibilities for indexing via regular expressions. Maybe we will talk about `grep` later.... + ```{r} jMat[1, grepl("[24]", colnames(jMat))] ``` Note also that one can put an indexed matrix on the receiving end of an assignment operation and, as long as your replacement values have valid shape or extent, you can change the matrix. + ```{r} jMat["row1", 2:3] <- c("HEY!", "THIS IS NUTS!") jMat @@ -361,13 +372,14 @@ str(multiMat) This behind the scenes tour is still aimed at making you a better data analyst. 
Hopefully the slog through vectors, matrices, and lists will be redeemed by greater prowess at manipulating data.frames. Why should this be true? -* a data.frame is a *list* -* the list elements are the variables and they are *atomic vectors* +* A data.frame is a *list* +* The list elements are the variables and they are *atomic vectors* * data.frames are rectangular, like their matrix friends, so your intuition -- and even some syntax -- can be borrowed from the matrix world A data.frame is a list that quacks like a matrix. Reviewing list-style indexing of a data.frame: + ```{r} jDat jDat$z @@ -377,6 +389,7 @@ str(jDat[[iWantThis]]) # we get an atomic vector ``` Reviewing vector-style indexing of a data.frame: + ```{r} jDat["y"] str(jDat["y"]) # we get a data.frame with one variable, y @@ -387,6 +400,7 @@ str(subset(jDat, select = c(w, v))) # using subset() function ``` Demonstrating matrix-style indexing of a data.frame: + ```{r end_indexing, tidy = FALSE} jDat[ , "v"] str(jDat[ , "v"]) @@ -433,7 +447,7 @@ subset(jDat, subset = z) +-----------+---------------+-----------+-----------+ ``` -Thinking about objects according to the flavors above will work fairly well for most purposes most of the time, at least when you're first getting started. Notice that most rows in the table are quite homogeneous, i.e. a logical vector is a logical vector is a logical vector. But the row pertaining to factors is an exception, which highlights the special nature of factors. (for more, go [here](#factors-boss)). +Thinking about objects according to the flavors above will work fairly well for most purposes most of the time, at least when you're first getting started. Notice that most rows in the table are quite homogeneous, i.e. a logical vector is a logical vector is a logical vector. But the row pertaining to factors is an exception, which highlights the special nature of factors. (For more, go [here](#factors-boss)). 
## Accommodating color blindness -The dichromat package ([on CRAN][dichromat-cran]) will help you select a color scheme that will be effective for color blind readers. +The [dichromat] package will help you select a color scheme that will be effective for color blind readers. ```{r message = FALSE, warning = FALSE} # install.packages("dichromat") @@ -456,18 +457,22 @@ par(opar) ## Resources -* Zeileis et al.'s ["Escaping RGBland: Selecting Colors for Statistical Graphs"][escaping-rgbland-pdf] in [Computational Statistics & Data Analysis][escaping-rgbland-doi] [-@zeileis2009]. -* [Vignette][colorspace-vignette] for the [colorspace][colorspace-cran] package. +* Zeileis et al.'s ["Escaping RGBland: Selecting Colors for Statistical Graphs"] in Computational Statistics & Data Analysis [-@zeileis2009]. +* [Vignette](https://cloud.r-project.org/web/packages/colorspace/vignettes/hcl-colors.pdf) for the [colorspace] package. * Earl F. Glynn (Stowers Institute for Medical Research): - + [Excellent resources][stowers-color-chart] for named colors, i.e. the ones available via `colors()`. - + Informative talk ["Using Color in R"][stowers-using-color-in-R], though features some questionable *use* of color itself. -* Blog post [My favorite RGB color][favorite-rgb-color] on the Many World Theory blog. -* Wickham's [ggplot2: Elegant Graphics for Data Analysis][elegant-graphics-springer] [-@wickham2009]. - + [Online docs (nice!)][ggplot2-reference] - + [Package webpage][ggplot2-web] - + ggplot2 on [CRAN][ggplot2-cran] and [GitHub][ggplot2-github] + + [Excellent resources](https://web.archive.org/web/20121022044903/http://research.stowers-institute.org/efg/R/Color/Chart/) for named colors, i.e. the ones available via `colors()`. + + Informative talk ["Using Color in R"](https://www.uv.es/conesa/CursoR/material/UsingColorInR.pdf), though features some questionable *use* of color itself. +* Blog post ["My favorite RGB color"] on the Many World Theory blog. 
+* Wickham's [ggplot2: Elegant Graphics for Data Analysis] [-@wickham2009]. + + [Package webpage][ggplot2] + + ggplot2 on [CRAN](https://cloud.R-project.org/package=ggplot2) and [GitHub](https://github.com/tidyverse/ggplot2) + Section 6.4.3 Colour -* ["Why Should Engineers and Scientists Be Worried About Color?"][worry-about-color] by Bernice E. Rogowitz and Lloyd A. Treinish of IBM Research [-@rogowitz1996], h/t [\@EdwardTufte](https://twitter.com/EdwardTufte). +* ["Why Should Engineers and Scientists Be Worried About Color?"] by Bernice E. Rogowitz and Lloyd A. Treinish of IBM Research [-@rogowitz1996], h/t [\@EdwardTufte](https://twitter.com/EdwardTufte). + + + +[colorspace]: https://cloud.r-project.org/web/packages/colorspace/index.html +[dichromat]: https://cloud.R-project.org/package=dichromat ```{r links, child="links.md"} diff --git a/26_qualitative-colors.Rmd b/26_qualitative-colors.Rmd index 2fbeb2b..fe08deb 100644 --- a/26_qualitative-colors.Rmd +++ b/26_qualitative-colors.Rmd @@ -1,4 +1,4 @@ -# Taking control of qualitative colors in ggplot {#qualitative-colors} +# Taking control of qualitative colors in ggplot2 {#qualitative-colors} ```{r include = FALSE} source("common.R") @@ -8,7 +8,7 @@ source("common.R") ## Load packages and prepare the Gapminder data -Load the ggplot and dplyr packages and bring in the usual Gapminder data but drop Oceania, which only has two countries. +Load the ggplot2 and dplyr packages and bring in the usual Gapminder data but drop Oceania, which only has two countries. We also sort the country factor based on population and then sort the data as well. Why? In the bubble plots below, we don't want large countries to hide small countries. This is a case where, sadly, the row order of the data truly affects the visual output. @@ -26,7 +26,7 @@ jdat <- gapminder %>% ## Take control of the size and color of points -Let's use ggplot to move towards the classic Gapminder bubble chart. Crawl then walk then run. 
+Let's use ggplot2 to move towards the classic Gapminder bubble chart. Crawl then walk then run. First, make a simple scatterplot for a single year. @@ -72,7 +72,7 @@ r + aes(fill = continent) The gapminder package comes with color palettes for the continents and the individual countries. For example, here's the country color scheme: -```{r echo = FALSE, fig.cap = "From [https://github.com/jennybc/gapminder](https://github.com/jennybc/gapminder)"} +```{r echo = FALSE, fig.cap = "From "} knitr::include_graphics("img/gapminder-color-scheme-ggplot2.png") ``` @@ -86,9 +86,9 @@ head(country_colors) __Note:__ The order of `country_colors` is not alphabetical. The countries are actually sorted by size (in which particular year, I don't recall) within continent, reflecting the logic by which the scheme was created. No problem. Ideally, nothing in your analysis should depend on row order, although that's not always possible in reality. -## Prepare the color scheme for use with ggplot +## Prepare the color scheme for use with ggplot2 -In a [grammar of graphics](https://vita.had.co.nz/papers/layered-grammar.html), a __scale__ controls the mapping from a variable in the data to an aesthetic [@wickham2010]. So far we've let the coloring / filling scale be determined automatically by `ggplot2.` But to use our custom color scheme, we need to take control of the mapping of the `country` factor into fill color in `geom_point()`. +In a [grammar of graphics]["A layered grammar of graphics"], a __scale__ controls the mapping from a variable in the data to an aesthetic [@wickham2010]. So far we've let the coloring / filling scale be determined automatically by ggplot2. But to use our custom color scheme, we need to take control of the mapping of the `country` factor into fill color in `geom_point()`. We will use `scale_fill_manual()`, a member of a family of functions for customization of the discrete scales. 
The main argument is `values =`, which is a vector of aesthetic values -- fill colors, in our case. If this vector has names, they will be consulted during the mapping. This is incredibly useful! This is why `country_colors` does **exactly that**. This saves us from any worry about the order of levels of the `country` factor, the row order of the data, or exactly which countries are being plotted. diff --git a/27_secrets-happy-graphics.Rmd b/27_secrets-happy-graphics.Rmd index b1a261a..c45c59a 100644 --- a/27_secrets-happy-graphics.Rmd +++ b/27_secrets-happy-graphics.Rmd @@ -15,7 +15,7 @@ library(tidyverse) ## Hidden data wrangling problems -If you are struggling to make a figure, don't assume it's a problem between you and ggplot2. Stop and ask yourself which of these rules you are breaking: +If you are struggling to make a figure, don't assume it's a problem between you and [ggplot2]. Stop and ask yourself which of these rules you are breaking: * Keep stuff in data frames * Keep your data frames *tidy*; be willing to reshape your data often @@ -39,7 +39,7 @@ Problem is, ggplot2 has an incredibly strong preference for variables in data fr ggplot(mapping = aes(x = year, y = life_exp)) + geom_jitter() ``` -**Just leave the variables in place and pass the associated data frame!** This advice applies to base and lattice graphics as well. It is not specific to ggplot2. +**Just leave the variables in place and pass the associated data frame!** This advice applies to base and [lattice] graphics as well. It is not specific to ggplot2. 
```{r data-in-situ} ggplot(data = gapminder, aes(x = year, y = life_exp)) + geom_jitter() @@ -82,6 +82,7 @@ cor(year, lifeExp, data = gapminder) ``` Sure, you can always just repeat the data frame name like so: + ```{r} cor(gapminder$year, gapminder$lifeExp) ``` @@ -95,7 +96,7 @@ with(gapminder, cor(year, lifeExp)) ``` -If you use the magrittr package, another option is to use the `%$%` operator to expose the variables inside a data frame for further computation: +If you use the [magrittr] package, another option is to use the `%$%` operator to expose the variables inside a data frame for further computation: ```{r message = FALSE, warning = FALSE} library(magrittr) @@ -107,7 +108,7 @@ gapminder %$% This is an entire topic covered elsewhere: -Chapter \@ref(tidy-data) - [Tidy data using Lord of the Rings][tidydata-lotr] +Chapter \@ref(tidy-data) - [Tidy data using Lord of the Rings] ## Factor management diff --git a/28_saving-figures.Rmd b/28_saving-figures.Rmd index 78de969..c6be076 100644 --- a/28_saving-figures.Rmd +++ b/28_saving-figures.Rmd @@ -20,13 +20,13 @@ _Do not_ succumb to the temptation of a mouse-based process. If might feel handy If you save figure-making code in a source file and you give figure files machine-readable, self-documenting names, your future self will be able to find its way back to this code. -Hypothetical: a [zombie project][zombie-project] comes back to life and your collaborator presents you with [a figure you made 18 months ago][tweet-project-resurfacing]. Can you remake `fig08_scatterplot-lifeExp-vs-year.pdf` as a TIFF and with smooth regression? Fun times! +Hypothetical: a [zombie project](https://imgur.com/ewmBeQG) comes back to life and your collaborator presents you with [a figure you made 18 months ago](https://twitter.com/JohnDCook/status/522377493417033728). Can you remake `fig08_scatterplot-lifeExp-vs-year.pdf` as a TIFF and with smooth regression? Fun times! 
This filename offers several properties to help you find the code that produced it: - * __Human-readability__: It's helpful to know you're searching for a scatterplot and maybe which variables are important. It gives important context for your personal archaeological dig. +* __Human-readability__: It's helpful to know you're searching for a scatterplot and maybe which variables are important. It gives important context for your personal archaeological dig. - * __Specificity__: Note how specific and descriptive the name of this figure file is; we didn't settle for the generic `fig08.pdf` or `scatterplot.pdf`. This makes the name at least somewhat unique, which will help you search your home directory for files containing part or all of this filename. +* __Specificity__: Note how specific and descriptive the name of this figure file is; we didn't settle for the generic `fig08.pdf` or `scatterplot.pdf`. This makes the name at least somewhat unique, which will help you search your home directory for files containing part or all of this filename. * __Machine-readability__: Every modern OS provides a way to search your hard drive for a file with a specific name or containing a specific string. This will be easier if the name contains no spaces, punctuation, or other funny stuff. If you use conventional extensions, you can even narrow the search to files ending in `.R` or `.Rmd`. @@ -36,13 +36,13 @@ All of these human practices will help you zero in on the R code you need, so yo Read the [R help for `Devices`][rdocs-devices] to learn about graphics devices in general and which are available on your system (*obviously requires that you read your local help*). -It is very important to understand the difference between [vector graphics][wiki-vector-graphics] and [raster graphics][wiki-raster-graphics]. Vector graphics are represented in terms of shapes and lines, whereas raster graphics are pixel-based. 
+It is very important to understand the difference between [vector graphics](https://en.wikipedia.org/wiki/Vector_graphics) and [raster graphics](https://en.wikipedia.org/wiki/Raster_graphics). Vector graphics are represented in terms of shapes and lines, whereas raster graphics are pixel-based. - * __Vector__ examples: PDF, postscript, SVG - - Pros: re-size gracefully, good for print. SVG is where the web is heading, though we are not necessarily quite there yet. - * __Raster__ examples: PNG, JPEG, BMP, GIF - - Cons: look awful "blown up" ... in fact, look awful quite frequently - - Pros: play very nicely with Microsoft Office products and the web. Files can be blessedly small! +* __Vector__ examples: PDF, postscript, SVG + - Pros: re-size gracefully, good for print. SVG is where the web is heading, though we are not necessarily quite there yet. +* __Raster__ examples: PNG, JPEG, BMP, GIF + - Cons: look awful "blown up" ... in fact, look awful quite frequently + - Pros: play very nicely with Microsoft Office products and the web. Files can be blessedly small! Tough love: you will not be able to pick vector or raster or a single device and use it all the time. You must think about your downstream use cases and plan accordingly. It is entirely possible that you should save key figures __in more than one format__ for maximum flexibility in the future. Worst case, if you obey the rules given here, you can always remake the figure to save in a new format. @@ -50,8 +50,8 @@ FWIW most of my figures exist as `pdf()`, `png()`, or both. 
Although it is not t Here are two good posts from the [Revolutions Analytics blog](https://blog.revolutionanalytics.com) with tips for saving figures to file: - * [10 tips for making your R graphics look their best][rgraphics-looks-tips] - * [High-quality R graphics on the Web with SVG][rgraphics-svg-tips] +* [10 tips for making your R graphics look their best](https://blog.revolutionanalytics.com/2009/01/10-tips-for-making-your-r-graphics-look-their-best.html) +* [High-quality R graphics on the Web with SVG](https://blog.revolutionanalytics.com/2011/07/r-svg-graphics.html) ## Write figures to file with `ggsave()` @@ -107,7 +107,7 @@ include_graphics(c("img/figures-io-p1.png", ``` -__Via the `base_size` of the active theme__: The `base_size` of the [theme][ggplot2-theme-args] refers to the base font size. This is NOT a theme element that can be modified via `ggplot(...) + theme(...)`. Rather, it's an argument to various functions that set theme elements. Therefore, to get the desired effect you need to create a complete theme, specifying the desired `base_size`. By setting `base size < 12`, the default value, you shrink text elements and by setting `base_size > 12`, you make them larger. Figure \@ref(fig:exaggerated-base-size) shows two versions of a figure, with exaggerated values of `base_size`, to illustrate its effect. +__Via the `base_size` of the active theme__: The `base_size` of the [theme](https://ggplot2.tidyverse.org/reference/ggtheme.html#arguments) refers to the base font size. This is NOT a theme element that can be modified via `ggplot(...) + theme(...)`. Rather, it's an argument to various functions that set theme elements. Therefore, to get the desired effect you need to create a complete theme, specifying the desired `base_size`. By setting `base size < 12`, the default value, you shrink text elements and by setting `base_size > 12`, you make them larger. 
Figure \@ref(fig:exaggerated-base-size) shows two versions of a figure, with exaggerated values of `base_size`, to illustrate its effect. ```r p3 <- p + ggtitle("base_size = 20") + theme_grey(base_size = 20) @@ -124,7 +124,7 @@ knitr::include_graphics(c("img/figures-io-p3.png", "img/figures-io-p4.png")) ``` -*Thanks to [Casey Shannon](https://twitter.com/cashoes_) for tips about `scale =` and this [cheatsheet from Zev Ross][zev-ross-cheatsheet] for tips about `base_size`.* +*Thanks to [Casey Shannon](https://twitter.com/cashoes_) for tips about `scale =` and this [cheatsheet from Zev Ross](http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/) for tips about `base_size`.* ```{r eval = FALSE, include = FALSE} ## here's how you to change base_size for an entire session (or at least until @@ -137,7 +137,7 @@ theme_set(otheme) ## Write non-ggplot2 figures to file -Recall that `ggsave()` is recommended if you're using ggplot2. But if you're using base graphics or lattice, here's generic advice for writing figures to file. To be clear, this *also* works for ggplot2 graphs, but I can't think of any good reasons to NOT use `ggsave()`. +Recall that `ggsave()` is recommended if you're using ggplot2. But if you're using base graphics or [lattice], here's generic advice for writing figures to file. To be clear, this *also* works for ggplot2 graphs, but I can't think of any good reasons to NOT use `ggsave()`. Edit your source code in the following way: precede the figure-making code by opening a graphics device and follow it with a command that closes the device. Here's an example: @@ -174,7 +174,7 @@ The appeal of this method is that you will literally copy the figure in front of Why is this method improper? Various aspects of a figure -- such as font size -- are determined by the target graphics device and its physical size. 
Therefore, it is best practice to open your desired graphics device explicitly, using any necessary arguments to control height, width, fonts, etc. Make your plot. And close the device. But for lots of everyday plots the `dev.print()` method works just fine. -If you call up the help file for [`dev.off()`, `dev.print()`, and friends][rdocs-dev], you can learn about many other functions for controlling graphics devices. +If you call up the help file for [`dev.off()`, `dev.print()`, and friends](https://rdrr.io/r/grDevices/dev.html), you can learn about many other functions for controlling graphics devices. ## Preemptive answers to some FAQs @@ -281,6 +281,9 @@ file_delete(dir_ls(path("img"), regexp = "fig-io-practice")) file_delete(dir_ls(".", regexp = "test-fig")) ``` + +[rdocs-ggsave]: https://rdrr.io/cran/ggplot2/man/ggsave.html +[rdocs-devices]: https://rdrr.io/r/grDevices/Devices.html ```{r links, child="links.md"} ``` \ No newline at end of file diff --git a/29_multiple-plots.Rmd b/29_multiple-plots.Rmd index dfa54b4..86b4a4b 100644 --- a/29_multiple-plots.Rmd +++ b/29_multiple-plots.Rmd @@ -14,7 +14,7 @@ Faceting is useful for constructing an array of similar plots where each panel c ## Meet the gridExtra package -Under the hood, ggplot2 uses the grid package to create figures. The gridExtra packages provides some extra goodies and we will draw on them to place multiple ggplot2 plots on a single virtual page. +Under the hood, ggplot2 uses the grid package to create figures. The [gridExtra] package provides some extra goodies and we will draw on them to place multiple ggplot2 plots on a single virtual page. You may need to install gridExtra and you will certainly need to load it. @@ -46,11 +46,11 @@ p_scatter <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + grid.arrange(p_dens, p_scatter, nrow = 2, heights = c(0.35, 0.65)) ``` -You can find other examples of this workflow in the [R Graph Catalog][r-graph-catalog-github]. 
+You can find other examples of this workflow in the [R Graph Catalog]. ## Use the `multiplot()` function -In the [Graphs](http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/) chapter of his [Cookbook for R][cookbook-for-r], Winston Chang uses the grid package to define the `multiplot()` function: +In the [Graphs][cookbook-for-r-multigraphs] chapter of his [Cookbook for R], Winston Chang uses the grid package to define the `multiplot()` function: ```{r chang-cookbook, eval = FALSE} # Multiple plot function @@ -110,12 +110,15 @@ Visit [Multiple graphs on one page (ggplot2)][cookbook-for-r-multigraphs] to see ## Use the cowplot package -The cowplot package ([CRAN][cowplot-cran]; [GitHub][cowplot-github]) does (at least) two things: +The cowplot package (on [CRAN](https://cloud.R-project.org/package=cowplot); on [GitHub](https://github.com/wilkelab/cowplot)) does (at least) two things: - * Provides a publication-ready theme for ggplot2. - * Helps combine multiple plots into one figure. +* Provides a publication-ready theme for ggplot2. +* Helps combine multiple plots into one figure. -Check out [the vignette][cowplot-vignette] to see it in action. +Check out [the vignette](https://cloud.r-project.org/web/packages/cowplot/vignettes/introduction.html) to see it in action. + + +[cookbook-for-r-multigraphs]: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/ ```{r links, child="links.md"} diff --git a/30_package-overview.Rmd b/30_package-overview.Rmd index 6eba1c7..3a85529 100644 --- a/30_package-overview.Rmd +++ b/30_package-overview.Rmd @@ -12,7 +12,7 @@ source("common.R") + What is an R package? + What is a library? + Why make an R package? - + How `devtools` creates a happy workflow. + + How [devtools] creates a happy workflow. 
* Chapter \@ref(system-prep) - Prepare your system for package development + Although we'll build a very simple package, we're still going to use the most modern and powerful tools for R package development. In theory, this could eventually involve compiling C/C++ code, which means you need what's called a *build environment*. See Chapter \@ref(system-prep) for help preparing your system. * Chapter \@ref(package-from-scratch) - Write your own R Package @@ -20,8 +20,8 @@ source("common.R") ## Resources {-} -* [R Packages][r-pkgs2] book: the second edition is under development by Hadley Wickham and Jennifer Bryan. -* [Writing R Extensions][cran-r-extensions], the One True Official Document on creating R packages. +* [R Packages] book: the second edition is under development by Hadley Wickham and Jennifer Bryan [-@wickham-unpub]. +* ["Writing R Extensions"], the One True Official Document on creating R packages. ```{r links, child="links.md"} ``` \ No newline at end of file diff --git a/32_system-prep-packages.Rmd b/32_system-prep-packages.Rmd index 2ae9715..088b9b6 100644 --- a/32_system-prep-packages.Rmd +++ b/32_system-prep-packages.Rmd @@ -14,9 +14,9 @@ Embarking on your career as an R package developer is an important milestone. Wh *2016-11 FYI: Jenny is running R version 3.3.1 (2016-06-21) Bug in Your Hair and RStudio 1.0.44 at the time of writing.* -## Install `devtools` from CRAN +## Install devtools from CRAN -We use the `devtools` package to help us develop our R package. Do this: +We use the [devtools] package to help us develop our R package. Do this: ``` r install.packages("devtools") @@ -31,11 +31,11 @@ You *can ignore* this and successfully develop an R package that consists solely However, we recommend you install Rtools, so you can take full advantage of devtools. Soon, you will want to use `devtools::install_github()` to install R packages from GitHub, instead of CRAN. 
You will inevitably need to build a package that includes C/C++ code, which *will require* Rtools. -Rtools is __NOT an R package__ but is rather ["a collection of resources for building packages for R under Microsoft Windows, or for building R itself"](https://cran.r-project.org/bin/windows/Rtools/). +Rtools is __NOT an R package__ but is rather ["a collection of resources for building packages for R under Microsoft Windows, or for building R itself"](https://cloud.r-project.org/bin/windows/Rtools/). Go here and do what it says: - + During the installation of Rtools you will get to a window asking you to "Select Additional Tasks". **It is important that you make sure to select the box for "Edit the system PATH"**. @@ -56,7 +56,7 @@ Hopefully you will simply see a message saying `TRUE`, indicating that Rtools is ## macOS: system prep -You will not get an *immediate* warning from `devtools` that you need to install anything. But before you can build R package with compiled code, you will also need to install more software. Pick one of the following: +You will not get an *immediate* warning from devtools that you need to install anything. But before you can build R package with compiled code, you will also need to install more software. Pick one of the following: * Minimalist approach (what I do): Install Xcode Command Line Tools. - In the shell: `xcode-select --install` @@ -65,11 +65,11 @@ You will not get an *immediate* warning from `devtools` that you need to install ## Linux: system prep -*We've never had this section but [RStudio's `devtools` guide][rstudio-devtools] and [R Packages](https://r-pkgs.org/intro.html#linux) both say the `r-devel` or `r-base-dev` package is required. What gives?* +*We've never had this section but [RStudio's devtools guide](https://www.rstudio.com/products/rpackages/devtools/) and [R Packages](https://r-pkgs.org/intro.html#linux) [@wickham-unpub] both say the r-devel or r-base-dev package is required. 
What gives?* ## Check system prep -`devtools` offers a diagnostic function to check if your system is ready. +devtools offers a diagnostic function to check if your system is ready. ``` r library(devtools) @@ -82,9 +82,9 @@ Hopefully you see `TRUE`! Install more packages. If you already have them, update them. -* knitr -* roxygen2 -* testthat +* [knitr] +* [roxygen2] +* [testthat] *2016-11 FYI: Jenny is running these versions of these packages at the time of writing.* @@ -128,9 +128,9 @@ update.packages(ask = FALSE) __CAVEAT:__ The above examples will only consult your default library and default CRAN mirror. If you want to target a non-default library, use function arguments to say so. Packages that you have installed from GitHub? You'll need to check the current-ness of your version and perform upgrades yourself. -## Optional: install `devtools` from GitHub +## Optional: install devtools from GitHub -We aren't using bleeding edge features of `devtools`, but you could upgrade to the development version of `devtools` at this point. +We aren't using bleeding edge features of devtools, but you could upgrade to the development version of devtools at this point. macOS and Linux users have it easy. Do this: @@ -138,7 +138,7 @@ macOS and Linux users have it easy. Do this: devtools::install_github("r-lib/devtools") ``` -For Windows instructions, see the [`devtools` README][devtools-github]. +For Windows instructions, see the [devtools README](https://github.com/r-lib/devtools). 
```{r links, child="links.md"} diff --git a/33_create-package.Rmd b/33_create-package.Rmd index 8ff8962..8bf80ed 100644 --- a/33_create-package.Rmd +++ b/33_create-package.Rmd @@ -6,4 +6,4 @@ source("common.R") -*The content that originally lived here now appears as the [The Whole Game][r-pkgs2-whole-game] in the under-development 2nd edition of the [R Packages][r-pkgs2] book.* +*The content that originally lived here now appears as the [The Whole Game](https://r-pkgs.org/whole-game.html) chapter in the under-development 2nd edition of the [R Packages] book [@wickham-unpub].* diff --git a/34_workflows.Rmd b/34_workflows.Rmd index 6125828..ba4c9b8 100644 --- a/34_workflows.Rmd +++ b/34_workflows.Rmd @@ -16,7 +16,7 @@ Although we spend a lot of time working with data interactively, this sort of ha + *2015-11-17 NOTE: since we have already set up a build environment for R packages, it is my hope that everyone has `make`. These instructions were from 2014, when we did everything in a different order. Cross your fingers and ignore!* + (If you are running macOS or Linux, `make` should already be installed.) * Chapter \@ref(make-test-drive) - Test drive `make` and RStudio - + Walk before you run! Prove that `make` is actually installed and that it can be found and executed from the [shell][hg-shell] and from RStudio. It is also important to tell RStudio to NOT substitute spaces for tabs when editing a `Makefile` (applies to any text editor). + + Walk before you run! Prove that `make` is actually installed and that it can be found and executed from the [shell] and from RStudio. It is also important to tell RStudio to NOT substitute spaces for tabs when editing a `Makefile` (applies to any text editor). 
* Chapter \@ref(automating-pipeline) - Hands-on activity + This fully developed example shows you: * How to run an R script non-interactively @@ -29,9 +29,9 @@ Although we spend a lot of time working with data interactively, this sort of ha - Run an entire R script - Render an R Markdown document (or R script) * The interface between RStudio and `make` - * How to use `make` from the [shell][hg-shell] + * How to use `make` from the [shell] * How Git facilitates the process of building a pipeline - + *2015-11-19 Andrew MacDonald translated the above into a pipeline for the [`remake` package](https://github.com/richfitz/remake) from Rich Fitzjohn: see [this gist](https://gist.github.com/aammd/72a5b98356893c001001).* + + *2015-11-19 Andrew MacDonald translated the above into a pipeline for the [remake] package from Rich Fitzjohn: see [this gist](https://gist.github.com/aammd/72a5b98356893c001001).* * Chapter \@ref(example-pipelines) - Three more toy pipelines, using the Lord of the Rings data ## Resources {-} @@ -39,20 +39,20 @@ Although we spend a lot of time working with data interactively, this sort of ha * [xkcd comic on automation](https://xkcd.com/1319/). 'Automating' comes from the roots 'auto-' meaning 'self-', and 'mating', meaning 'screwing'. -* Karl Broman covers [GNU Make](https://www.gnu.org/software/make/) in his course [Tools for Reproducible Research](https://kbroman.org/Tools4RR/pages/schedule.html). -* Karl Broman also wrote [minimal make: a minimal tutorial on make](https://kbroman.org/minimal_make/), aimed at stats / data science types. -* [Using Make for reproducible scientific analyses](https://web.archive.org/web/20160306042959/http://www.bendmorris.com/2013/09/using-make-for-reproducible-scientific.html), blog post by Ben Morris. +* Karl Broman covers [GNU Make](https://www.gnu.org/software/make/) in his course ["Tools for Reproducible Research"](https://kbroman.org/Tools4RR/pages/schedule.html). 
+* Karl Broman also wrote ["minimal make: a minimal tutorial on make"](https://kbroman.org/minimal_make/), aimed at stats / data science types. +* ["Using Make for reproducible scientific analyses"](https://web.archive.org/web/20160306042959/http://www.bendmorris.com/2013/09/using-make-for-reproducible-scientific.html), blog post by Ben Morris. * Software Carpentry's [Slides on `Make`](https://web.archive.org/web/20150110211213/http://software-carpentry.org/v4/make/index.html). -* Zachary M. Jones wrote [GNU Make for Reproducible Data Analysis](http://zmjones.com/make/). -* [Keeping tabs on your data analysis workflow](https://adamlaiacano.tumblr.com/post/45356689519/keeping-tabs-on-your-data-analysis-workflow), blog post by Adam Laiacano. -* Mike Bostock, of D3.js and New York Times fame, explains [Why Use Make](https://bost.ocks.org/mike/make/): "it's about the benefits of capturing workflows via a file-based dependency-tracking build system". -* [Make for Data Scientists](https://paulbutler.org/2012/make-for-data-scientists/), blog post by Paul Butler, who also made a [beautiful map of Facebook connections](https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919) using R. +* Zachary M. Jones wrote ["GNU Make for Reproducible Data Analysis"](http://zmjones.com/make/). +* ["Keeping tabs on your data analysis workflow"](https://adamlaiacano.tumblr.com/post/45356689519/keeping-tabs-on-your-data-analysis-workflow), blog post by Adam Laiacano. +* Mike Bostock, of D3.js and New York Times fame, explains ["Why Use Make"](https://bost.ocks.org/mike/make/) -- "it's about the benefits of capturing workflows via a file-based dependency-tracking build system". +* ["Make for Data Scientists"](https://paulbutler.org/2012/make-for-data-scientists/), blog post by Paul Butler, who also made a [beautiful map of Facebook connections](https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919) using R. 
* Other, more modern data-oriented alternatives to `make`: + [Drake](https://github.com/Factual/drake), a kind of "make for data" + [Nextflow](https://www.nextflow.io) for "data-driven computational pipelines" - + [`remake`](https://github.com/richfitz/remake), "Make-like declarative workflows in R" -* [Managing Projects with GNU Make, 3rd Edition](http://shop.oreilly.com/product/9780596006105.do) by Robert Mecklenburg [-@mecklenburg2009] is a fantastic book but, sadly, is very focused on compiling software. -* [`littler`](http://dirk.eddelbuettel.com/code/littler.html) is an R package maintained by Dirk Eddelbuettel that "provides the `r` program, a simplified command-line interface for GNU R." + + [remake], "Make-like declarative workflows in R" +* [Managing Projects with GNU Make, 3rd Edition] by Robert Mecklenburg [-@mecklenburg2009] is a fantastic book but, sadly, is very focused on compiling software. +* [littler] is an R package maintained by Dirk Eddelbuettel that "provides the `r` program, a simplified command-line interface for GNU R." # Why and how we automate data analyses + examples {#automation-slides} @@ -67,13 +67,13 @@ See ["Automating data analysis pipelines" slides](https://github.com/STAT545-UBC -*2015-11-17 NOTE: This year we made R packages before we used `make` The hope is, therefore, that the `make` that ships with Rtools is all we need. So hopefully we can ignore this?* +*2015-11-17 NOTE: This year we made R packages before we used `make`. The hope is, therefore, that the `make` that ships with Rtools is all we need. So hopefully we can ignore this?* ## Install `make` on Microsoft Windows We are still working out the best way to install `make` on Windows. Our current best recommendation is to install *msysGit*, which includes `make` as well as `git` and `bash`. -Download and [install msysGit](https://github.com/msysgit/msysgit/releases/download/Git-1.9.4-preview20140929/msysGit-netinstall-1.9.4-preview20140929.exe). 
The two software packages [msysGit](https://github.com/msysgit/msysgit) and [Git for Windows](http://msysgit.github.io/) are related. Both install `git` and `bash`, but only *msysGit* installs `make`. The programs installed by *msysGit* are found by default in `C:\msysGit\bin`. Here is the [complete list](https://github.com/msysgit/msysgit/tree/master/bin) of programs included with *msysGit*. For this activity, RStudio needs to be able to find in your `PATH` environment variable the program `make`, the [shell][hg-shell] `bash`, other utilities like `rm` and `cp`, and `Rscript`. +Download and [install msysGit](https://github.com/msysgit/msysgit/releases/download/Git-1.9.4-preview20140929/msysGit-netinstall-1.9.4-preview20140929.exe). The two software packages [msysGit](https://github.com/msysgit/msysgit) and [Git for Windows](http://msysgit.github.io/) are related. Both install `git` and `bash`, but only *msysGit* installs `make`. The programs installed by *msysGit* are found by default in `C:\msysGit\bin`. Here is the [complete list](https://github.com/msysgit/msysgit/tree/master/bin) of programs included with *msysGit*. For this activity, RStudio needs to be able to find in your `PATH` environment variable the program `make`, the [shell] `bash`, other utilities like `rm` and `cp`, and `Rscript`. Here is another alternative for installing `make` alone: @@ -108,9 +108,9 @@ See [issue 58](https://github.com/STAT545-UBC/Discussion/issues/58) for what see What are the tricky bits? -* Getting the same `Makefile` to "work" via RStudio's Build buttons/menus and in the [shell][hg-shell]. And, for that matter, which [shell][hg-shell]? Git Bash or ??? +* Getting the same `Makefile` to "work" via RStudio's Build buttons/menus and in the [shell]. And, for that matter, which [shell]? Git Bash or ??? * Ensuring `make`, `Rscript`, `pandoc`, `rm`, etc. can be found = updating `PATH`. -* Getting `make` to use the correct [shell][hg-shell]. 
+* Getting `make` to use the correct [shell]. - See [issue 54](https://github.com/STAT545-UBC/Discussion/issues/54) on the Discussion repo. @@ -119,7 +119,7 @@ What are the tricky bits? -Before we use `make` for real work, we want to prove beyond a shadow of a doubt that it's installed and findable from RStudio and/or the [shell][hg-shell]. +Before we use `make` for real work, we want to prove beyond a shadow of a doubt that it's installed and findable from RStudio and/or the [shell]. ## Create a temporary RStudio project @@ -204,7 +204,7 @@ This proves that `make` is installed and working from RStudio. ## Run `make` from the shell -RStudio only provides access to a very limited bit of `make` -- it's even more limited than the RStudio Git client. In the long run, it's important to be able to run `make` from the [shell][hg-shell]. +RStudio only provides access to a very limited bit of `make` -- it's even more limited than the RStudio Git client. In the long run, it's important to be able to run `make` from the [shell]. * Select *Tools > Shell* * Run @@ -293,14 +293,14 @@ __Suggested workflow:__ * Submit the above `download.file()` command in the R Console to make sure it works. * Inspect the downloaded words file any way you know how; make sure it's not garbage. Size should be about 2.4MB. * Delete `words.txt`. -* Put the above rule into your `Makefile`. From the [shell][hg-shell], enter `make words.txt` to verify rule works. Reinspect the words file. +* Put the above rule into your `Makefile`. From the [shell], enter `make words.txt` to verify rule works. Reinspect the words file. * *Git folks:* commit `Makefile` and `words.txt`. See the sample project at this point in [this commit](https://github.com/STAT545-UBC/make-activity/tree/c30ecc9c890a2f2261eb94118997f0774012eeb8). ### Copy the dictionary -On Mac or Linux systems, rather than download the dictionary, we can simply copy the file `/usr/share/dict/words` that comes with the operating system. 
In this alternative rule, we use the [shell][hg-shell] command `cp` to copy the file. +On Mac or Linux systems, rather than download the dictionary, we can simply copy the file `/usr/share/dict/words` that comes with the operating system. In this alternative rule, we use the [shell] command `cp` to copy the file. ```makefile words.txt: /usr/share/dict/words @@ -318,10 +318,10 @@ __Suggested workflow:__ * *Git folks:* commit anything new/modified. Start with a clean working tree. * Remove `words.txt` if you succeeded with the download approach. -* Submit the above `cp` command in the [shell][hg-shell] to make sure it works. +* Submit the above `cp` command in the [shell] to make sure it works. * Inspect the copied words file any way you know how; make sure it's not garbage. Size should be about 2.4MB. * Delete `words.txt`. -* Put the above rule into your `Makefile`. From the [shell][hg-shell], enter `make words.txt` to verify rule works. Reinspect the words file. +* Put the above rule into your `Makefile`. From the [shell], enter `make words.txt` to verify rule works. Reinspect the words file. * *Git folks:* look at the diff. You should see how your `words.txt` rule has changed and you might also see some differences between the local and remote words files. Interesting! Commit `Makefile` and `words.txt`. See the sample project at this point in [this commit](https://github.com/STAT545-UBC/make-activity/tree/1131791548e0c5bbc5104eebb19710ed435146e3). @@ -365,7 +365,7 @@ histogram.tsv: histogram.r words.txt Rscript $< ``` -FYI: `Rscript` allows you to execute R scripts from the [shell][hg-shell]. It is a more modern replacement for `R CMD BATCH` (don't worry if you've never heard of that). +FYI: `Rscript` allows you to execute R scripts from the [shell]. It is a more modern replacement for `R CMD BATCH` (don't worry if you've never heard of that). 
Create the R script `histogram.r` that reads the list of words from `words.txt` and writes the table of word length frequency to `histogram.tsv`. It should be a tab-delimited TSV file with a header and two columns, named `Length` and `Freq`. Hint: you can accomplish this task using four functions: `readLines`, `nchar`, `table` and `write.table`. Here's [one solution](https://raw.githubusercontent.com/STAT545-UBC/STAT545-UBC.github.io/master/automation10_holding-area/activity/histogram.r), but try not to peek until you've attempted this task yourself. @@ -374,7 +374,7 @@ __Suggested workflow:__ * Develop your `histogram.r` script interactively. Make sure it works when you step through it line-by-line. Debugging only gets harder once you're running entire scripts at arm's length via `make`! * Remove `histogram.tsv`. Clean out the workspace and restart R. Run `histogram.r` via `source()` or using RStudio's Source button. Make sure it works! * Add the `histogram.tsv` rule to your `Makefile`. -* Remove `histogram.tsv` and regenerate it via `make histogram.tsv` from the [shell][hg-shell]. +* Remove `histogram.tsv` and regenerate it via `make histogram.tsv` from the [shell]. * *Git folks:* Commit. See the sample project at this point in [this commit](https://github.com/STAT545-UBC/make-activity/tree/889e01a3d610e900c7e58ebd32a0506c61543fd9). @@ -545,6 +545,11 @@ There are three more toy pipelines, using the Lord of the Rings data, that reinf * [`02_automation-example_r-and-make`](https://github.com/STAT545-UBC/STAT545-UBC.github.io/tree/master/automation10_holding-area/02_automation-example_r-and-make) - use of a simple `Makefile`. 
* [`03_automation-example_render-without-rstudio`](https://github.com/STAT545-UBC/STAT545-UBC.github.io/tree/master/automation10_holding-area/03_automation-example_render-without-rstudio) - use of `rmarkdown::render()` from a `Makefile`, as the default way of running an R script or an R Markdown document, leading to pretty HTML reports without any mouse clicks. + +[remake]: https://github.com/richfitz/remake +[littler]: http://dirk.eddelbuettel.com/code/littler.html +[shell]: https://happygitwithr.com/shell.html + ```{r links, child="links.md"} ``` \ No newline at end of file diff --git a/36_api-wrappers.Rmd b/36_api-wrappers.Rmd index d0e6174..268ff28 100644 --- a/36_api-wrappers.Rmd +++ b/36_api-wrappers.Rmd @@ -22,10 +22,10 @@ There are many ways to obtain data from the internet; let's consider four catego In the simplest case, the data you need is already on the internet in a tabular format. There are a couple of strategies here: -* Use `read.csv` or `readr::read_csv` to read the data straight into R. -* Use the command line program `curl` to do that work, and place it in a `Makefile` or shell script (see the [section on `make`](#automation-overview) for more on this). +* Use `read.csv()` or `readr::read_csv()` to read the data straight into R. +* Use the command line program curl to do that work, and place it in a `Makefile` or shell script (see the [section on make](#automation-overview) for more on this). -The second case is most useful when the data you want has been provided in a format that needs cleanup. For example, the World Value Survey makes several datasets available as Excel sheets. The safest option here is to download the `.xls` file, then read it into R with `readxl::read_excel()` or something similar. An exception to this is data provided as Google Spreadsheets, which can be read straight into R using the [`googlesheets`](https://github.com/jennybc/googlesheets) package. 
+The second case is most useful when the data you want has been provided in a format that needs cleanup. For example, the World Value Survey makes several datasets available as Excel sheets. The safest option here is to download the `.xls` file, then read it into R with `readxl::read_excel()` or something similar. An exception to this is data provided as Google Spreadsheets, which can be read straight into R using the [googlesheets] package. ### From rOpenSci web services page: @@ -33,7 +33,7 @@ From rOpenSci's [CRAN Task View: Web Technologies and Services](https://github.c * `downloader::download()` for SSL. * `curl::curl()` for SSL. -* `httr::GET` data read this way needs to be parsed later with `read.table()`. +* `httr::GET()` data read this way needs to be parsed later with `read.table()`. * `rio::import()` can "read a number of common data formats directly from an `https://` URL". Isn't that very similar to the previous? @@ -65,7 +65,7 @@ library(tidyverse) ### Sightings of birds: rebird -[rebird](https://github.com/ropensci/rebird) is an R interface for the [eBird](http://ebird.org/content/ebird/) database. eBird lets birders upload sightings of birds, and allows everyone access to those data. rebird is on CRAN. +[rebird] (on [CRAN](https://cloud.r-project.org/web/packages/rebird/index.html); on [GitHub](https://github.com/ropensci/rebird)) is an R interface for the [eBird](http://ebird.org/content/ebird/) database. eBird lets birders upload sightings of birds, and allows everyone access to those data. ```{r message = FALSE} # install.packages("rebird") @@ -83,7 +83,6 @@ At that link, you will see a page like this: knitr::include_graphics("img/Iona_island.png") ``` - The data already looks to be organized in a data frame! rebird allows us to read these data directly into R (the ID code for Iona Island is **"L261851"**). 
@@ -129,27 +128,28 @@ ebirdgeo(species = 'Buteo lagopus') ### Searching geographic info: geonames {#geonames} -[rOpenSci](https://ropensci.org) has a package called [geonames](https://docs.ropensci.org/geonames/) for accessing the [GeoNames API](https://www.geonames.org). First, install the geonames package from CRAN and load it. +[rOpenSci] has a package called [geonames] (on [CRAN](https://cloud.r-project.org/web/packages/geonames/index.html); on [GitHub](https://github.com/ropensci/geonames)) for accessing the [GeoNames API](https://www.geonames.org). First, install the geonames package from CRAN and load it. ```{r message = FALSE, warning = FALSE} # install.packages("geonames") library(geonames) ``` -The [geonames package website](https://docs.ropensci.org/geonames/) tells us that there are a few things we need to do before we can use geonames to access the GeoNames API: +The [geonames package website][geonames] tells us that there are a few things we need to do before we can use geonames to access the GeoNames API: 1. Go to [the GeoNames site](https://www.geonames.org/login) and create a new user account. 1. Check your email and follow the instructions to activate your account. -1. Click [here] to enable the free web services for your account (Note! You must be logged into your GeoNames account already for the link to work). +1. Click [here](https://www.geonames.org/enablefreewebservice) to enable the free web services for your account (Note! You must be logged into your GeoNames account already for the link to work). 1. Tell R your GeoNames username. To do the last step, we could run this line in R... + ```r options(geonamesUsername="my_user_name") ``` ...but this is insecure. We don't want to risk committing this line and pushing it to our public GitHub page! -Instead, we can add this line to our `.Rprofile` so it will be hidden. One way to edit your `.Rprofile` is with the helper function `edit_r_profile()` from the [usethis][usethis-web] package. 
Install/load the usethis package and run `edit_r_profile()` in the R Console: +Instead, we can add this line to our `.Rprofile` so it will be hidden. One way to edit your `.Rprofile` is with the helper function `edit_r_profile()` from the [usethis] package. Install/load the usethis package and run `edit_r_profile()` in the R Console: ```{r message = FALSE, warning = FALSE, eval = FALSE} # install.packages("usethis") @@ -163,7 +163,7 @@ This will open up your `.Rprofile` file. Add `options(geonamesUsername="my_user_ Save the file, close it, and restart R. Now we're ready to start using geonames to search the GeoNames API. -(Also see the [Cache credentials for HTTPS](https://happygitwithr.com/credential-caching.html) chapter of [Happy Git and GitHub for the useR](https://happygitwithr.com).) +(Also see the [Cache credentials for HTTPS](https://happygitwithr.com/credential-caching.html) chapter of [Happy Git and GitHub for the useR].) #### Using GeoNames @@ -177,7 +177,6 @@ countryInfo <- GNcountryInfo() glimpse(countryInfo) ``` - This `countryInfo` dataset is very helpful for accessing the rest of the data because it gives us the standardized codes for country and language. #### Remixing geonames and rebird: @@ -220,7 +219,7 @@ nrow(rio_portuguese) ### Searching the Public Library of Science: rplos {#plos-one} -[PLOS ONE](https://journals.plos.org/plosone/) is an open-access journal. They allow access to an impressive range of search tools, and allow you to obtain the full text of their articles. rOpenSci has a package called rplos that we can use to interact with the [PLOS API](http://api.plos.org). They have a nice tutorial on the rOpenSci website that you can see [here](https://ropensci.org/tutorials/rplos_tutorial.html). First, install/load the rplos package from CRAN. +[PLOS ONE](https://journals.plos.org/plosone/) is an open-access journal. They allow access to an impressive range of search tools, and allow you to obtain the full text of their articles. 
[rOpenSci] has a package called [rplos] (on [CRAN](https://cloud.r-project.org/package=rplos); on [GitHub](https://github.com/ropensci/rplos)) that we can use to interact with the [PLOS API](http://api.plos.org). First, install/load the rplos package from CRAN. ```{r message = FALSE, warning = FALSE} # install.packages("rplos") @@ -229,7 +228,7 @@ library(rplos) #### Searching PLOS ONE -Let's follow along with the [`rOpenSci` tutorial](https://ropensci.org/tutorials/rplos_tutorial.html) and do some searches: +Let's follow along with the [rOpenSci tutorial](https://ropensci.org/tutorials/rplos_tutorial.html) and do some searches: ```{r} searchplos(q= "Helianthus", fl= "id", limit = 5) @@ -268,17 +267,15 @@ knitr::include_graphics("img/rplos-highbrow.png") We can use the `plot_throughtime()` function to visualize the results of a search over time. - ```{r} plot_throughtime(terms = "phylogeny", limit = 200) ``` - ### Is it a boy or a girl? gender-associated names throughout US history -The gender package allows you access to data on the gender of names in the US. Because names change gender over the years, the probability of a name belonging to a man or a woman also depends on the *year*. +The [gender] package (on [CRAN](https://cloud.r-project.org/package=gender); on [GitHub](https://github.com/ropensci/gender)) allows you access to data on the gender of names in the US. Because names change gender over the years, the probability of a name belonging to a man or a woman also depends on the *year*. -First, install/load the gender package from CRAN. You may be prompted to also install the companion package, genderdata. Go ahead and say yes. If you don't see this message no need to worry, it is a one-time install. +First, install/load the gender package from CRAN. You may be prompted to also install the companion package, [genderdata]. Go ahead and say yes. If you don't see this message no need to worry, it is a one-time install. 
```{r} # install.packages("gender") diff --git a/37_diy-web-data.Rmd b/37_diy-web-data.Rmd index bd9984a..d0ca639 100644 --- a/37_diy-web-data.Rmd +++ b/37_diy-web-data.Rmd @@ -19,7 +19,7 @@ In Chapter \@ref(api-wrappers) we experimented with several packages that "wrapp ### Load the tidyverse -We will be using the functions from the [tidyverse][tidyverse-main-page] throughout this chapter, so go ahead and load tidyverse package now. +We will be using the functions from the [tidyverse] throughout this chapter, so go ahead and load the tidyverse package now. ```{r message = FALSE, warning = FALSE} library(tidyverse) @@ -80,7 +80,7 @@ Try pasting these URLs into your browser. You should see this if you tried the f ### Create an OMDb API Key -This tells us that we need an API key to access the OMDb API. We will store our key for the OMDb API in our `.Renviron` file using the helper function `edit_r_environ()` from the [usethis][usethis-web] package. Follow these steps: +This tells us that we need an API key to access the OMDb API. We will store our key for the OMDb API in our `.Renviron` file using the helper function `edit_r_environ()` from the [usethis] package. Follow these steps: 1. Visit this URL and request your free API key: 1. Check your email and follow the instructions to activate your key. @@ -164,14 +164,14 @@ Remember that using `.Rprofile` makes your code un-reproducible. In this case, ### Recreate the request URL in R -How can we recreate the same request URLs in R?
We could use the [glue] package to paste together the base URL, parameter labels, and parameter values: ```{r} request <- glue::glue("http://www.omdbapi.com/?t=Interstellar&y=2014&plot=short&r=xml&apikey={movie_key}") request ``` -This works, but it only works for movie titled `Interstellar` from 2014 where we want the short plot and the XML format. Let's try to pull out more variables and paste them in with `glue`: +This works, but it only works for movie titled `Interstellar` from 2014 where we want the short plot and the XML format. Let's try to pull out more variables and paste them in with `glue()`: ```{r} glue::glue("http://www.omdbapi.com/?t={title}&y={year}&plot={plot}&r={format}&apikey={api_key}", @@ -192,8 +192,7 @@ omdb <- function(title, year, plot, format, api_key) { ### Get data using the curl package - -Now we have a handy function that returns the API query. We can paste in the link, but we can also obtain data from within R using the [curl][curl-cran] package. Install/load the curl package first. +Now we have a handy function that returns the API query. We can paste in the link, but we can also obtain data from within R using the [curl] package. Install/load the curl package first. ```{r message = FALSE, warning = FALSE} # install.packages("curl") @@ -201,6 +200,7 @@ library(curl) ``` Using curl to get the data in XML format: + ```{r fake-xml-req, eval = FALSE} request_xml <- omdb(title = "Interstellar", year = "2014", plot = "short", format = "xml", api_key = movie_key) @@ -221,8 +221,8 @@ close(con) answer_xml ``` - Using curl to get the data in JSON format: + ```{r fake-json-req, eval = FALSE} request_json <- omdb(title = "Interstellar", year = "2014", plot = "short", format = "json", api_key = movie_key) @@ -286,14 +286,15 @@ You can see that both of these data structures are quite easy to read. They are ### Parsing the JSON response with jsonlite -Our JSON response above can be parsed using `jsonlite::fromJSON()`. 
First install/load the jsonlite package. +Our JSON response above can be parsed with the [jsonlite] package. First install/load the jsonlite package. ```{r message = FALSE, warning = FALSE} # install.packages("jsonlite") library(jsonlite) ``` -Parsing our JSON response with `fromJSON()`: +Parsing our JSON response with `jsonlite::fromJSON()`: + ```{r} answer_json %>% fromJSON() @@ -310,14 +311,14 @@ answer_json %>% ### Parsing the XML response using xml2 -We can use the [xml2][xml2-web] package to wrangle our XML response. +We can use the [xml2] package to wrangle our XML response. ```{r message = FALSE, warning = FALSE} # install.packages("xml2") library(xml2) ``` -Parsing our XML response with `read_xml()`: +Parsing our XML response with `xml2::read_xml()`: ```{r} (xml_parsed <- read_xml(answer_xml)) @@ -350,7 +351,7 @@ attrs %>% ## Introducing the easy way: httr -[httr][httr-web] is yet another star in the [tidyverse][tidyverse-main-page]. It is a package designed to facilitate all things HTTP from within R. This includes the major HTTP verbs, which are: +The [httr] package is yet another star in the [tidyverse]. It is designed to facilitate all things HTTP from within R. This includes the major HTTP verbs, which are: * __`GET()`__ - Fetch an existing resource. The URL contains all the necessary information the server needs to locate and return the resource. @@ -364,7 +365,7 @@ attrs %>% --> -HTTP is the foundation for APIs; understanding how it works is the key to interacting with all the diverse APIs out there. An excellent beginning resource for APIs (including HTTP basics) is [An Introduction to APIs](https://zapier.com/learn/apis/) by Brian Cooksey. +HTTP is the foundation for APIs; understanding how it works is the key to interacting with all the diverse APIs out there. An excellent beginning resource for APIs (including HTTP basics) is ["An Introduction to APIs"](https://zapier.com/learn/apis/) by Brian Cooksey. 
httr also facilitates a variety of ___authentication___ protocols. @@ -376,6 +377,7 @@ library(httr) ``` Using httr to get the data in JSON format: + ```{r fake-httr-json, eval = FALSE} request_json <- omdb(title = "Interstellar", year = "2014", plot = "short", format = "json", api_key = movie_key) @@ -391,6 +393,7 @@ content(response_json, as = "parsed", type = "application/json") ``` Using httr to get the data in XML format: + ```{r fake-httr-xml, eval = FALSE} request_xml <- omdb(title = "Interstellar", year = "2014", plot = "short", format = "xml", api_key = movie_key) @@ -454,7 +457,7 @@ knitr::include_graphics("https://imgs.xkcd.com/comics/tags.png") Two pieces of equipment: -1. The [rvest][rvest-web] package ([CRAN][rvest-cran]; [GitHub][rvest-github]). Install via `install.packages("rvest)"`. +1. The [rvest] package. Install via `install.packages("rvest")`. 1. SelectorGadget: point and click CSS selectors. [Install in your browser](http://selectorgadget.com/). Before we go any further, [let's play a game together](http://flukeout.github.io)! diff --git a/38_shiny.Rmd b/38_shiny.Rmd index 2f2e250..7c60f38 100644 --- a/38_shiny.Rmd +++ b/38_shiny.Rmd @@ -11,27 +11,31 @@ Many people have written packages that enhance Shiny in some way or add extra functionality. Here is a list of several popular packages that people often use together with Shiny: -* [shinythemes][shinythemes-web] - Easily alter the appearance of your app ([CRAN][shinythemes-cran]). -* [shinyjs][shinyjs-web] - Enhance user experience in Shiny apps using JavaScript functions without knowing JavaScript ([CRAN][shinyjs-cran]; [GitHub][shinyjs-github]). -* [leaflet][leaflet-web] - Add interactive maps to your apps ([CRAN][leaflet-cran]; [GitHub][leaflet-github]). -* [ggvis][ggvis-web] - Similar to ggplot2, but the plots are focused on being web-based and are more interactive ([CRAN][ggvis-cran]).
-* [shinydashboard][shinydashboard-web] - Gives you tools to create visual “dashboards” ([CRAN][shinydashboard-cran]; [GitHub][shinydashboard-github]). +* [shinythemes] - Easily alter the appearance of your app (on [CRAN](https://cloud.r-project.org/package=shinythemes); on [GitHub](https://github.com/rstudio/shinythemes)). + +* [shinyjs] - Enhance user experience in Shiny apps using JavaScript functions without knowing JavaScript (on [CRAN](https://cloud.r-project.org/package=shinyjs); on [GitHub](https://github.com/daattali/shinyjs)). + +* [leaflet][leaflet-web] - Add interactive maps to your apps (on [CRAN](https://cloud.r-project.org/package=leaflet); on [GitHub](https://github.com/rstudio/leaflet)). + +* [ggvis] - Similar to [ggplot2], but the plots are focused on being web-based and are more interactive (on [CRAN](https://cloud.r-project.org/package=ggvis); on [GitHub](https://github.com/rstudio/ggvis)). *Currently dormant* + +* [shinydashboard] - Gives you tools to create visual “dashboards” (on [CRAN](https://cloud.r-project.org/package=shinydashboard); on [GitHub](https://github.com/rstudio/shinydashboard)). ## Resources {-} Shiny is a very popular package and has lots of resources on the web. Here's a compiled list of a few resources I recommend, which are all fairly easy to read and understand. 
-- [Shiny official website][shiny-official-web] -- [Shiny official tutorial][shiny-official-tutorial] -- [Shiny cheatsheet][shiny-cheatsheet] -- [Lots of short useful articles about different topics in Shiny - **highly recommended**][shiny-articles] +- Official [Shiny] website +- Official [Shiny tutorial] +- RStudio [Shiny Cheat Sheet] +- [Lots of short useful articles about different topics in Shiny - **highly recommended**](https://shiny.rstudio.com/articles/) - [Shiny in R Markdown](http://rmarkdown.rstudio.com/authoring_shiny.html) -- Get help from the [Shiny Google group][shiny-google-groups] or [StackOverflow][shiny-stack-overflow] -- [Publish your apps for free with shinyapps.io][shinyapps-web] -- [Host your app on your own Shiny server][shiny-server-setup] -- [Learn about how reactivity works][shiny-reactivity] -- [Learn about useful debugging techniques][shiny-debugging] +- Get help from the [Shiny Google group](https://groups.google.com/forum/#!forum/shiny-discuss) or [StackOverflow](https://stackoverflow.com/questions/tagged/shiny) +- Publish your apps for free with [shinyapps.io] +- Host your app on your own Shiny server: ["How to get your very own RStudio Server and Shiny Server"] +- Learn about how reactivity works: ["How to understand reactivity in R"] +- Learn about useful debugging techniques: ["Debugging Shiny applications"] # Slides {#shiny-slides} @@ -48,13 +52,13 @@ source("common.R") -[Shiny][shiny-official-web] is a package from RStudio that can be used to build interactive web pages with R. While that may sound scary because of the words "web pages", Shiny is geared to R users who have zero experience with web development, and you do not need to know any HTML/CSS/JavaScript. +[Shiny] is a package from RStudio that can be used to build interactive web pages with R. 
While that may sound scary because of the words "web pages", Shiny is geared to R users who have zero experience with web development, and you do not need to know any HTML/CSS/JavaScript. You can do quite a lot with Shiny: think of it as an easy way to make an interactive web page, and that web page can seamlessly interact with R and display R objects (plots, tables, of anything else you do in R). To get a sense of the wide range of things you can do with Shiny, you can visit my Shiny server (), which hosts some of my own Shiny apps. This tutorial is a hands-on activity complement to a set of [presentation slides](#shiny-slides) for learning how to build Shiny apps. In this activity, we'll walk through all the steps of building a Shiny app using a dataset that lets you explore the products available at the BC Liquor Store. The final version of the app, including a few extra features that are left as exercises for the reader, can be seen here: . Any activity deemed as an exercise throughout this tutorial is not mandatory for building our app, but they are good for getting more practice with Shiny. -This tutorial should take approximately an hour to complete. If you want even more practice, another great tutorial is the [official Shiny tutorial][shiny-official-tutorial]. RStudio also provides a [handy cheatsheet][shiny-cheatsheet] to remember all the little details after you already learned the basics. +This tutorial should take approximately an hour to complete. If you want even more practice, another great tutorial is the official [Shiny tutorial]. RStudio also provides a handy [Shiny Cheat Sheet] to remember all the little details after you already learned the basics. ## Before we begin {#shiny-tutorial-1} @@ -671,7 +675,7 @@ Now your app will run. If you want to access a reactive variable defined with `r You can think of reactivity as causing a chain reaction: when one reactive value changes, anything that depends on it will get updated. 
If any of the updated values are themselves reactive variables, then any reactive contexts that depend on those variables will also get updated in turn. As a concrete example, let's think about what happens when you change the value of the `priceInput` on the page. Since `input$priceInput` is a reactive variable, any expression that uses it will get updated. This means the two render functions from earlier will execute because they both depend on `input$priceInput`, as well as the `priceDiff` variable because it also depends on it. But since `priceDiff` is itself a reactive variable, Shiny will check if there is anything that depends on `priceDiff`, and indeed there is - the `observe({})` function that prints the value of `priceDiff`. So once `priceDiff` gets updated, the `observe({})` function will run, and the value will get printed. -Reactivity is usually the hardest part about Shiny to understand, so if you don't quite get it, don't feel bad. Try reading this section again, and I promise that with time and experience you will get more comfortable with reactivity. Once you do feel more confident with reactivity, it may be a good idea to read more advanced documentation describing reactivity, since this section greatly simplifies ideas to make them more understandable. A great resource is RStudio's [tutorial on reactivity][shiny-reactivity]. +Reactivity is usually the hardest part about Shiny to understand, so if you don't quite get it, don't feel bad. Try reading this section again, and I promise that with time and experience you will get more comfortable with reactivity. Once you do feel more confident with reactivity, it may be a good idea to read more advanced documentation describing reactivity, since this section greatly simplifies ideas to make them more understandable. A great resource is RStudio's tutorial on reactivity, ["How to understand reactivity in R"]. 
Before continuing to the next section, you can remove all the `observe({})` and `reactive({})` functions we wrote in this section since they were all just for learning purposes. @@ -889,9 +893,9 @@ Remember how every single app is a web page powered by an R session on a compute ### Host on shinyapps.io -RStudio provides a service called [shinyapps.io][shinyapps-web] which lets you host your apps for free. It is integrated seamlessly into RStudio so that you can publish your apps with the click of a button, and it has a free version. The free version allows a certain number of apps per user and a certain number of activity on each app, but it should be good enough for most of you. It also lets you see some basic stats about usage of your app. +RStudio provides a service called [shinyapps.io] which lets you host your apps for free. It is integrated seamlessly into RStudio so that you can publish your apps with the click of a button, and it has a free version. The free version allows a certain number of apps per user and a certain amount of activity on each app, but it should be good enough for most of you. It also lets you see some basic stats about usage of your app. -Hosting your app on shinyapps.io is the easy and recommended way of getting your app online. Go to [www.shinyapps.io][shinyapps-web] and sign up for an account. When you're ready to publish your app, click on the "Publish Application" button in RStudio and follow the instructions. You might be asked to install a couple packages if it's your first time. +Hosting your app on shinyapps.io is the easy and recommended way of getting your app online. Go to [shinyapps.io] and sign up for an account. When you're ready to publish your app, click on the "Publish Application" button in RStudio and follow the instructions. You might be asked to install a couple packages if it's your first time. 
```{r echo = FALSE, out.width = "93%", fig.cap = "Shiny publish application button"} knitr::include_graphics("img/shiny-publish.png") @@ -902,9 +906,9 @@ After a successful deployment to shinyapps.io, you will be redirected to your ap ### Host on a Shiny Server -The other option for hosting your app is on your own private [Shiny Server][shiny-server]. Shiny Server is also a product by RStudio that lets you host apps on your own server. This means that instead of RStudio hosting the app for you, you have it on your own private server. This means you have a lot more freedom and flexibility, but it also means you need to have a server and be comfortable administering a server. I currently host all my apps on [my own Shiny Server](https://daattali.com/shiny/) just because I like having the extra control, but when I first learned about Shiny I used shinyapps.io for several months. +The other option for hosting your app is on your own private [Shiny Server](https://www.rstudio.com/products/shiny/shiny-server/). Shiny Server is also a product by RStudio that lets you host apps on your own server. This means that instead of RStudio hosting the app for you, you have it on your own private server. This means you have a lot more freedom and flexibility, but it also means you need to have a server and be comfortable administering a server. I currently host all my apps on [my own Shiny Server](https://daattali.com/shiny/) just because I like having the extra control, but when I first learned about Shiny I used shinyapps.io for several months. -If you're feeling adventurous and want to host your own server, you can follow [my tutorial for hosting a Shiny Server][shiny-server-setup]. +If you're feeling adventurous and want to host your own server, you can follow my tutorial for hosting a Shiny Server: ["How to get your very own RStudio Server and Shiny Server"]. 
## More Shiny features to check out {#shiny-tutorial-14} @@ -913,7 +917,7 @@ Shiny is extremely powerful and has lots of features that we haven't covered. He ### Shiny in R Markdown -You can include Shiny inputs and outputs in an R Markdown document! This means that your R Markdown document can be interactive. Learn more [here][shiny-bookdown]. Here's a simple example of how to include interactive Shiny elements in an R Markdown: +You can include Shiny inputs and outputs in an R Markdown document! This means that your R Markdown document can be interactive. Learn more [here](https://bookdown.org/yihui/rmarkdown/shiny-documents.html). Here's a simple example of how to include interactive Shiny elements in an R Markdown: ````markdown `r xfun::file_string('supporting-docs/shiny.Rmd')` @@ -1016,7 +1020,7 @@ server <- function(input, output) { shinyApp(ui = ui, server = server) ``` -If you do want to add some JavaScript or use common JavaScript functions in your apps, you might want to check out [`shinyjs`][shinyjs-web]. +If you do want to add some JavaScript or use common JavaScript functions in your apps, you might want to check out [shinyjs]. ## Ideas to improve our app {#shiny-tutorial-15} @@ -1032,7 +1036,7 @@ The app we developed is functional, but there are plenty of improvements that ca - **Hint:** Place the image in a folder named `www`, and use `img(src = "imagename.png")` to add the image. 1. Share your app with everyone on the internet by deploying to shinyapps.io. - - **Hint:** Go to [shinyapps.io][shinyapps-web], register for an account, then click the "Publish App" button in RStudio. + - **Hint:** Go to [shinyapps.io], register for an account, then click the "Publish App" button in RStudio. 1. Use the DT package to turn the current results table into an interactive table. - **Hint:** Install the DT package, replace `tableOutput()` with `DT::dataTableOutput()` and replace `renderTable()` with `DT::renderDataTable()`. 
@@ -1070,6 +1074,12 @@ The app we developed is functional, but there are plenty of improvements that ca 1. Provide a way for the user to show results from *all* countries (instead of forcing a filter by only one specific country). - **Hint:** There are two ways to approach this. You can either add a value of "All" to the dropdown list of country options, you can include a checkbox for "Filter by country" and only show the dropdown. + +[Shiny tutorial]: https://shiny.rstudio.com/tutorial/ +[shinyapps.io]: https://www.shinyapps.io +["How to get your very own RStudio Server and Shiny Server"]: https://deanattali.com/2015/05/09/setup-rstudio-shiny-server-digital-ocean/ +["How to understand reactivity in R"]: https://shiny.rstudio.com/articles/understanding-reactivity.html +["Debugging Shiny applications"]: https://shiny.rstudio.com/articles/debugging.html ```{r links, child="links.md"} ``` \ No newline at end of file diff --git a/39_appendix.Rmd b/39_appendix.Rmd index 49a6b92..07c22c8 100644 --- a/39_appendix.Rmd +++ b/39_appendix.Rmd @@ -56,18 +56,18 @@ Figure \@ref(fig:draw-an-owl) is an image that is often used to illustrate how h I recently needed to draw a f\*cking owl in R, so I decided to record the process as an experiment. -When I teach [STAT545][stat-545] or [Software Carpentry][software-carpentry], I try to convey as much about *process* as anything else. You can always look up technical details, e.g., syntax, but you don't usually get to see how other people work. This is also how I approach teaching about [writing R functions](#functions-part1). Newcomers often look at finished code and assume it flowed perfectly formed out of someone's fingertips. It probably did not. +When I teach STAT 545 or [Software Carpentry](https://software-carpentry.org), I try to convey as much about *process* as anything else. You can always look up technical details, e.g., syntax, but you don't usually get to see how other people work. 
This is also how I approach teaching about [writing R functions](#functions-part1). Newcomers often look at finished code and assume it flowed perfectly formed out of someone's fingertips. It probably did not. My *modus operandi*: start with something that works and add features in small increments, maniacally checking that everything still works. Other people undoubtedly move faster (and, therefore, travel faster but crash harder), but I'm OK with that. ### Context: writing a function factory -I have an R package [`googlesheets`][googlesheets-github] that gets Google Sheets in and out of R. Lately we've had a lot of trouble with `Internal Server Error (HTTP 500)`, which, as you might expect, is an error on the Google server side. All you can do as a user is try, try again. But this is a showstopper for unattended scripts or multi-step operations, like building and checking the package. A single error renders lots of other work moot, which is completely infuriating. +I have an R package, [googlesheets], that gets Google Sheets in and out of R. Lately we've had a lot of trouble with `Internal Server Error (HTTP 500)`, which, as you might expect, is an error on the Google server side. All you can do as a user is try, try again. But this is a showstopper for unattended scripts or multi-step operations, like building and checking the package. A single error renders lots of other work moot, which is completely infuriating. I want to catch these errors and automatically retry the request after an appropriate delay. -The brute force approach would be to literally drop little `for` or `while` loops all over the package, to inspect the response and retry if necessary. But I try to follow the [DRY principle][wiki-dry], so would prefer to write a new "retry-capable" version of the function that makes these http requests. +The brute force approach would be to literally drop little `for` or `while` loops all over the package, to inspect the response and retry if necessary. 
But I try to follow the [DRY principle](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself), so would prefer to write a new "retry-capable" version of the function that makes these http requests. It also turns out there's more than one function for making these requests. I'm talking about the [HTTP verbs you use with REST APIs](https://www.restapitutorial.com/lessons/httpmethods.html): GET, POST, PATCH, etc. I potentially need to give them all the "retry" treatment. So what I really need is a *function factory*: an HTTP verb goes in and out comes a retry-capable version of the verb. @@ -75,7 +75,7 @@ It turns out you can write R (or S) for ~20 years and not be very facile with th ### Start at the beginning -My reference is the section of Wickham's [Advanced R][adv-r] [-@wickham2015a] that is about [closures][adv-r-closures], "functions written by functions". Here's one of the two main examples: a function that creates an exponentiation function. +My reference is the section of Wickham's [Advanced R] [-@wickham2015a] that is about [closures](http://adv-r.had.co.nz/Functional-programming.html#closures), "functions written by functions". Here's one of the two main examples: a function that creates an exponentiation function. 
```{r} power <- function(exponent) { @@ -287,7 +287,7 @@ The final version of the function factory is about a dozen lines of fairly pedes ## How to obtain a bunch of GitHub issues or pull requests with R {#gh-package} -[Using dplyr + purrr + tidyr](https://github.com/jennybc/analyze-github-stuff-with-r) to analyze data about GitHub repos via the [gh package][gh-github] +[Using dplyr + purrr + tidyr](https://github.com/jennybc/analyze-github-stuff-with-r) to analyze data about GitHub repos via the [gh] package ## How to tame XML with nested data frames and purrr {#tame-google-sheets} @@ -475,7 +475,7 @@ AFAIK, to do that in a slick automatic way across an entire repo/site, you need ## How to send a bunch of emails from R {#email-in-r} -[Workflow](https://github.com/jennybc/send-email-with-r) for sending email with R and [`gmailr`](https://CRAN.R-project.org/package=gmailr). +[Workflow](https://github.com/jennybc/send-email-with-r) for sending email with R and [gmailr](https://cloud.R-project.org/package=gmailr). ## Store an API key as an environment variable {#store-api-key} @@ -510,7 +510,18 @@ Instructor dependencies: * curl if you execute the code to grab the Lord of the Rings data used in examples from GitHub. Note that the files are also included in the `datacarpentry/data/tidy-data/` directory, so data download is avoidable. * rmarkdown, knitr, and xtable if you want to compile the `Rmd` to `md` and `html`. +# Contributing Guide {#contributing} +## Link Reference Formatting + +If you anticipate the link being used again in another chapter, go ahead and put it in `links.md`. + +If it is a main package that you think will be used in multiple chapters, then include links in `links.md`. + +Priority order for a package's "main link": + 1. package website + 2. github home + 3. 
cran page ```{r links, child="links.md"} ``` \ No newline at end of file diff --git a/links.md b/links.md index c48c9e3..5ceff34 100644 --- a/links.md +++ b/links.md @@ -1,350 +1,122 @@ -[cran]: https://cloud.r-project.org -[cran-faq]: https://cran.r-project.org/faqs.html -[cran-R-admin]: http://cran.r-project.org/doc/manuals/R-admin.html -[cran-add-ons]: https://cran.r-project.org/doc/manuals/R-admin.html#Add_002don-packages -[r-proj]: https://www.r-project.org -[stat-545]: https://stat545.com -[software-carpentry]: https://software-carpentry.org -[cran-r-extensions]: https://cran.r-project.org/doc/manuals/r-release/R-exts.html - - - -[rstudio-preview]: https://www.rstudio.com/products/rstudio/download/preview/ -[rstudio-official]: https://www.rstudio.com/products/rstudio/#Desktop -[rstudio-workbench]: https://www.rstudio.com/wp-content/uploads/2014/04/rstudio-workbench.png -[rstudio-support]: https://support.rstudio.com/hc/en-us -[rstudio-R-help]: https://support.rstudio.com/hc/en-us/articles/200552336-Getting-Help-with-R -[rstudio-customizing]: https://support.rstudio.com/hc/en-us/articles/200549016-Customizing-RStudio -[rstudio-key-shortcuts]: https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts -[rstudio-command-history]: https://support.rstudio.com/hc/en-us/articles/200526217-Command-History -[rstudio-using-projects]: https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects -[rstudio-code-snippets]: https://support.rstudio.com/hc/en-us/articles/204463668-Code-Snippets -[rstudio-dplyr-cheatsheet-download]: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf -[rstudio-regex-cheatsheet]: https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf -[rstudio-devtools]: https://www.rstudio.com/products/rpackages/devtools/ - - -[happy-git]: https://happygitwithr.com -[hg-install-git]: https://happygitwithr.com/install-git.html -[hg-git-client]: https://happygitwithr.com/git-client.html 
-[hg-github-account]: https://happygitwithr.com/github-acct.html -[hg-install-r-rstudio]: https://happygitwithr.com/install-r-rstudio.html -[hg-connect-intro]: https://happygitwithr.com/connect-intro.html -[hg-browsability]: https://happygitwithr.com/workflows-browsability.html -[hg-shell]: https://happygitwithr.com/shell.html - - -[rmarkdown]: https://rmarkdown.rstudio.com -[knitr-faq]: https://yihui.name/knitr/faq/ - -[tidyverse-main-page]: https://www.tidyverse.org -[tidyverse-web]: https://tidyverse.tidyverse.org -[tidyverse-github]: https://github.com/hadley/tidyverse - -[dplyr-web]: https://dplyr.tidyverse.org -[dplyr-cran]: https://CRAN.R-project.org/package=dplyr -[dplyr-github]: https://github.com/hadley/dplyr -[dplyr-vignette-intro]: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html -[dplyr-vignette-window-fxns]: https://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html -[dplyr-vignette-two-table]: https://dplyr.tidyverse.org/articles/two-table.html - -[lubridate-web]: https://lubridate.tidyverse.org -[lubridate-cran]: https://CRAN.R-project.org/package=lubridate -[lubridate-github]: https://github.com/tidyverse/lubridate -[lubridate-vignette]: https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html - -[tidyr-web]: https://tidyr.tidyverse.org -[tidyr-cran]: https://CRAN.R-project.org/package=tidyr - -[readr-web]: https://readr.tidyverse.org -[readr-vignette-intro]: https://cran.r-project.org/web/packages/readr/vignettes/readr.html - -[stringr-web]: https://stringr.tidyverse.org -[stringr-cran]: https://CRAN.R-project.org/package=stringr - -[ggplot2-web]: https://ggplot2.tidyverse.org -[ggplot2-tutorial]: https://github.com/jennybc/ggplot2-tutorial -[ggplot2-reference]: https://docs.ggplot2.org/current/ -[ggplot2-cran]: https://CRAN.R-project.org/package=ggplot2 -[ggplot2-github]: https://github.com/tidyverse/ggplot2 -[ggplot2-theme-args]: https://ggplot2.tidyverse.org/reference/ggtheme.html#arguments 
- -[gapminder-web]: https://www.gapminder.org -[gapminder-cran]: https://CRAN.R-project.org/package=gapminder - -[assertthat-cran]: https://CRAN.R-project.org/package=assertthat -[assertthat-github]: https://github.com/hadley/assertthat - -[ensurer-cran]: https://CRAN.R-project.org/package=ensurer -[ensurer-github]: https://github.com/smbache/ensurer - -[assertr-cran]: https://CRAN.R-project.org/package=assertr -[assertr-github]: https://github.com/ropensci/assertr - -[assertive-cran]: https://CRAN.R-project.org/package=assertive -[assertive-bitbucket]: https://bitbucket.org/richierocks/assertive/src/master/ - -[testthat-cran]: https://CRAN.R-project.org/package=testthat -[testthat-github]: https://github.com/r-lib/testthat -[testthat-web]: https://testthat.r-lib.org - -[viridis-cran]: https://CRAN.R-project.org/package=viridis -[viridis-github]: https://github.com/sjmgarnier/viridis -[viridis-vignette]: https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html - -[colorspace-cran]: https://CRAN.R-project.org/package=colorspace -[colorspace-vignette]: https://cran.r-project.org/web/packages/colorspace/vignettes/hcl-colors.pdf - -[cowplot-cran]: https://CRAN.R-project.org/package=cowplot -[cowplot-github]: https://github.com/wilkelab/cowplot -[cowplot-vignette]: https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html - -[devtools-cran]: https://CRAN.R-project.org/package=devtools -[devtools-github]: https://github.com/r-lib/devtools -[devtools-web]: https://devtools.r-lib.org -[devtools-cheatsheet]: https://www.rstudio.com/wp-content/uploads/2015/03/devtools-cheatsheet.pdf -[devtools-cheatsheet-old]: https://rawgit.com/rstudio/cheatsheets/master/package-development.pdf -[devtools-1-6]: https://blog.rstudio.com/2014/10/02/devtools-1-6/ -[devtools-1-8]: https://blog.rstudio.com/2015/05/11/devtools-1-9-0/ -[devtools-1-9-1]: https://blog.rstudio.com/2015/09/13/devtools-1-9-1/ - -[googlesheets-cran]: 
https://CRAN.R-project.org/package=googlesheets -[googlesheets-github]: https://github.com/jennybc/googlesheets - -[tidycensus-cran]: https://CRAN.R-project.org/package=tidycensus -[tidycensus-github]: https://github.com/walkerke/tidycensus -[tidycensus-web]: https://walkerke.github.io/tidycensus/index.html - -[fs-web]: https://fs.r-lib.org/index.html -[fs-cran]: https://CRAN.R-project.org/package=fs -[fs-github]: https://github.com/r-lib/fs - -[plumber-web]: https://www.rplumber.io -[plumber-docs]: https://www.rplumber.io/docs/ -[plumber-github]: https://github.com/trestletech/plumber -[plumber-cran]: https://CRAN.R-project.org/package=plumber - -[plyr-web]: http://plyr.had.co.nz - -[magrittr-web]: https://magrittr.tidyverse.org -[forcats-web]: https://forcats.tidyverse.org -[glue-web]: https://glue.tidyverse.org -[stringi-cran]: https://CRAN.R-project.org/package=stringi -[rex-github]: https://github.com/kevinushey/rex -[rcolorbrewer-cran]: https://CRAN.R-project.org/package=RColorBrewer -[dichromat-cran]: https://CRAN.R-project.org/package=dichromat - -[rdryad-web]: https://docs.ropensci.org/rdryad/ -[rdryad-cran]: https://CRAN.R-project.org/package=rdryad -[rdryad-github]: https://github.com/ropensci/rdryad - -[roxygen2-cran]: https://CRAN.R-project.org/package=roxygen2 -[roxygen2-vignette]: https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html - -[shinythemes-web]: https://rstudio.github.io/shinythemes/ -[shinythemes-cran]: https://CRAN.R-project.org/package=shinythemes - -[shinyjs-web]: https://deanattali.com/shinyjs/ -[shinyjs-cran]: https://CRAN.R-project.org/package=shinyjs -[shinyjs-github]: https://github.com/daattali/shinyjs - -[leaflet-web]: https://rstudio.github.io/leaflet/ -[leaflet-cran]: https://CRAN.R-project.org/package=leaflet -[leaflet-github]: https://github.com/rstudio/leaflet - + +[useR-2014-dropbox]: https://www.dropbox.com/sh/i8qnluwmuieicxc/AAAgt9tIKoIm7WZKIyK25lh6a +[Tidy data using Lord of the Rings]: 
https://github.com/jennybc/lotr-tidy#readme +[ggplot2 tutorial]: https://github.com/jennybc/ggplot2-tutorial +[R Graph Catalog]: https://github.com/jennybc/r-graph-catalog + + +[dplyr]: https://dplyr.tidyverse.org +[tidyr]: https://tidyr.tidyverse.org +[ggplot2]: https://ggplot2.tidyverse.org +[tidyverse]: https://tidyverse.tidyverse.org +[stringr]: https://stringr.tidyverse.org +[forcats]: https://forcats.tidyverse.org +[purrr]: https://purrr.tidyverse.org +[readr]: https://readr.tidyverse.org +[fs]: https://fs.r-lib.org/index.html +[glue]: https://glue.tidyverse.org +[testthat]: https://testthat.r-lib.org +[ellipsis]: https://ellipsis.r-lib.org +[lubridate]: https://lubridate.tidyverse.org +[devtools]: https://devtools.r-lib.org +[roxygen2]: https://roxygen2.r-lib.org +[knitr]: https://github.com/yihui/knitr +[usethis]: https://usethis.r-lib.org +[xml2]: https://xml2.r-lib.org +[httr]: https://httr.r-lib.org +[rvest]: https://rvest.tidyverse.org +[Shiny]: https://shiny.rstudio.com +[gh]: https://github.com/r-lib/gh + +[plyr]: http://plyr.had.co.nz +[magrittr]: https://magrittr.tidyverse.org +[googlesheets]: https://github.com/jennybc/googlesheets +[gapminder]: https://github.com/jennybc/gapminder + + +[stringi]: http://www.gagolewski.com/software/stringi/ + +[rex]: https://github.com/kevinushey/rex +[lattice]: http://lattice.r-forge.r-project.org +[RColorBrewer]: https://cloud.r-project.org/package=RColorBrewer +[gridExtra]: https://cloud.r-project.org/package=gridExtra + +[rebird]: https://docs.ropensci.org/rebird/ +[geonames]: https://docs.ropensci.org/geonames/ +[rplos]: https://docs.ropensci.org/rplos/ +[gender]: https://docs.ropensci.org/gender/ +[genderdata]: https://docs.ropensci.org/genderdata/ +[curl]: https://jeroen.cran.dev/curl +[jsonlite]: https://github.com/jeroen/jsonlite + +[shinythemes]: https://rstudio.github.io/shinythemes/ +[shinyjs]: https://deanattali.com/shinyjs/ +[leaflet]: https://rstudio.github.io/leaflet/ [ggvis-web]: 
https://ggvis.rstudio.com -[ggvis-cran]: https://CRAN.R-project.org/package=ggvis - -[usethis-web]: https://usethis.r-lib.org -[usethis-cran]: https://CRAN.R-project.org/package=usethis -[usethis-github]: https://github.com/r-lib/usethis +[shinydashboard]: https://rstudio.github.io/shinydashboard/ -[pkgdown-web]: https://pkgdown.r-lib.org -[gh-github]: https://github.com/r-lib/gh -[httr-web]: https://httr.r-lib.org -[httr-cran]: https://CRAN.R-project.org/package=httr -[httr-github]: https://github.com/r-lib/httr - -[gistr-web]: https://docs.ropensci.org/gistr -[gistr-cran]: https://CRAN.R-project.org/package=gistr -[gistr-github]: https://github.com/ropensci/gistr - -[rvest-web]: https://rvest.tidyverse.org -[rvest-cran]: https://CRAN.R-project.org/package=rvest -[rvest-github]: https://github.com/tidyverse/rvest - -[xml2-web]: https://xml2.r-lib.org -[xml2-cran]: https://CRAN.R-project.org/package=xml2 -[xml2-github]: https://github.com/r-lib/xml2 - -[jsonlite-paper]: https://arxiv.org/abs/1403.2805 -[jsonlite-cran]: https://CRAN.R-project.org/package=jsonlite -[jsonlite-github]: https://github.com/jeroen/jsonlite - -[readxl-web]: https://readxl.tidyverse.org -[readxl-github]: https://github.com/tidyverse/readxl -[readxl-cran]: https://CRAN.R-project.org/package=readxl + +[dplyr-cran]: https://cloud.r-project.org/package=dplyr +[dplyr-github]: https://github.com/hadley/dplyr -[janitor-web]: http://sfirke.github.io/janitor/ -[janitor-cran]: https://CRAN.R-project.org/package=janitor -[janitor-github]: https://github.com/sfirke/janitor -[purrr-web]: https://purrr.tidyverse.org -[curl-cran]: https://CRAN.R-project.org/package=curl + +[Introduction to dplyr]: https://dplyr.tidyverse.org/articles/dplyr.html +[Window functions]: https://dplyr.tidyverse.org/articles/window-functions.html +[Two-table verbs]: https://dplyr.tidyverse.org/articles/two-table.html +[Do more with dates and times in R]: https://lubridate.tidyverse.org/articles/lubridate.html - 
-[shinydashboard-web]: https://rstudio.github.io/shinydashboard/ -[shinydashboard-cran]: https://CRAN.R-project.org/package=shinydashboard -[shinydashboard-github]: https://github.com/rstudio/shinydashboard + +[Happy Git and GitHub for the useR]: https://happygitwithr.com +[R for Data Science]: https://r4ds.had.co.nz +[The tidyverse style guide]: https://style.tidyverse.org +[Advanced R]: http://adv-r.had.co.nz +[Tidyverse design principles]: https://principles.tidyverse.org +[R Packages]: https://r-pkgs.org/index.html +[R Graphics Cookbook]: http://shop.oreilly.com/product/0636920023135.do +[Cookbook for R]: http://www.cookbook-r.com +[ggplot2: Elegant Graphics for Data Analysis]: https://ggplot2-book.org/index.html -[shiny-official-web]: https://shiny.rstudio.com -[shiny-official-tutorial]: https://shiny.rstudio.com/tutorial/ -[shiny-cheatsheet]: https://shiny.rstudio.com/images/shiny-cheatsheet.pdf -[shiny-articles]: https://shiny.rstudio.com/articles/ -[shiny-bookdown]: https://bookdown.org/yihui/rmarkdown/shiny-documents.html -[shiny-google-groups]: https://groups.google.com/forum/#!forum/shiny-discuss -[shiny-stack-overflow]: https://stackoverflow.com/questions/tagged/shiny -[shinyapps-web]: https://www.shinyapps.io -[shiny-server-setup]: https://deanattali.com/2015/05/09/setup-rstudio-shiny-server-digital-ocean/ -[shiny-reactivity]: https://shiny.rstudio.com/articles/understanding-reactivity.html -[shiny-debugging]: https://shiny.rstudio.com/articles/debugging.html -[shiny-server]: https://www.rstudio.com/products/shiny/shiny-server/ - -[adv-r]: http://adv-r.had.co.nz -[adv-r-fxns]: http://adv-r.had.co.nz/Functions.html -[adv-r-dsl]: http://adv-r.had.co.nz/dsl.html -[adv-r-defensive-programming]: http://adv-r.had.co.nz/Exceptions-Debugging.html#defensive-programming [adv-r-fxn-args]: http://adv-r.had.co.nz/Functions.html#function-arguments -[adv-r-return-values]: http://adv-r.had.co.nz/Functions.html#return-values -[adv-r-closures]: 
http://adv-r.had.co.nz/Functional-programming.html#closures - -[r4ds]: https://r4ds.had.co.nz [r4ds-transform]: https://r4ds.had.co.nz/transform.html -[r4ds-strings]: https://r4ds.had.co.nz/strings.html [r4ds-readr-strings]: https://r4ds.had.co.nz/data-import.html#readr-strings -[r4ds-dates-times]: https://r4ds.had.co.nz/dates-and-times.html -[r4ds-data-import]: http://r4ds.had.co.nz/data-import.html -[r4ds-relational-data]: https://r4ds.had.co.nz/relational-data.html -[r4ds-pepper-shaker]: https://r4ds.had.co.nz/vectors.html#lists-of-condiments -[r-pkgs2]: https://r-pkgs.org/index.html -[r-pkgs2-whole-game]: https://r-pkgs.org/whole-game.html -[r-pkgs2-description]: https://r-pkgs.org/description.html -[r-pkgs2-man]: https://r-pkgs.org/man.htm -[r-pkgs2-tests]: https://r-pkgs.org/tests.html -[r-pkgs2-namespace]: https://r-pkgs.org/namespace.html -[r-pkgs2-vignettes]: https://r-pkgs.org/vignettes.html -[r-pkgs2-release]: https://r-pkgs.org/release.html -[r-pkgs2-r-code]: https://r-pkgs.org/r.html#r + +[rOpenSci]: https://ropensci.org +[wiki-snake-case]: https://en.wikipedia.org/wiki/Snake_case +[Janus]: https://en.wikipedia.org/wiki/Janus -[r-graphics-cookbook]: http://shop.oreilly.com/product/0636920023135.do -[cookbook-for-r]: http://www.cookbook-r.com -[cookbook-for-r-graphs]: http://www.cookbook-r.com/Graphs/ -[cookbook-for-r-multigraphs]: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/ + +[RStudio Data Transformation Cheat Sheet]: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf +[Regular Expressions in R Cheat Sheet]: https://github.com/rstudio/cheatsheets/raw/master/regex.pdf +[Shiny Cheat Sheet]: https://shiny.rstudio.com/articles/cheatsheet.html -[elegant-graphics-springer]: https://www.springer.com/gp/book/9780387981413 -[testthat-article]: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf -[worry-about-color]: 
https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=2ahUKEwi0xYqJ8JbjAhWNvp4KHViYDxsQFjABegQIABAC&url=https%3A%2F%2Fwww.researchgate.net%2Fprofile%2FAhmed_Elhattab2%2Fpost%2FPlease_suggest_some_good_3D_plot_tool_Software_for_surface_plot%2Fattachment%2F5c05ba35cfe4a7645506948e%2FAS%253A699894335557644%25401543879221725%2Fdownload%2FWhy%2BShould%2BEngineers%2Band%2BScientists%2BBe%2BWorried%2BAbout%2BColor_.pdf&usg=AOvVaw1qwjjGMd7h_z6TLUjzu7Nb -[escaping-rgbland-pdf]: https://eeecon.uibk.ac.at/~zeileis/papers/Zeileis+Hornik+Murrell-2009.pdf -[escaping-rgbland-doi]: https://doi.org/10.1016/j.csda.2008.11.033 + +["Dates and Times Made Easy with lubridate"]: https://www.jstatsoft.org/article/view/v040i03 +["testthat: Get Started with Testing"]: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf +["Let's Practice What We Preach"]: https://www.jstor.org/stable/3087382?seq=1#page_scan_tab_contents +[Creating More Effective Graphs]: https://www.amazon.com/Creating-Effective-Graphs-Naomi-Robbins/dp/0985911123 +["Escaping RGBland: Selecting Colors for Statistical Graphs"]: https://eeecon.uibk.ac.at/~zeileis/papers/Zeileis+Hornik+Murrell-2009.pdf +["A layered grammar of graphics"]: https://vita.had.co.nz/papers/layered-grammar.html +[Managing Projects with GNU Make, 3rd Edition]: http://shop.oreilly.com/product/9780596006105.do - -[rdocs-extremes]: https://rdrr.io/r/base/Extremes.html -[rdocs-range]: https://rdrr.io/r/base/range.html -[rdocs-quantile]: https://rdrr.io/r/stats/quantile.html -[rdocs-c]: https://rdrr.io/r/base/c.html -[rdocs-list]: https://rdrr.io/r/base/list.html -[rdocs-lm]: https://rdrr.io/r/stats/lm.html -[rdocs-coef]: https://rdrr.io/r/stats/coef.html -[rdocs-devices]: https://rdrr.io/r/grDevices/Devices.html -[rdocs-ggsave]: https://rdrr.io/cran/ggplot2/man/ggsave.html -[rdocs-dev]: https://rdrr.io/r/grDevices/dev.html + +["Let the Data Flow: Pipelines in R with dplyr and magrittr"]: 
https://github.com/tjmahr/MadR_Pipelines +["Hands-on dplyr tutorial for faster data manipulation in R"]: https://www.dataschool.io/dplyr-tutorial-for-faster-data-manipulation-in-r/ +["Writing R Extensions"]: https://cloud.r-project.org/doc/manuals/r-release/R-exts.html +["The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"]: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ +["What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text"]: http://kunststube.net/encoding/ +["Guide to fixing encoding problems in Ruby"]: https://www.justinweiss.com/articles/3-steps-to-fix-encoding-problems-in-ruby/ +["My favorite RGB color"]: https://manyworldstheory.com/2013/01/15/my-favorite-rgb-color/ +["Why Should Engineers and Scientists Be Worried About Color?"]: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=2ahUKEwi0xYqJ8JbjAhWNvp4KHViYDxsQFjABegQIABAC&url=https%3A%2F%2Fwww.researchgate.net%2Fprofile%2FAhmed_Elhattab2%2Fpost%2FPlease_suggest_some_good_3D_plot_tool_Software_for_surface_plot%2Fattachment%2F5c05ba35cfe4a7645506948e%2FAS%253A699894335557644%25401543879221725%2Fdownload%2FWhy%2BShould%2BEngineers%2Band%2BScientists%2BBe%2BWorried%2BAbout%2BColor_.pdf&usg=AOvVaw1qwjjGMd7h_z6TLUjzu7Nb + - -[wiki-snake-case]: https://en.wikipedia.org/wiki/Snake_case -[wiki-hello-world]: https://en.wikipedia.org/wiki/%22Hello,_world!%22_program -[wiki-janus]: https://en.wikipedia.org/wiki/Janus -[wiki-nesting-dolls]: https://en.wikipedia.org/wiki/Matryoshka_doll -[wiki-pure-fxns]: https://en.wikipedia.org/wiki/Pure_function -[wiki-camel-case]: https://en.wikipedia.org/wiki/Camel_case -[wiki-mojibake]: https://en.wikipedia.org/wiki/Mojibake -[wiki-row-col-major-order]: 
https://en.wikipedia.org/wiki/Row-_and_column-major_order -[wiki-boxplot]: https://en.wikipedia.org/wiki/Box_plot -[wiki-brewer]: https://en.wikipedia.org/wiki/Cynthia_Brewer -[wiki-vector-graphics]: https://en.wikipedia.org/wiki/Vector_graphics -[wiki-raster-graphics]: https://en.wikipedia.org/wiki/Raster_graphics -[wiki-dry]: https://en.wikipedia.org/wiki/Don%27t_repeat_yourself -[wiki-web-scraping]: https://en.wikipedia.org/wiki/Web_scraping -[wiki-xpath]: https://en.wikipedia.org/wiki/XPath -[wiki-css-selector]: https://en.wikipedia.org/wiki/Cascading_Style_Sheets#Selector - -[split-apply-combine]: https://www.jstatsoft.org/article/view/v040i01 -[useR-2014-dropbox]: https://www.dropbox.com/sh/i8qnluwmuieicxc/AAAgt9tIKoIm7WZKIyK25lh6a -[gh-pages]: https://pages.github.com -[html-preview]: http://htmlpreview.github.io -[tj-mahr-slides]: https://github.com/tjmahr/MadR_Pipelines -[dataschool-dplyr]: https://www.dataschool.io/dplyr-tutorial-for-faster-data-manipulation-in-r/ -[xckd-randall-munroe]: https://fivethirtyeight.com/features/xkcd-randall-munroe-qanda-what-if/ -[athena-zeus-forehead]: https://tinyurl.com/athenaforehead -[tidydata-lotr]: https://github.com/jennybc/lotr-tidy#readme -[minimal-make]: https://kbroman.org/minimal_make/ -[write-data-tweet]: https://twitter.com/vsbuffalo/statuses/358699162679787521 -[belt-and-suspenders]: https://www.wisegeek.com/what-does-it-mean-to-wear-belt-and-suspenders.htm -[research-workflow]: https://www.carlboettiger.info/2012/05/06/research-workflow.html -[yak-shaving]: https://seths.blog/2005/03/dont_shave_that/ -[yaml-with-csv]: https://blog.datacite.org/using-yaml-frontmatter-with-csv/ -[reproducible-examples]: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example -[blog-strings-as-factors]: https://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh -[bio-strings-as-factors]: https://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography 
-[stackexchange-outage]: https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016 -[email-regex]: https://emailregex.com -[fix-atom-bug]: https://davidvgalbraith.com/how-i-fixed-atom/ -[icu-regex]: https://userguide.icu-project.org/strings/regexp -[regex101]: https://regex101.com -[regexr]: https://regexr.com -[utf8-debug]: http://www.i18nqa.com/debug/utf8-debug.html -[unicode-no-excuses]: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ -[programmers-encoding]: http://kunststube.net/encoding/ -[encoding-probs-ruby]: https://www.justinweiss.com/articles/3-steps-to-fix-encoding-problems-in-ruby/ -[theyre-to-theyre]: https://www.justinweiss.com/articles/how-to-get-from-theyre-to-theyre/ -[lubridate-ex1]: https://www.r-exercises.com/2016/08/15/dates-and-times-simple-and-easy-with-lubridate-part-1/ -[lubridate-ex2]: https://www.r-exercises.com/2016/08/29/dates-and-times-simple-and-easy-with-lubridate-exercises-part-2/ -[lubridate-ex3]: https://www.r-exercises.com/2016/10/04/dates-and-times-simple-and-easy-with-lubridate-exercises-part-3/ -[google-sql-join]: https://www.google.com/search?q=sql+join&tbm=isch -[min-viable-product]: https://blog.fastmonkeys.com/?utm_content=bufferc2d6e&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer -[telescope-rule]: http://c2.com/cgi/wiki?TelescopeRule -[unix-philosophy]: http://www.faqs.org/docs/artu/ch01s06.html -[twitter-wrathematics]: https://twitter.com/wrathematics -[robbins-effective-graphs]: https://www.amazon.com/Creating-Effective-Graphs-Naomi-Robbins/dp/0985911123 -[r-graph-catalog-github]: https://github.com/jennybc/r-graph-catalog -[google-pie-charts]: https://www.google.com/search?q=pie+charts+suck -[why-pie-charts-suck]: https://www.richardhollins.com/blog/why-pie-charts-suck/ -[worst-figure]: https://robjhyndman.com/hyndsight/worst-figure/ -[naomi-robbins]: 
http://www.nbr-graphs.com -[hadley-github-index]: https://hadley.github.io -[scipy-2015-matplotlib-colors]: https://www.youtube.com/watch?v=xAoljeRJ3lU&feature=youtu.be -[winston-chang-github]: https://github.com/wch -[favorite-rgb-color]: https://manyworldstheory.com/2013/01/15/my-favorite-rgb-color/ -[stowers-color-chart]: https://web.archive.org/web/20121022044903/http://research.stowers-institute.org/efg/R/Color/Chart/ -[stowers-using-color-in-R]: https://www.uv.es/conesa/CursoR/material/UsingColorInR.pdf -[zombie-project]: https://imgur.com/ewmBeQG -[tweet-project-resurfacing]: https://twitter.com/JohnDCook/status/522377493417033728 -[rgraphics-looks-tips]: https://blog.revolutionanalytics.com/2009/01/10-tips-for-making-your-r-graphics-look-their-best.html -[rgraphics-svg-tips]: https://blog.revolutionanalytics.com/2011/07/r-svg-graphics.html -[zev-ross-cheatsheet]: http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/ -[parker-writing-r-packages]: https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/ -[broman-r-packages]: https://kbroman.org/pkg_primer/ -[broman-tools4rr]: https://kbroman.org/Tools4RR/ -[leeks-r-packages]: https://github.com/jtleek/rpackages -[build-maintain-r-packages]: https://thepoliticalmethodologist.com/2014/08/14/building-and-maintaining-r-packages-with-devtools-and-roxygen2/ -[murdoch-package-vignette-slides]: https://web.archive.org/web/20160824010213/http://www.stats.uwo.ca/faculty/murdoch/ism2013/5Vignettes.pdf -[how-r-searches]: http://blog.obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/ diff --git a/supporting-docs/foofactors-README.Rmd b/supporting-docs/foofactors-README.Rmd index 33f7460..4271152 100644 --- a/supporting-docs/foofactors-README.Rmd +++ b/supporting-docs/foofactors-README.Rmd @@ -14,7 +14,7 @@ knitr::opts_chunk$set( ) ``` -**NOTE: This is a toy package created for expository purposes. It is not meant to actually be useful. 
If you want a package for factor handling, please see [forcats](https://cran.r-project.org/package=forcats).** +**NOTE: This is a toy package created for expository purposes. It is not meant to actually be useful. If you want a package for factor handling, please see [forcats](https://cloud.r-project.org/package=forcats).** ### foofactors From f6557eb501169154dd6e70d353e2e06145ed64f5 Mon Sep 17 00:00:00 2001 From: Grace Lawley Date: Thu, 10 Oct 2019 00:26:43 -0700 Subject: [PATCH 2/3] close read through, round 2 --- 02_r-basics.Rmd | 6 +-- 05_data-care-feeding.Rmd | 4 +- 06_dplyr-intro.Rmd | 8 +-- 07_dplyr-single-table.Rmd | 17 +++--- 09_import-export.Rmd | 18 +++---- 10_factors.Rmd | 12 ++--- 11_character-vectors.Rmd | 16 +++--- 12_character-encoding.Rmd | 4 +- 13_date-times.Rmd | 2 +- 14_multiple-tibbles.Rmd | 4 +- 15_join-tibbles.Rmd | 2 +- 17_r-objects-indexing.Rmd | 4 +- 18_functions-part1.Rmd | 10 ++-- 20_functions-part3.Rmd | 2 +- 21_functions-practicum.Rmd | 4 +- 24_effective-graphs.Rmd | 10 ++-- 25_colors.Rmd | 9 ++-- 26_qualitative-colors.Rmd | 4 +- 27_secrets-happy-graphics.Rmd | 4 +- 28_saving-figures.Rmd | 18 +++---- 29_multiple-plots.Rmd | 2 +- 30_package-overview.Rmd | 2 +- 32_system-prep-packages.Rmd | 2 +- 33_create-package.Rmd | 3 ++ 34_workflows.Rmd | 97 ++++++++++++++++++----------------- 39_appendix.Rmd | 41 +++++++++++++++ links.md | 33 +++--------- 27 files changed, 186 insertions(+), 152 deletions(-) diff --git a/02_r-basics.Rmd b/02_r-basics.Rmd index 3177243..4ff1508 100644 --- a/02_r-basics.Rmd +++ b/02_r-basics.Rmd @@ -13,10 +13,10 @@ Launch RStudio/R. 
Notice the default panes: * Console (entire left) -* Environment/History (tabbed in upper right) -* Files/Plots/Packages/Help (tabbed in lower right) +* Environment / History (tabbed in upper right) +* Files / Plots / Packages / Help (tabbed in lower right) -FYI: you can change the default location of the panes, among many other things: [Customizing RStudio](https://support.rstudio.com/hc/en-us/articles/200549016-Customizing-RStudio). +__FYI:__ you can change the default location of the panes, among many other things: [Customizing RStudio](https://support.rstudio.com/hc/en-us/articles/200549016-Customizing-RStudio). Go into the Console, where we interact with the live R process. diff --git a/05_data-care-feeding.Rmd b/05_data-care-feeding.Rmd index de0a1cc..94bf7d7 100644 --- a/05_data-care-feeding.Rmd +++ b/05_data-care-feeding.Rmd @@ -24,7 +24,7 @@ Now restart R. This will ensure you don't have any packages loaded from previous Why do we do this? So that the code you write is complete and re-runnable. If you return to a clean slate often, you will root out hidden dependencies where one snippet of code only works because it relies on objects created by code saved elsewhere or, much worse, never saved at all. Similarly, an aggressive clean slate approach will expose any usage of packages that have not been explicitly loaded. -Finally, open a new R script and develop and run your code from there. In RStudio, use *File > New File > R Script*. Save this script with a name ending in `.r` or `.R`, containing no spaces or other funny stuff, and that evokes whatever it is we're doing today. __Example:__ `cm004_data-care-feeding.r`. +Finally, open a new R script and develop and run your code from there. In RStudio, use *File > New File > R Script*. Save this script with a name ending in `.r` or `.R`, containing no spaces or other funny stuff, and that evokes whatever it is we're doing today. Example: `cm004_data-care-feeding.r`. 
Another great idea is to do this in an R Markdown document. See [Test drive R Markdown](#r-markdown) for a refresher. @@ -66,7 +66,7 @@ str(gapminder) We could print the `gapminder` object itself to screen. However, if you've used R before, you might be reluctant to do this, because large datasets just fill up your Console and provide very little insight. -This is the first big win for **tibbles**. The [tidyverse] offers a special case of R's default data frame: the "tibble", which is a nod to the actual class of these objects, `tbl_df`. +This is the first big win for **tibbles**. The tidyverse offers a special case of R's default data frame: the "tibble", which is a nod to the actual class of these objects, `tbl_df`. If you have not already done so, install the tidyverse meta-package now: diff --git a/06_dplyr-intro.Rmd b/06_dplyr-intro.Rmd index 7238a47..3850f52 100644 --- a/06_dplyr-intro.Rmd +++ b/06_dplyr-intro.Rmd @@ -22,7 +22,7 @@ I choose to load the tidyverse, which will load dplyr, among other packages we u library(tidyverse) ``` -Also load gapminder. +Also load [gapminder]. ```{r message = FALSE, warning = FALSE} library(gapminder) @@ -64,7 +64,7 @@ Stop and ask yourself ... > Do I want to create mini datasets for each level of some factor (or unique combination of several factors) ... in order to compute or graph something? -If YES, __use proper data aggregation techniques__ or faceting in ggplot2 -- __don’t subset the data__. Or, more realistic, only subset the data as a temporary measure while you develop your elegant code for computing on or visualizing these data subsets. +If YES, __use proper data aggregation techniques__ or faceting in [ggplot2] -- __don’t subset the data__. Or, more realistic, only subset the data as a temporary measure while you develop your elegant code for computing on or visualizing these data subsets. If NO, then maybe you really do need to store a copy of a subset of the data. 
But seriously consider whether you can achieve your goals by simply using the `subset =` argument of, e.g., the `lm()` function, to limit computation to your excerpt of choice. Lots of functions offer a `subset =` argument! @@ -146,7 +146,7 @@ gapminder %>% head(4) ``` -Think: "Take `gapminder`, then select the variables year and lifeExp, then show the first 4 rows." +Think: "Take `gapminder`, then select the variables `year` and `lifeExp`, then show the first 4 rows." ## Revel in the convenience @@ -200,7 +200,7 @@ Go to the next Chapter, [dplyr functions for a single dataset](#dplyr-single), f Blog post ["Hands-on dplyr tutorial for faster data manipulation in R"] by Data School, that includes a link to an R Markdown document and links to videos. -Chapter \@ref(join-cheatsheet): cheatsheet I made for dplyr join functions (not relevant yet but soon). +Chapter \@ref(join-cheatsheet) - cheatsheet I made for dplyr join functions (not relevant yet but soon). ```{r links, child="links.md"} diff --git a/07_dplyr-single-table.Rmd b/07_dplyr-single-table.Rmd index 3d767e4..15d9a13 100644 --- a/07_dplyr-single-table.Rmd +++ b/07_dplyr-single-table.Rmd @@ -8,7 +8,7 @@ source("common.R") ## Where were we? -In Chapter \@ref(dplyr-intro), [Introduction to dplyr](#dplyr-intro), we used two very important verbs and an operator: +In Chapter \@ref(dplyr-intro), Introduction to dplyr, we used two very important verbs and an operator: * `filter()` for subsetting data with row logic * `select()` for subsetting data variable- or column-wise @@ -16,7 +16,7 @@ In Chapter \@ref(dplyr-intro), [Introduction to dplyr](#dplyr-intro), we used tw We also discussed dplyr's role inside the tidyverse and tibbles: -* dplyr is a core package in the [tidyverse] meta-package. Since we often make incidental usage of the others, we will load dplyr and the others via `library(tidyverse)`. +* [dplyr] is a core package in the [tidyverse] meta-package. 
Since we often make incidental usage of the others, we will load dplyr and the others via `library(tidyverse)`. * The tidyverse embraces a special flavor of data frame, called a tibble. The `gapminder` dataset is stored as a tibble. ## Load dplyr and gapminder @@ -27,7 +27,8 @@ I choose to load the tidyverse, which will load dplyr, among other packages we u library(tidyverse) ``` -Also load `gapminder.` +Also load [gapminder]. + ```{r message = FALSE, warning = FALSE} library(gapminder) ``` @@ -68,7 +69,7 @@ my_gap %>% Hmmmm ... those GDP numbers are almost uselessly large and abstract. Consider the [advice of Randall Munroe of xkcd](https://fivethirtyeight.com/features/xkcd-randall-munroe-qanda-what-if/): ->One thing that bothers me is large numbers presented without context... 'If I added a zero to this number, would the sentence containing it mean something different to me?' If the answer is 'no,' maybe the number has no business being in the sentence in the first place." +>One thing that bothers me is large numbers presented without context... "If I added a zero to this number, would the sentence containing it mean something different to me?" If the answer is "no", maybe the number has no business being in the sentence in the first place. Maybe it would be more meaningful to consumers of my tables and figures to stick with GDP per capita. But what if I reported GDP per capita, *relative to some benchmark country*. Since Canada is my adopted home, I'll go with that. @@ -172,15 +173,19 @@ my_gap %>% ## `group_by()` is a mighty weapon -I have found ~~friends and family~~ collaborators love to ask seemingly innocuous questions like, "which country experienced the sharpest 5-year drop in life expectancy?". In fact, that is a totally natural question to ask. But if you are using a language that doesn't know about data, it's an incredibly annoying question to answer. 
+I have found that ~~friends and family~~ collaborators love to ask seemingly innocuous questions like, "which country experienced the sharpest 5-year drop in life expectancy?". In fact, that is a totally natural question to ask. But if you are using a language that doesn't know about data, it's an incredibly annoying question to answer. dplyr offers powerful tools to solve this class of problem: * `group_by()` adds extra structure to your dataset -- grouping information -- which lays the groundwork for computations within the groups. + * `summarize()` takes a dataset with $n$ observations, computes requested summaries, and returns a dataset with 1 observation. + * Window functions take a dataset with $n$ observations and return a dataset with $n$ observations. + * `mutate()` and `summarize()` will honor groups. + * You can also do very general computations on your groups with `do()`, though elsewhere in this course, I advocate for other approaches that I find more intuitive, using the [purrr] package. Combined with the verbs you already know, these new tools allow you to solve an extremely diverse set of problems with relative ease. @@ -386,7 +391,7 @@ In later tutorials, we'll explore more of dplyr, such as operations based on two Blog post ["Hands-on dplyr tutorial for faster data manipulation in R"] by Data School, that includes a link to an R Markdown document and links to videos. -Chapter \@ref(join-cheatsheet): cheatsheet I made for dplyr join functions (not relevant yet but soon). +Chapter \@ref(join-cheatsheet) - cheatsheet I made for dplyr join functions (not relevant yet but soon). ```{r links, child="links.md"} ``` \ No newline at end of file diff --git a/09_import-export.Rmd b/09_import-export.Rmd index 67c7582..a5c7caf 100644 --- a/09_import-export.Rmd +++ b/09_import-export.Rmd @@ -16,10 +16,10 @@ How do you do this? What issues should you think about? Data import generally feels one of two ways: -* "Surprise me!" 
This is the attitude you must adopt when you first get a dataset. You are just happy to import without an error. You start to explore. You discover flaws in the data and/or the import. You address them. Lather, rinse, repeat. -* "Another day in paradise." This is the attitude when you bring in a tidy dataset you have maniacally cleaned in one or more cleaning scripts. There should be no surprises. You should express your expectations about the data in formal assertions at the very start of these downstream scripts. +* *"Surprise me!"* This is the attitude you must adopt when you first get a dataset. You are just happy to import without an error. You start to explore. You discover flaws in the data and/or the import. You address them. Lather, rinse, repeat. +* *"Another day in paradise."* This is the attitude when you bring in a tidy dataset you have maniacally cleaned in one or more cleaning scripts. There should be no surprises. You should express your expectations about the data in formal assertions at the very start of these downstream scripts. -In the second case, and as the first cases progresses, you actually know a lot about how the data is / should be. My main import advice: **use the arguments of your import function to get as far as you can, as fast as possible**. Novice code often has a great deal of unnecessary post import fussing around. Read the docs for the import functions and take maximum advantage of the arguments to control the import. +In the second case, and as the first cases progresses, you actually know a lot about how the data is/should be. My main import advice: **use the arguments of your import function to get as far as you can, as fast as possible**. Novice code often has a great deal of unnecessary post import fussing around. Read the docs for the import functions and take maximum advantage of the arguments to control the import. ### Data export mindset @@ -32,11 +32,11 @@ First tip: __today's outputs are tomorrow's inputs__. 
Think back on all the pain Second tip: don't be too cute or clever. A plain text file that is readable by a human being in a text editor should be your default until you have __actual proof__ that this will not work. Reading and writing to exotic or proprietary formats will be the first thing to break in the future or on a different computer. It also creates barriers for anyone who has a different toolkit than you do. Be software-agnostic. Aim for future-proof and moron-proof. -How does this fit with our emphasis on dynamic reporting via R Markdown? There is a time and place for everything. There are projects and documents where the scope and personnel will allow you to geek out with knitr and R Markdown. But there are lots of good reasons why (parts of) an analysis should not (only) be embedded in a dynamic report. Maybe you are just doing data cleaning to produce a valid input dataset. Maybe you are making a small but crucial contribution to a giant multi-author paper. Etc. Also remember there are other tools and workflows for making something reproducible. I'm looking at you, [make](https://kbroman.org/minimal_make/). +How does this fit with our emphasis on dynamic reporting via R Markdown? There is a time and place for everything. There are projects and documents where the scope and personnel will allow you to geek out with knitr and R Markdown. But there are lots of good reasons why (parts of) an analysis should not (only) be embedded in a dynamic report. Maybe you are just doing data cleaning to produce a valid input dataset. Maybe you are making a small but crucial contribution to a giant multi-author paper. Etc. Also remember there are other tools and workflows for making something reproducible. I'm looking at you, [make]["minimal make: a minimal tutorial on make"]. ## Load the tidyverse -The main package we will be using is [readr], which provides drop-in substitute functions for `read.table()` and friends. 
However, to make some points about data export and import, it is nice to reorder factor levels. For that, we will use the [forcats] package, which is also included in the [tidyverse] package. +The main package we will be using is [readr], which provides drop-in substitute functions for `read.table()` and friends. However, to make some points about data export and import, it is nice to reorder factor levels. For that, we will use the [forcats] package, which is also included in the [tidyverse] meta-package. ```{r start_import_export} library(tidyverse) @@ -62,7 +62,7 @@ str(gapminder, give.attr = FALSE) For full flexibility re: specifying the delimiter, you can always use `readr::read_delim()`. -There's a similar convenience wrapper for comma-separated values, `read_csv()`. +There's a similar convenience wrapper for comma-separated values: `read_csv()`. The most noticeable difference between the readr functions and base is that readr does NOT convert strings to factors by default. In the grand scheme of things, this is better default behavior, although we go ahead and convert them to factor here. Do not be deceived -- in general, you will do less post-import fussing if you use readr. @@ -122,11 +122,11 @@ It turns out these self-imposed rules are often in conflict with one another: * Be the boss of factors: order the levels in a meaningful, usually non-alphabetical way * Avoid duplication of code and data -__Example:__ after performing the country-level summarization, we reorder the levels of the country factor, based on life expectancy. This reordering operation is conceptually important and must be embodied in R commands stored in a script. However, as soon as we write `gap_life_exp` to a plain text file, that meta-information about the countries is lost. Upon re-import with `read_delim()` and friends, we are back to alphabetically ordered factor levels. Any measure we take to avoid this loss immediately breaks another one of our rules. 
+Example: after performing the country-level summarization, we reorder the levels of the country factor, based on life expectancy. This reordering operation is conceptually important and must be embodied in R commands stored in a script. However, as soon as we write `gap_life_exp` to a plain text file, that meta-information about the countries is lost. Upon re-import with `read_delim()` and friends, we are back to alphabetically ordered factor levels. Any measure we take to avoid this loss immediately breaks another one of our rules. So what do I do? I must admit I save (and re-load) R-specific binary files. Right after I save the plain text file. [Belt and suspenders](https://www.wisegeek.com/what-does-it-mean-to-wear-belt-and-suspenders.htm). -I have toyed with the idea of writing import helper functions for a specific project, that would re-order factor levels in principled ways. They could be defined in one file and called from many. This would also have a very natural implementation within [a workflow where each analytical project is an R package](https://www.carlboettiger.info/2012/05/06/research-workflow.html). But so far it has seemed too much like [yak shaving](https://seths.blog/2005/03/dont_shave_that/). I'm intrigued by a recent discussion of putting such information in YAML frontmatter (see Martin Fenner blog post [Using YAML frontmatter with CSV](https://blog.datacite.org/using-yaml-frontmatter-with-csv/)). +I have toyed with the idea of writing import helper functions for a specific project, that would re-order factor levels in principled ways. They could be defined in one file and called from many. This would also have a very natural implementation within [a workflow where each analytical project is an R package](https://www.carlboettiger.info/2012/05/06/research-workflow.html). But so far it has seemed too much like [yak shaving](https://seths.blog/2005/03/dont_shave_that/). 
I'm intrigued by a recent discussion of putting such information in YAML frontmatter (see Martin Fenner blog post, ["Using YAML frontmatter with CSV"](https://blog.datacite.org/using-yaml-frontmatter-with-csv/)). ## Reordering the levels of the country factor @@ -237,7 +237,7 @@ If a delimited file contains fields where a human being has typed, be crazy para When the header fields (often, but not always, the variable names) or actual data contain the delimiter, it can lead to parsing and import failures. Two popular delimiters are the comma `,` and the TAB `\t` and humans tend to use these when typing. If you can design this problem away during data capture, such as by using a drop down menu on an input form, by all means do so. Sometimes this is impossible or undesirable and you must deal with fairly free form text. That's a good time to allow/force text to be protected with quotes, because it will make parsing the delimited file go more smoothly. -Sometimes, instead of rigid tab-delimiting, whitespace is used as the delimiter. That is, in fact, the default for both `read.table()` and `write.table()`. Assuming you will write/read variable names from the first line (a.k.a. the `header` in `write.table()` and `read.table()`), they must be valid R variable names ... or they will be coerced into something valid. So, for these two reasons, it is good practice to use "one word" variable names whenever possible. If you need to evoke multiple words, use `snake_case` or `camelCase` to cope. __Example:__ the header entry for the field holding the subject's last name should be `last_name` or `lastName` NOT `last name`. With the readr package, "column names are left as is, not munged into valid R identifiers (i.e. there is no `check.names = TRUE`)". So you can get away with whitespace in variable names and yet I recommend that you do not. +Sometimes, instead of rigid tab-delimiting, whitespace is used as the delimiter. 
That is, in fact, the default for both `read.table()` and `write.table()`. Assuming you will write/read variable names from the first line (a.k.a. the `header` in `write.table()` and `read.table()`), they must be valid R variable names ... or they will be coerced into something valid. So, for these two reasons, it is good practice to use "one word" variable names whenever possible. If you need to evoke multiple words, use `snake_case` or `camelCase` to cope. Example: the header entry for the field holding the subject's last name should be `last_name` or `lastName` NOT `last name`. With the readr package, "column names are left as is, not munged into valid R identifiers (i.e. there is no `check.names = TRUE`)". So you can get away with whitespace in variable names and yet I recommend that you do not. ## Resources diff --git a/10_factors.Rmd b/10_factors.Rmd index 95af3d0..0665f34 100644 --- a/10_factors.Rmd +++ b/10_factors.Rmd @@ -22,14 +22,14 @@ Where do stealth factors come from? Base R has a burning desire to turn characte Good articles about how the factor fiasco came to be: -* [stringsAsFactors: An unauthorized biography](https://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography) by Roger Peng -* [stringsAsFactors = \<sigh\>](https://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh) by Thomas Lumley +* ["stringsAsFactors: An unauthorized biography"](https://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography) by Roger Peng +* ["stringsAsFactors = \<sigh\>"](https://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh) by Thomas Lumley ## The forcats package [forcats] is a core package in the [tidyverse]. It is installed via `install.packages("tidyverse")`, and loaded with `library(tidyverse)`. You can also install via `install.packages("forcats")`and load it yourself separately as needed via `library(forcats)`. Main functions start with `fct_`.
There really is no coherent family of base functions that forcats replaces -- that's why it's such a welcome addition. -Currently this lesson will be mostly code vs prose. See the previous lesson for more discussion during the transition. +Currently this lesson will be mostly code vs. prose. See the previous lesson for more discussion during the transition. ## Load forcats and gapminder @@ -39,7 +39,7 @@ I choose to load the tidyverse, which will load forcats, among other packages we library(tidyverse) ``` -Also load gapminder. +Also load [gapminder]. ```{r message = FALSE, warning = FALSE} library(gapminder) @@ -120,9 +120,9 @@ low_pop %>% By default, factor levels are ordered alphabetically. Which might as well be random, when you think about it! It is preferable to order the levels according to some principle: * Frequency. Make the most common level the first and so on. -* Another variable. Order factor levels according to a summary statistic for another variable. __Example:__ order Gapminder countries by life expectancy. +* Another variable. Order factor levels according to a summary statistic for another variable. Example: order Gapminder countries by life expectancy. -First, let's order continent by frequency, forwards and backwards. This is often a great idea for tables and figures, esp. frequency barplots. +First, let's order continent by frequency, forwards and backwards. This is often a great idea for tables and figures, especially frequency barplots. ```{r} ## default order is alphabetical diff --git a/11_character-vectors.Rmd b/11_character-vectors.Rmd index 535931d..e797f16 100644 --- a/11_character-vectors.Rmd +++ b/11_character-vectors.Rmd @@ -40,7 +40,7 @@ A God-awful and powerful language for expressing patterns to match in text or fo * We again prefer the [stringr] package over base functions. Why? - Wraps [stringi], which is a great place to look if stringr isn't powerful enough. 
- - Standardized on [ICU regular expressions](https://userguide.icu-project.org/strings/regexp), so you can stop toggling `perl = TRUE/FALSE` at random. + - Standardized on [ICU regular expressions](http://userguide.icu-project.org/strings/regexp), so you can stop toggling `perl = TRUE/FALSE` at random. - Results come back in a form that is much friendlier for downstream work. * The [Strings chapter] of [R for Data Science] [@wickham2016] is a great resource. * Older STAT 545 lessons on regular expressions have some excellent content. This lesson draws on them, but makes more rigorous use of stringr and uses example data that is easier to support long-term. @@ -59,11 +59,11 @@ A God-awful and powerful language for expressing patterns to match in text or fo * Screeds on the Minimum Everyone Needs to Know about encoding: - ["The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"] - ["What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text"] -* Chapter \@ref(character-encoding) - I've translated this blog post [Guide to fixing encoding problems in Ruby] into R as the first step to developing a lesson. +* Chapter \@ref(character-encoding) - I've translated this blog post, ["3 Steps to Fix Encoding Problems in Ruby"], into R as the first step to developing a lesson. ### Character vectors that live in a data frame -* Certain operations are facilitated by [tidyr]. These are described below. +* Certain operations are facilitated by tidyr. These are described below. * For a general discussion of how to work on variables that live in a data frame, see [Vectors versus tibbles](#oldies) (Appendix \@ref(oldies)). ## Load the tidyverse, which includes stringr @@ -145,7 +145,7 @@ head(fruit) %>% str_sub(1, 3) ``` -The `start` and `end` arguments are vectorised. __Example:__ a sliding 3-character window. +The `start` and `end` arguments are vectorised. 
Example: a sliding 3-character window. ```{r} tibble(fruit) %>% @@ -163,7 +163,7 @@ x ### Collapse a vector -You can collapse a character vector of length `n > 1` to a single string with `str_c()`, which also has other uses (see the [next section](#catenate-vectors)). +You can collapse a character vector of length `n > 1` to a single string with `str_c()`, which also has other uses (see the [following section](#catenate-vectors)). ```{r} @@ -230,7 +230,7 @@ knitr::include_graphics("img/regexbytrialanderror-big-smaller.png") ### Load gapminder -The country names in the `gapminder` dataset are convenient for examples. Load it now and store the `r nlevels(gapminder::gapminder$country)` unique country names to the object `countries`. +The country names in the `gapminder` data frame are convenient for examples. Load it now and store the `r nlevels(gapminder::gapminder$country)` unique country names to the object `countries`. ```{r} library(gapminder) @@ -239,7 +239,7 @@ countries <- levels(gapminder$country) ### Characters with special meaning -Frequently your string tasks cannot be expressed in terms of a fixed string, but can be described in terms of a **pattern**. Regular expressions, aka "regexes", are the standard way to specify these patterns. In regexes, specific characters and constructs take on special meaning in order to match multiple strings. +Frequently your string tasks cannot be expressed in terms of a fixed string, but can be described in terms of a **pattern**. Regular expressions, a.k.a "regexes", are the standard way to specify these patterns. In regexes, specific characters and constructs take on special meaning in order to match multiple strings. The first metacharacter is the period `.`, which stands for any single character, except a newline (which by the way, is represented by `\n`). The regex `a.b` will match all countries that have an `a`, followed by any single character, followed by `b`. Yes, regexes are case sensitive, i.e. 
"Italy" does not match. @@ -364,7 +364,7 @@ Here is routine, non-regex use of backslash `\` escapes in plain vanilla R strin Examples of using escapes in regexes to match characters that would otherwise have a special interpretation. -We know several `gapminder` country names contain a period. How do we isolate them? Although it's tempting, this command `str_subset(countries, pattern = ".")` won't work! +We know several `gapminder` country names contain a period. How do we isolate them? Although it's tempting, the command `str_subset(countries, pattern = ".")` won't work! ```{r} ## cheating using a POSIX class ;) diff --git a/12_character-encoding.Rmd b/12_character-encoding.Rmd index 50861e2..04d8b2d 100644 --- a/12_character-encoding.Rmd +++ b/12_character-encoding.Rmd @@ -23,7 +23,7 @@ source("common.R") For now, this page walks through these two mini-tutorials (written for Ruby), but translated to R: -* ["Guide to fixing encoding problems in Ruby"] +* ["3 Steps to Fix Encoding Problems in Ruby"] * ["How to Get From They’re to They’re"](https://www.justinweiss.com/articles/how-to-get-from-theyre-to-theyre/) Don't expect much creativity from me here. My goal is faithful translation. @@ -130,7 +130,7 @@ irb(main):078:0> "hi\x99!".encoding => # ``` -Ruby's guess is bad. This is not encoded as UTF-8. R admits it doesn't know and `stringi`'s guess is not good. +Ruby's guess is bad. This is not encoded as UTF-8. R admits it doesn't know and stringi's guess is not good. ```{r} string <- "hi\x99!" diff --git a/13_date-times.Rmd b/13_date-times.Rmd index 0c59ab3..b3af651 100644 --- a/13_date-times.Rmd +++ b/13_date-times.Rmd @@ -25,7 +25,7 @@ I start with this because we cannot possibly do this topic justice in a short am * The [lubridate] package. + On [CRAN](https://cloud.R-project.org/package=lubridate). + On [GitHub](https://github.com/tidyverse/lubridate). - + Main vignette: [Do more with dates and times in R]). 
+ + Main vignette: [Do more with dates and times in R]. * Grolemund and Wickham's paper on lubridate in the Journal of Statistical Software: ["Dates and Times Made Easy with lubridate"] [-@grolemund2011]. * Exercises to push you to learn lubridate (*posts include links to answers!*) + [Part 1](https://www.r-exercises.com/2016/08/15/dates-and-times-simple-and-easy-with-lubridate-part-1/) diff --git a/14_multiple-tibbles.Rmd b/14_multiple-tibbles.Rmd index 784b3df..a2e7e9a 100644 --- a/14_multiple-tibbles.Rmd +++ b/14_multiple-tibbles.Rmd @@ -12,7 +12,7 @@ We've covered many topics on how to manipulate and reshape a single data frame: * Chapter \@ref(basic-data-care) - Basic care and feeding of data in R + Data frames (and tibbles) are awesome. -* Chapter \@ref(dplyr-intro) - Introduction to [dplyr] +* Chapter \@ref(dplyr-intro) - Introduction to dplyr + Filter, select, the pipe. * Chapter \@ref(dplyr-single) - dplyr functions for a single dataset + Single table verbs. @@ -30,7 +30,7 @@ But what if your data arrives in many pieces? There are many good (and bad) reas __Bind__ - This is basically smashing ~~rocks~~ tibbles together. You can smash things together row-wise ("row binding") or column-wise ("column binding"). Why do I characterize this as rock-smashing? They're often fairly crude operations, with lots of responsibility falling on the analyst for making sure that the whole enterprise even makes sense. -When row binding, you need to consider the variables in the two tibbles. Do the same variables exist in each? Are they of the same type? Different approaches for row binding have different combinations of flexibility vs rigidity around these matters. +When row binding, you need to consider the variables in the two tibbles. Do the same variables exist in each? Are they of the same type? Different approaches for row binding have different combinations of flexibility vs. rigidity around these matters. 
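
The "flexibility vs. rigidity" contrast in row binding can be sketched as follows. This is a minimal illustration, not part of the lesson: the tibbles `a` and `b` and their columns are hypothetical, and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical tibbles, for illustration only: they share `id` but differ otherwise
a <- tibble(id = 1:2, score = c(90, 85))
b <- tibble(id = 3L, grade = "A")

# Flexible: dplyr::bind_rows() matches columns by name and fills gaps with NA
flexible <- bind_rows(a, b)
flexible

# Rigid: base rbind() refuses data frames whose column names differ
rigid <- try(rbind(a, b), silent = TRUE)
inherits(rigid, "try-error")
```

Either behavior can be what you want; the point is to know which one you are getting before you trust the combined result.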
When column binding, the onus is entirely on the analyst to make sure that the rows are aligned. I would avoid column binding whenever possible. If you can introduce new variables through any other, safer means, do so! By safer, I mean: use a mechanism where the row alignment is correct **by definition**. A proper join is the gold standard. In addition to joins, functions like `dplyr::mutate()` and `tidyr::separate()` can be very useful for forcing yourself to work inside the constraint of a tibble. diff --git a/15_join-tibbles.Rmd b/15_join-tibbles.Rmd index 99954d2..65f13fb 100644 --- a/15_join-tibbles.Rmd +++ b/15_join-tibbles.Rmd @@ -6,7 +6,7 @@ source("common.R") -Join (a.k.a. merge) two tables: dplyr join cheatsheet with comic characters and publishers. +Join (a.k.a. merge) two tables: [dplyr] join cheatsheet with comic characters and publishers. ```{r gt-table-making-functions, include = FALSE} library(gt) diff --git a/17_r-objects-indexing.Rmd b/17_r-objects-indexing.Rmd index 4c65973..2ea01cb 100644 --- a/17_r-objects-indexing.Rmd +++ b/17_r-objects-indexing.Rmd @@ -167,7 +167,7 @@ names(a) Indexing a list is similar to indexing a vector but it is necessarily more complex. The fundamental issue is this: if you request a single element from the list, do you want a list of length 1 containing only that element or do you want the element itself? For the former (desired return value is a list), we use single square brackets, `[` and `]`, just like indexing a vector. For the latter (desired return value is a single element), we use a dollar sign `$`, which you've already used to get one variable from a data.frame, or double square brackets, `[[` and `]]`. -The ["pepper shaker photos" in R for Data Science](https://r4ds.had.co.nz/vectors.html#lists-of-condiments) are a splendid visual explanation of the different ways to get stuff out of a list. Highly recommended. 
+The ["pepper shaker photos"](https://r4ds.had.co.nz/vectors.html#lists-of-condiments) in [R for Data Science] [@wickham2016] are a splendid visual explanation of the different ways to get stuff out of a list. Highly recommended. > Warning: the rest of this section might make your eyes glaze over. Skip to the next section if you need to; come back later when some list is ruining your day. @@ -228,7 +228,7 @@ mode(jDat) class(jDat) ``` -> Sidebar: What is `I()`, used when creating the variable $y$ in the above data.frame? Short version: it tells R to do something _quite literally_. Here we are protecting the letters from being coerced to factor. We are ensuring we get a character vector. Note we let character-to-factor conversion happen in creating the $v$ variable above. More about (foiling) R's determination to convert character data to factor can be found [here](#factors-boss). +> Sidebar: What is `I()`, used when creating the variable `y` in the above data.frame? Short version: it tells R to do something _quite literally_. Here we are protecting the letters from being coerced to factor. We are ensuring we get a character vector. Note we let character-to-factor conversion happen in creating the `v` variable above. More about (foiling) R's determination to convert character data to factor can be found [here](#factors-boss). data.frames really are lists! Double square brackets can be used to get individual variables. Single square brackets can be used to get one or more variables, returned as a data.frame (though `subset(..., select = ...))` is how I would more likely do in a data analysis). 
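
The bracket rules above can be sketched on a small data frame. This is a minimal illustration under stated assumptions: `df` and its variables are hypothetical stand-ins, not the lesson's `jDat`.

```r
# Hypothetical data frame, for illustration only (not the lesson's jDat)
df <- data.frame(w = 1:3, x = c(2.5, 3.5, 4.5), v = c(TRUE, FALSE, TRUE))

is.list(df)                    # a data.frame is a list under the hood

one_var  <- df[["w"]]          # double brackets: the variable itself
same_var <- df$w               # dollar sign: same thing
one_col  <- df["w"]            # single brackets: a data.frame with one variable
two_cols <- df[c("w", "v")]    # single brackets: a data.frame with two variables

# subset() selects variables by unquoted name and also returns a data.frame
via_subset <- subset(df, select = c(w, v))
```

The distinction matters downstream: `one_var` can go straight into `mean()`, whereas `one_col` is still a data.frame and needs `[[` or `$` first.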
diff --git a/18_functions-part1.Rmd b/18_functions-part1.Rmd index e16c159..c01bdb1 100644 --- a/18_functions-part1.Rmd +++ b/18_functions-part1.Rmd @@ -51,7 +51,7 @@ Internalize this "answer" because our informal testing relies on you noticing de This image [widely attributed to the Spotify development team](https://blog.fastmonkeys.com/?utm_content=bufferc2d6e&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer) conveys an important point. -```{r spotify-howtobuildmvp, echo = FALSE, out.width = "60%", fig.cap = "From [Your ultimate guide to Minimum Viable Product (+great examples)](https://blog.fastmonkeys.com/2014/06/18/minimum-viable-product-your-ultimate-guide-to-mvp-great-examples/)"} +```{r spotify-howtobuildmvp, echo = FALSE, out.width = "60%", fig.cap = 'From ["Your ultimate guide to Minimum Viable Product (+great examples)"](https://blog.fastmonkeys.com/2014/06/18/minimum-viable-product-your-ultimate-guide-to-mvp-great-examples/)'} knitr::include_graphics("img/spotify-howtobuildmvp.jpg") ``` @@ -100,7 +100,7 @@ Either check these results "by hand" or apply the "does that even make sense?" t ### Test on weird stuff -Now we try to break our function. Don't get truly diabolical (yet). Just make the kind of mistakes you can imagine making at 2am when, 3 years from now, you rediscover this useful function you wrote. Give your function inputs it's not expecting. +Now we try to break our function. Don't get truly diabolical (yet). Just make the kind of mistakes you can imagine making at 2 a.m. when, 3 years from now, you rediscover this useful function you wrote. Give your function inputs it's not expecting. ```{r error = TRUE} max_minus_min(gapminder) ## hey sometimes things "just work" on data.frames! @@ -112,7 +112,7 @@ How happy are you with those error messages? 
You must imagine that some entire _ ### I will scare you now -Here are some great examples STAT545 students devised during class where the function __should break but it does not.__ +Here are some great examples STAT 545 students devised during class where the function __should break but it does not.__ ```{r} max_minus_min(gapminder[c('lifeExp', 'gdpPercap', 'pop')]) @@ -163,7 +163,7 @@ mmm2(gapminder) In addition to a gratuitous apology, the error raised also contains two more pieces of helpful info: * *Which* function threw the error. -* Hints on how to fix things: expected class of input vs actual class. +* Hints on how to fix things: expected class of input vs. actual class. If it is easy to do so, I highly recommend this template: "you gave me THIS, but I need THAT". @@ -191,7 +191,7 @@ Where to next? In [part 2](#functions-part2) we generalize this function to take ## Resources -* Packages for runtime assertions (the last 3 seem to be under more active development than assertthat): +* Packages for runtime assertions: + assertthat on [CRAN](https://cloud.R-project.org/package=assertthat) and [GitHub](https://github.com/hadley/assertthat) - *the Hadleyverse option* + ensurer on [CRAN](https://cloud.R-project.org/package=ensurer) and [GitHub](https://github.com/smbache/ensurer) - *general purpose, pipe-friendly* + assertr on [CRAN](https://cloud.R-project.org/package=assertr) and [GitHub](https://github.com/ropensci/assertr) - *explicitly data pipeline oriented* diff --git a/20_functions-part3.Rmd b/20_functions-part3.Rmd index 4d882e1..a1923d5 100644 --- a/20_functions-part3.Rmd +++ b/20_functions-part3.Rmd @@ -162,7 +162,7 @@ defaulting to NULL then checking is.null() and take it from there ## Resources -* Hadley Wickham's book, [Advanced R] [-@wickham2015a] +* Hadley Wickham's book, [Advanced R] [-@wickham2015a]: + Section on [function arguments][adv-r-fxn-args] * Unit testing with the [testthat] package + On 
[CRAN](https://cloud.R-project.org/package=testthat); development on [GitHub](https://github.com/r-lib/testthat) diff --git a/21_functions-practicum.Rmd b/21_functions-practicum.Rmd index 88ce3cb..c72fe1d 100644 --- a/21_functions-practicum.Rmd +++ b/21_functions-practicum.Rmd @@ -12,8 +12,8 @@ We recently learned how to write our own R functions ([part 1](#functions-part1) Now we use that knowledge to write another useful function, within the context of the Gapminder data: -* Input: a data.frame that contains (at least) a life expectancy variable `lifeExp` and a variable for year `year` -* Output: a vector of estimated intercept and slope, from a linear regression of `lifeExp` on `year` +* __Input:__ a data.frame that contains (at least) a life expectancy variable `lifeExp` and a variable for year `year` +* __Output:__ a vector of estimated intercept and slope, from a linear regression of `lifeExp` on `year` The ultimate goal is to apply this function to the Gapminder data for a specific country. We will eventually scale up to *all* countries using external machinery, e.g., the `dplyr::group_by()` + `dplyr::do()`. diff --git a/24_effective-graphs.Rmd b/24_effective-graphs.Rmd index 04ba9a5..d077e1e 100644 --- a/24_effective-graphs.Rmd +++ b/24_effective-graphs.Rmd @@ -14,7 +14,7 @@ source("common.R") According to [Naomi Robbins](http://www.nbr-graphs.com), effective graphs "improve understanding of data". They do not confuse or mislead. -To paraphrase: Most of us use a computer to write but we would never characterize a Nobel prize winning writer as being highly skilled with Microsoft Word. Similarly, advanced ggplot2 skills won't necessarily lead to effective communication of numerical data. You have to master the __principles of effective graphs__ in addition to the mechanics. +To paraphrase: Most of us use a computer to write but we would never characterize a Nobel prize winning writer as being highly skilled with Microsoft Word. 
Similarly, advanced [ggplot2] skills won't necessarily lead to effective communication of numerical data. You have to master the __principles of effective graphs__ in addition to the mechanics. > One graph is more effective than another if its quantitative information can be decoded more quickly or more easily by most observers. @@ -44,7 +44,7 @@ We are best able to make comparisons via position of objects along a common scal Tufte, as quoted by Robbins: "the only worse design than a pie chart is several of them." -* "Problem 2: pie charts are worse at showing trends" from [Three reasons that pie charts suck](https://www.richardhollins.com/blog/why-pie-charts-suck/) shows a series of 3 pie charts versus a line chart. +* "Problem 2: pie charts are worse at showing trends" from ["Three reasons that pie charts suck"](https://www.richardhollins.com/blog/why-pie-charts-suck/) shows a series of 3 pie charts versus a line chart. * Rob Hyndman nominated a 3 pie chart series as [the worst figure](https://robjhyndman.com/hyndsight/worst-figure/), which has the added horror of cross-hatching. Sorry, no before and after here. ### Stacked and group bar charts @@ -77,11 +77,11 @@ We will look through this section (slides 1 - 36) of Karl Broman's excellent tal This animation created by Darkhorse Analytics illustrates how communication can be greatly enhanced by eliminating clutter and de-emphasizing supporting elements. Every aspect of a figure should be there on a "need to have it" basis. -```{r echo = FALSE, fig.cap = "From [Data Looks Better Naked](https://www.darkhorseanalytics.com/blog/data-looks-better-naked) by Darkhorse Analytics"} +```{r echo = FALSE, fig.cap = 'From ["Data Looks Better Naked"](https://www.darkhorseanalytics.com/blog/data-looks-better-naked) by Darkhorse Analytics'} knitr::include_graphics("img/less-is-more-darkhorse-analytics.gif") ``` -In CMEG, Figs 6.2 vs 6.3 make much the same point, i.e. stripping the figure way down is a huge improvement. 
Figs 5.4 and 5.5 are both decent graphs but using dots (Fig 5.5) instead of bars (Fig 5.4) improves the data\:ink ratio. +In CMEG, Figs 6.2 vs. 6.3 make much the same point, i.e. stripping the figure way down is a huge improvement. Figs 5.4 and 5.5 are both decent graphs but using dots (Fig 5.5) instead of bars (Fig 5.4) improves the data\:ink ratio. ## Do: spare your reader from mental gymnastics @@ -165,7 +165,7 @@ We will look through another section (slides 48 - 62) of Karl Broman's excellent * [Creating More Effective Graphs] by Naomi Robbins [-@robbins2012]. * The [R Graph Catalog] presents the figures from [Creating More Effective Graphs] as a visual quilt. Click on a figure to see the ggplot2 code that makes it. * Karl Broman's talk, "How to display data badly" - + Home on GitHub: https://github.com/kbroman/Talk_Graphs + + Home on [GitHub](https://github.com/kbroman/Talk_Graphs) + The version I showed is the [combined PDF from the iowastate2013 branch](https://www.biostat.wisc.edu/~kbroman/presentations/IowaState2013/graphs_combined.pdf) * The [ggplot2] package, written by Hadley Wickham. * [R Graphics Cookbook] [-@chang2013] by Winston Chang and the [Graphs section](http://www.cookbook-r.com/Graphs/) of his [Cookbook for R]. diff --git a/25_colors.Rmd b/25_colors.Rmd index d954a71..519d8bb 100644 --- a/25_colors.Rmd +++ b/25_colors.Rmd @@ -38,7 +38,7 @@ When you change a graphical parameter via `par()`, the original values are retur Big picture, it is best practice to restore the original, default state of hidden things that affect an R session. This is polite if you plan to inflict your code on others. Even if you live on an R desert island, this practice will prevent you from creating maddening little puzzles for yourself to solve in the middle of the night before a deadline. -Because of the way figures are handled by knitr, it is more complicated to change the default plotting symbol throughout an R Markdown document. 
To see how I've done it, check out a hidden chunk around here in the [source of this page](https://github.com/rstudio-education/stat545/blob/master/25_colors.Rmd). +Because of the way figures are handled by knitr, it is more complicated to change the default plotting symbol throughout an R Markdown document. To see how I've done it, check out a hidden chunk around here in the [source of this page]. ```{r include = FALSE} @@ -179,7 +179,7 @@ In 2015 Stéfan van der Walt and Nathaniel Smith designed new color maps for mat > These color maps are designed in such a way that they will analytically be perfectly perceptually-uniform, both in regular form and also when converted to black-and-white. They are also designed to be perceived by readers with the most common form of color blindness (all color maps in this package) and color vision deficiency ('cividis' only). -I encourage you to install viridis and read [the vignette](https://cloud.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html). It is easy to use these palettes in ggplot2 via `scale_color_viridis()` and `scale_fill_viridis()`. Taking control of color palettes in ggplot2 is covered elsewhere (see Chapter \@ref(qualitative-colors). +I encourage you to install viridis and read [the vignette](https://cloud.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html). It is easy to use these palettes in ggplot2 via `scale_color_viridis()` and `scale_fill_viridis()`. Taking control of color palettes in ggplot2 is covered elsewhere (see Chapter \@ref(qualitative-colors)). Here are two examples that show the viridis palettes: @@ -222,7 +222,8 @@ foo %>% tab_options(column_labels.font.weight = "bold") ``` -__Example:__ the first color in the palette is specified as "#1B9E77", so the intensity in the green channel is 9E. What does that mean? +Example: the first color in the palette is specified as "#1B9E77", so the intensity in the green channel is 9E. What does that mean? 
+ $$ 9E = 9 * 16^1 + 14 * 16^0 = 9 * 16 + 14 = 158 $$ @@ -473,7 +474,7 @@ par(opar) [colorspace]: https://cloud.r-project.org/web/packages/colorspace/index.html [dichromat]: https://cloud.R-project.org/package=dichromat - +[source of this page]: https://github.com/rstudio-education/stat545/blob/master/25_colors.Rmd ```{r links, child="links.md"} ``` \ No newline at end of file diff --git a/26_qualitative-colors.Rmd b/26_qualitative-colors.Rmd index fe08deb..58395d2 100644 --- a/26_qualitative-colors.Rmd +++ b/26_qualitative-colors.Rmd @@ -8,7 +8,7 @@ source("common.R") ## Load packages and prepare the Gapminder data -Load the ggplot2 and dplyr packages and bring in the usual Gapminder data but drop Oceania, which only has two countries. +Load the [ggplot2] and [dplyr] packages and bring in the usual Gapminder data but drop Oceania, which only has two countries. We also sort the country factor based on population and then sort the data as well. Why? In the bubble plots below, we don't want large countries to hide small countries. This is a case where, sadly, the row order of the data truly affects the visual output. @@ -84,7 +84,7 @@ head(country_colors) `country_colors` is a named character vector, with one element per country, holding the RGB hex strings encoding the color scheme. -__Note:__ The order of `country_colors` is not alphabetical. The countries are actually sorted by size (in which particular year, I don't recall) within continent, reflecting the logic by which the scheme was created. No problem. Ideally, nothing in your analysis should depend on row order, although that's not always possible in reality. +Note: The order of `country_colors` is not alphabetical. The countries are actually sorted by size (in which particular year, I don't recall) within continent, reflecting the logic by which the scheme was created. No problem. Ideally, nothing in your analysis should depend on row order, although that's not always possible in reality. 
## Prepare the color scheme for use with ggplot2 diff --git a/27_secrets-happy-graphics.Rmd b/27_secrets-happy-graphics.Rmd index c45c59a..180588b 100644 --- a/27_secrets-happy-graphics.Rmd +++ b/27_secrets-happy-graphics.Rmd @@ -108,13 +108,13 @@ gapminder %$% This is an entire topic covered elsewhere: -Chapter \@ref(tidy-data) - [Tidy data using Lord of the Rings] +Chapter \@ref(tidy-data) - Tidy data using Lord of the Rings ## Factor management This is an entire topic covered elsewhere: -Chapter \@ref(factors-boss) - [Be the boss of your factors](#factors-boss) +Chapter \@ref(factors-boss) - Be the boss of your factors ## Worked example diff --git a/28_saving-figures.Rmd b/28_saving-figures.Rmd index c6be076..f31017e 100644 --- a/28_saving-figures.Rmd +++ b/28_saving-figures.Rmd @@ -50,12 +50,12 @@ FWIW most of my figures exist as `pdf()`, `png()`, or both. Although it is not t Here are two good posts from the [Revolutions Analytics blog](https://blog.revolutionanalytics.com) with tips for saving figures to file: -* [10 tips for making your R graphics look their best](https://blog.revolutionanalytics.com/2009/01/10-tips-for-making-your-r-graphics-look-their-best.html) -* [High-quality R graphics on the Web with SVG](https://blog.revolutionanalytics.com/2011/07/r-svg-graphics.html) +* ["10 tips for making your R graphics look their best"](https://blog.revolutionanalytics.com/2009/01/10-tips-for-making-your-r-graphics-look-their-best.html) +* ["High-quality R graphics on the Web with SVG"](https://blog.revolutionanalytics.com/2011/07/r-svg-graphics.html) ## Write figures to file with `ggsave()` -If you are using ggplot2, write figures to file with [`ggsave()`][rdocs-ggsave]. +If you are using [ggplot2], write figures to file with [`ggsave()`][rdocs-ggsave]. If you are staring at a plot you just made on your screen, you can call `ggsave()`, specifying only a filename: @@ -67,7 +67,7 @@ It makes a sensible decision about everything else. 
In particular, as long as yo ### Passing a plot object to `ggsave()` -After the filename, the most common argument you will provide is `plot =`, which is the second argument by position. If you've been building up a plot with the typical ggplot2 workflow, you will pass the resulting object to `ggsave()`. __Example:__ +After the filename, the most common argument you will provide is `plot =`, which is the second argument by position. If you've been building up a plot with the typical ggplot2 workflow, you will pass the resulting object to `ggsave()`. Example: ```{r eval = FALSE} p <- ggplot(gapminder, aes(x = year, y = lifeExp)) + geom_jitter() @@ -227,9 +227,9 @@ It is worth noting here that the `ggsave()` workflow is not vulnerable to this g Some relevant threads on stackoverflow: -* [Using png not working when called within a function](https://stackoverflow.com/questions/9206110/using-png-function-not-working-when-called-within-a-function) -* [ggplot's qplot does not execute on sourcing](https://stackoverflow.com/questions/6675066/ggplots-qplot-does-not-execute-on-sourcing) -* [Save ggplot within a function](https://stackoverflow.com/questions/7034647/save-ggplot-within-a-function) +* ["Using png not working when called within a function"](https://stackoverflow.com/questions/9206110/using-png-function-not-working-when-called-within-a-function) +* ["ggplot's qplot does not execute on sourcing"](https://stackoverflow.com/questions/6675066/ggplots-qplot-does-not-execute-on-sourcing) +* ["Save ggplot within a function"](https://stackoverflow.com/questions/7034647/save-ggplot-within-a-function) ### Mysterious empty `Rplots.pdf` file @@ -241,7 +241,7 @@ I don't know of a reliable way to suppress this behavior uniformly and I just pe Some relevant threads on stackoverflow: -* [How to stop R from creating empty Rplots.pdf file when using ggsave and 
Rscript](https://stackoverflow.com/questions/17348359/how-to-stop-r-from-creating-empty-rplots-pdf-file-when-using-ggsave-and-rscript) +* ["How to stop R from creating empty Rplots.pdf file when using ggsave and Rscript"](https://stackoverflow.com/questions/17348359/how-to-stop-r-from-creating-empty-rplots-pdf-file-when-using-ggsave-and-rscript) ## Chunk name determines figure file name @@ -253,7 +253,7 @@ p <- ggplot(gapminder, aes(x = year, y = lifeExp)) + geom_jitter() p ``` -__Example:__ here's an R chunk called `scatterplot-lifeExp-vs-year`: +Example: here's an R chunk called `scatterplot-lifeExp-vs-year`: ```` `r ''````{r scatterplot-lifeExp-vs-year} diff --git a/29_multiple-plots.Rmd b/29_multiple-plots.Rmd index 86b4a4b..20a813d 100644 --- a/29_multiple-plots.Rmd +++ b/29_multiple-plots.Rmd @@ -14,7 +14,7 @@ Faceting is useful for constructing an array of similar plots where each panel c ## Meet the gridExtra package -Under the hood, ggplot2 uses the grid package to create figures. The [gridExtra] package provides some extra goodies and we will draw on them to place multiple ggplot2 plots on a single virtual page. +Under the hood, [ggplot2] uses the grid package to create figures. The [gridExtra] package provides some extra goodies and we will draw on them to place multiple ggplot2 plots on a single virtual page. You may need to install gridExtra and you will certainly need to load it. diff --git a/30_package-overview.Rmd b/30_package-overview.Rmd index 3a85529..7b2088a 100644 --- a/30_package-overview.Rmd +++ b/30_package-overview.Rmd @@ -21,7 +21,7 @@ source("common.R") ## Resources {-} * [R Packages] book: the second edition is under development by Hadley Wickham and Jennifer Bryan [-@wickham-unpub]. -* ["Writing R Extensions"], the One True Official Document on creating R packages. +* ["Writing R Extensions"] - the One True Official Document on creating R packages. 
```{r links, child="links.md"} ``` \ No newline at end of file diff --git a/32_system-prep-packages.Rmd b/32_system-prep-packages.Rmd index 088b9b6..6ae5935 100644 --- a/32_system-prep-packages.Rmd +++ b/32_system-prep-packages.Rmd @@ -39,7 +39,7 @@ Go here and do what it says: During the installation of Rtools you will get to a window asking you to "Select Additional Tasks". **It is important that you make sure to select the box for "Edit the system PATH"**. -*Are we going to recommend making sure Git Bash is NOT on `PATH`? See [#230](https://github.com/STAT545-UBC/Discussion/issues/230#issuecomment-155236031).* +*Are we going to recommend making sure Git Bash is NOT on `PATH`? See [issue #230](https://github.com/STAT545-UBC/Discussion/issues/230#issuecomment-155236031).* ```{r echo = FALSE, fig.cap = "Rtools installation", out.width = "65%"} knitr::include_graphics("img/rtools-install.png") diff --git a/33_create-package.Rmd b/33_create-package.Rmd index 8bf80ed..0b8533a 100644 --- a/33_create-package.Rmd +++ b/33_create-package.Rmd @@ -7,3 +7,6 @@ source("common.R") *The content that originally lived here now appears as the [The Whole Game](https://r-pkgs.org/whole-game.html) chapter in the under-development 2nd edition of the [R Packages] book [@wickham-unpub].* + +```{r links, child="links.md"} +``` \ No newline at end of file diff --git a/34_workflows.Rmd b/34_workflows.Rmd index ba4c9b8..4d9aace 100644 --- a/34_workflows.Rmd +++ b/34_workflows.Rmd @@ -21,7 +21,7 @@ Although we spend a lot of time working with data interactively, this sort of ha + This fully developed example shows you: * How to run an R script non-interactively * How to use `make`... - - To record which files are inputs vs intermediates vs outputs + - To record which files are inputs vs. intermediates vs. outputs - To capture how scripts and commands convert inputs to outputs - To re-run parts of an analysis that are out-of-date * The intersection of R and `make`, i.e. how to... 
@@ -40,9 +40,9 @@ Although we spend a lot of time working with data interactively, this sort of ha * [xkcd comic on automation](https://xkcd.com/1319/). 'Automating' comes from the roots 'auto-' meaning 'self-', and 'mating', meaning 'screwing'. * Karl Broman covers [GNU Make](https://www.gnu.org/software/make/) in his course ["Tools for Reproducible Research"](https://kbroman.org/Tools4RR/pages/schedule.html). -* Karl Broman also wrote ["minimal make: a minimal tutorial on make"](https://kbroman.org/minimal_make/), aimed at stats / data science types. +* Karl Broman also wrote ["minimal make: a minimal tutorial on make"], aimed at stats / data science types. * ["Using Make for reproducible scientific analyses"](https://web.archive.org/web/20160306042959/http://www.bendmorris.com/2013/09/using-make-for-reproducible-scientific.html), blog post by Ben Morris. -* Software Carpentry's [Slides on `Make`](https://web.archive.org/web/20150110211213/http://software-carpentry.org/v4/make/index.html). +* Software Carpentry's [slides on `Make`](https://web.archive.org/web/20150110211213/http://software-carpentry.org/v4/make/index.html). * Zachary M. Jones wrote ["GNU Make for Reproducible Data Analysis"](http://zmjones.com/make/). * ["Keeping tabs on your data analysis workflow"](https://adamlaiacano.tumblr.com/post/45356689519/keeping-tabs-on-your-data-analysis-workflow), blog post by Adam Laiacano. * Mike Bostock, of D3.js and New York Times fame, explains ["Why Use Make"](https://bost.ocks.org/mike/make/) -- "it's about the benefits of capturing workflows via a file-based dependency-tracking build system". @@ -77,11 +77,11 @@ Download and [install msysGit](https://github.com/msysgit/msysgit/releases/downl Here is another alternative for installing `make` alone: -* Go to the [Make for Windows](http://gnuwin32.sourceforge.net/packages/make.htm) web site. -* Download the [Setup program](http://gnuwin32.sourceforge.net/downlinks/make.php). 
-* Install the file you just downloaded and copy to your clipboard the directory in which it is being installed. - - FYI: The default directory is `C:\Program Files (x86)\GnuWin32\` -* You now have `make` installed, but you need to tell Windows where to find the program. This is called [updating your `PATH`](https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=windows%20update%20path%20variable). You will want to update the `PATH` to include the `bin` directory of the newly installed program. +1. Go to the [Make for Windows](http://gnuwin32.sourceforge.net/packages/make.htm) web site. +1. Download the [Setup program](http://gnuwin32.sourceforge.net/downlinks/make.php). +1. Install the file you just downloaded and copy to your clipboard the directory in which it is being installed. + - FYI: The default directory is `C:\Program Files (x86)\GnuWin32\` +1. You now have `make` installed, but you need to tell Windows where to find the program. This is called [updating your `PATH`](https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=windows%20update%20path%20variable). You will want to update the `PATH` to include the `bin` directory of the newly installed program. ## Update your `PATH` @@ -89,29 +89,29 @@ If you installed Make for Windows (as opposed to the `make` that comes with Git These are the steps on Windows 7 (we don't have such a write-up yet for Windows 8 -- feel free to send one!): -* Click on the Windows logo. -* Right click on *Computer*. -* Select *Properties*. -* Select *Advanced System Settings*. -* Select *Environment variables*. -* Select the line that has the `PATH` variable. You may have to scroll down to find it. -* Select *Edit*. -* Go to the end of the line and add a semicolon `;`, followed by the path where the program was installed, followed by `\bin`. - - Typical example of what one might add: `;C:\Program Files (x86)\GnuWin32\bin` -* Click Okay and close all the windows that you opened. 
-* Quit RStudio and open it again. -* You should now be able to use `make` from RStudio and the command line. +1. Click on the Windows logo. +1. Right click on *Computer*. +1. Select *Properties*. +1. Select *Advanced System Settings*. +1. Select *Environment variables*. +1. Select the line that has the `PATH` variable. You may have to scroll down to find it. +1. Select *Edit*. +1. Go to the end of the line and add a semicolon `;`, followed by the path where the program was installed, followed by `\bin`. + - Typical example of what one might add: `;C:\Program Files (x86)\GnuWin32\bin` +1. Click Okay and close all the windows that you opened. +1. Quit RStudio and open it again. +1. You should now be able to use `make` from RStudio and the command line. ## Issues we are still clarifying -See [issue 58](https://github.com/STAT545-UBC/Discussion/issues/58) for what seems to be the most comprehensive statement of the Windows situation. +See [issue #58](https://github.com/STAT545-UBC/Discussion/issues/58) for what seems to be the most comprehensive statement of the Windows situation. What are the tricky bits? * Getting the same `Makefile` to "work" via RStudio's Build buttons/menus and in the [shell]. And, for that matter, which [shell]? Git Bash or ??? * Ensuring `make`, `Rscript`, `pandoc`, `rm`, etc. can be found = updating `PATH`. * Getting `make` to use the correct [shell]. - - See [issue 54](https://github.com/STAT545-UBC/Discussion/issues/54) on the Discussion repo. + - See [issue #54](https://github.com/STAT545-UBC/Discussion/issues/54) on the Discussion repo. # Automation: test drive `make` {#make-test-drive} @@ -125,9 +125,9 @@ Before we use `make` for real work, we want to prove beyond a shadow of a doubt You can delete this project after this test drive, so don't sweat too much about what you name it or where you put it. 
-* Create an RStudio project: *File > New Project* -* Create a new text file: *File > New File > Text File* -* We are about to write our first `Makefile`! +1. Create an RStudio project: *File > New Project* +1. Create a new text file: *File > New File > Text File* +1. We are about to write our first `Makefile`! But first ... @@ -177,7 +177,8 @@ If you see something like this: ```sh Makefile:2: *** missing separator. Stop. ``` -you probably have spaces instead of tabs as indentation. Fix that and try again. + +You probably have spaces instead of tabs as indentation. Fix that and try again. RStudio offers these buttons or menu items to run things from your `Makefile`: @@ -206,8 +207,8 @@ This proves that `make` is installed and working from RStudio. RStudio only provides access to a very limited bit of `make` -- it's even more limited than the RStudio Git client. In the long run, it's important to be able to run `make` from the [shell]. -* Select *Tools > Shell* -* Run +1. Select *Tools > Shell* +1. Run ```sh make clean @@ -289,12 +290,12 @@ words.txt: __Suggested workflow:__ -* *Git folks:* commit anything new/modified. Start with a clean working tree. -* Submit the above `download.file()` command in the R Console to make sure it works. -* Inspect the downloaded words file any way you know how; make sure it's not garbage. Size should be about 2.4MB. -* Delete `words.txt`. -* Put the above rule into your `Makefile`. From the [shell], enter `make words.txt` to verify rule works. Reinspect the words file. -* *Git folks:* commit `Makefile` and `words.txt`. +1. *Git folks:* commit anything new/modified. Start with a clean working tree. +1. Submit the above `download.file()` command in the R Console to make sure it works. +1. Inspect the downloaded words file any way you know how; make sure it's not garbage. Size should be about 2.4MB. +1. Delete `words.txt`. +1. Put the above rule into your `Makefile`. From the [shell], enter `make words.txt` to verify rule works. 
Reinspect the words file. +1. *Git folks:* commit `Makefile` and `words.txt`. See the sample project at this point in [this commit](https://github.com/STAT545-UBC/make-activity/tree/c30ecc9c890a2f2261eb94118997f0774012eeb8). @@ -316,13 +317,13 @@ words.txt: /usr/share/dict/words __Suggested workflow:__ -* *Git folks:* commit anything new/modified. Start with a clean working tree. -* Remove `words.txt` if you succeeded with the download approach. -* Submit the above `cp` command in the [shell] to make sure it works. -* Inspect the copied words file any way you know how; make sure it's not garbage. Size should be about 2.4MB. -* Delete `words.txt`. -* Put the above rule into your `Makefile`. From the [shell], enter `make words.txt` to verify rule works. Reinspect the words file. -* *Git folks:* look at the diff. You should see how your `words.txt` rule has changed and you might also see some differences between the local and remote words files. Interesting! Commit `Makefile` and `words.txt`. +1. *Git folks:* commit anything new/modified. Start with a clean working tree. +1. Remove `words.txt` if you succeeded with the download approach. +1. Submit the above `cp` command in the [shell] to make sure it works. +1. Inspect the copied words file any way you know how; make sure it's not garbage. Size should be about 2.4MB. +1. Delete `words.txt`. +1. Put the above rule into your `Makefile`. From the [shell], enter `make words.txt` to verify rule works. Reinspect the words file. +1. *Git folks:* look at the diff. You should see how your `words.txt` rule has changed and you might also see some differences between the local and remote words files. Interesting! Commit `Makefile` and `words.txt`. See the sample project at this point in [this commit](https://github.com/STAT545-UBC/make-activity/tree/1131791548e0c5bbc5104eebb19710ed435146e3). 
@@ -346,13 +347,13 @@ Since our only output so far is `words.txt`, that's what we associate with the ` __Suggested workflow:__ -* Use `make clean` from the shell and/or *RStudio > Build > More > Clean All* to delete `words.txt`. - - Does it go away? - - *Git folks:* does the deletion of this file show up in your Git tab? -* Use `make all` from the shell and/or *RStudio > Build > Build All* to get `words.txt` back. - - Does it come back? - - *Git folks:* does the restoration of `words.txt` cause it to drop off your radar as a changed/deleted file? See how this stuff all works together? -* *Git folks:* Commit. +1. Use `make clean` from the shell and/or *RStudio > Build > More > Clean All* to delete `words.txt`. + - Does it go away? + - *Git folks:* does the deletion of this file show up in your Git tab? +1. Use `make all` from the shell and/or *RStudio > Build > Build All* to get `words.txt` back. + - Does it come back? + - *Git folks:* does the restoration of `words.txt` cause it to drop off your radar as a changed/deleted file? See how this stuff all works together? +1. *Git folks:* Commit. See the sample project at this point in [this commit](https://github.com/STAT545-UBC/make-activity/tree/9e1a556adc602ffce91b5c8edccd223237080c54). diff --git a/39_appendix.Rmd b/39_appendix.Rmd index 07c22c8..0a297fb 100644 --- a/39_appendix.Rmd +++ b/39_appendix.Rmd @@ -514,6 +514,47 @@ Instructor dependencies: ## Link Reference Formatting +We have *a lot* of links in this bookdown. In an effort to keep things tidy, we have developed a method of organizing them. If you would like to contribute to this bookdown we ask that you please format any new links the same way. + + +### Added to `links.md` + +Any links that are used in 2+ chapters (a.k.a. `.Rmd` files) or any links that we *think* might be used in other chapters in the future are saved in `links.md` in the format `[ref-label]: link`.
We try to make `ref-label` some text that we can live with as link text whenever possible. + +For example, + +> [Happy Git and GitHub for the useR]: https://happygitwithr.com + + +CHECK THIS +The reference label is not case-sensitive, + +### Added as a reference-style link + +BOOKDOWN DOCS LINK? + +If a link appears in only one chapter (a.k.a. `.Rmd`) and is used more than once, format it as a reference-style link. Again, try to use a reference label that we can live with as link text. + +For example, + +> It's not really fair to complain about the lack of visible alignment. Remember we are ["writing data for computers"]. +> ... +> Huh? Don't worry about it. Remember we are ["writing data for computers"]. +> ... +> +> ["writing data for computers"]: https://twitter.com/vsbuffalo/statuses/358699162679787521 + + + + +Generally, the first time a package is mentioned in a chapter, include its main link, e.g. [gapminder]. +### Added as an inline-style link + + + + + + If you anticipate the link being used again in another chapter, go ahead and put it in `links.md`. If it is a main package that you think will be used in multiple chapters, then include links in `links.md`.
diff --git a/links.md b/links.md index 5ceff34..381c057 100644 --- a/links.md +++ b/links.md @@ -4,7 +4,7 @@ [ggplot2 tutorial]: https://github.com/jennybc/ggplot2-tutorial [R Graph Catalog]: https://github.com/jennybc/r-graph-catalog - + [dplyr]: https://dplyr.tidyverse.org [tidyr]: https://tidyr.tidyverse.org [ggplot2]: https://ggplot2.tidyverse.org @@ -27,20 +27,15 @@ [rvest]: https://rvest.tidyverse.org [Shiny]: https://shiny.rstudio.com [gh]: https://github.com/r-lib/gh - [plyr]: http://plyr.had.co.nz [magrittr]: https://magrittr.tidyverse.org [googlesheets]: https://github.com/jennybc/googlesheets [gapminder]: https://github.com/jennybc/gapminder - - [stringi]: http://www.gagolewski.com/software/stringi/ - [rex]: https://github.com/kevinushey/rex [lattice]: http://lattice.r-forge.r-project.org [RColorBrewer]: https://cloud.r-project.org/package=RColorBrewer [gridExtra]: https://cloud.r-project.org/package=gridExtra - [rebird]: https://docs.ropensci.org/rebird/ [geonames]: https://docs.ropensci.org/geonames/ [rplos]: https://docs.ropensci.org/rplos/ @@ -48,27 +43,21 @@ [genderdata]: https://docs.ropensci.org/genderdata/ [curl]: https://jeroen.cran.dev/curl [jsonlite]: https://github.com/jeroen/jsonlite - [shinythemes]: https://rstudio.github.io/shinythemes/ [shinyjs]: https://deanattali.com/shinyjs/ [leaflet]: https://rstudio.github.io/leaflet/ [ggvis-web]: https://ggvis.rstudio.com [shinydashboard]: https://rstudio.github.io/shinydashboard/ - - -[dplyr-cran]: https://cloud.r-project.org/package=dplyr -[dplyr-github]: https://github.com/hadley/dplyr - - - + [Introduction to dplyr]: https://dplyr.tidyverse.org/articles/dplyr.html [Window functions]: https://dplyr.tidyverse.org/articles/window-functions.html [Two-table verbs]: https://dplyr.tidyverse.org/articles/two-table.html [Do more with dates and times in R]: https://lubridate.tidyverse.org/articles/lubridate.html +[dplyr-cran]: https://cloud.r-project.org/package=dplyr +[dplyr-github]: 
https://github.com/hadley/dplyr - - + [Happy Git and GitHub for the useR]: https://happygitwithr.com [R for Data Science]: https://r4ds.had.co.nz [The tidyverse style guide]: https://style.tidyverse.org @@ -79,7 +68,7 @@ [Cookbook for R]: http://www.cookbook-r.com [ggplot2: Elegant Graphics for Data Analysis]: https://ggplot2-book.org/index.html - + [adv-r-fxn-args]: http://adv-r.had.co.nz/Functions.html#function-arguments [r4ds-transform]: https://r4ds.had.co.nz/transform.html [r4ds-readr-strings]: https://r4ds.had.co.nz/data-import.html#readr-strings @@ -89,7 +78,6 @@ [wiki-snake-case]: https://en.wikipedia.org/wiki/Snake_case [Janus]: https://en.wikipedia.org/wiki/Janus - [RStudio Data Transformation Cheat Sheet]: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf [Regular Expressions in R Cheat Sheet]: https://github.com/rstudio/cheatsheets/raw/master/regex.pdf @@ -105,18 +93,13 @@ ["A layered grammar of graphics"]: https://vita.had.co.nz/papers/layered-grammar.html [Managing Projects with GNU Make, 3rd Edition]: http://shop.oreilly.com/product/9780596006105.do - +["minimal make: a minimal tutorial on make"]: https://kbroman.org/minimal_make/ ["Let the Data Flow: Pipelines in R with dplyr and magrittr"]: https://github.com/tjmahr/MadR_Pipelines ["Hands-on dplyr tutorial for faster data manipulation in R"]: https://www.dataschool.io/dplyr-tutorial-for-faster-data-manipulation-in-r/ ["Writing R Extensions"]: https://cloud.r-project.org/doc/manuals/r-release/R-exts.html ["The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"]: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ ["What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text"]: http://kunststube.net/encoding/ -["Guide to fixing encoding 
problems in Ruby"]: https://www.justinweiss.com/articles/3-steps-to-fix-encoding-problems-in-ruby/ +["3 Steps to Fix Encoding Problems in Ruby"]: https://www.justinweiss.com/articles/3-steps-to-fix-encoding-problems-in-ruby/ ["My favorite RGB color"]: https://manyworldstheory.com/2013/01/15/my-favorite-rgb-color/ ["Why Should Engineers and Scientists Be Worried About Color?"]: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=2ahUKEwi0xYqJ8JbjAhWNvp4KHViYDxsQFjABegQIABAC&url=https%3A%2F%2Fwww.researchgate.net%2Fprofile%2FAhmed_Elhattab2%2Fpost%2FPlease_suggest_some_good_3D_plot_tool_Software_for_surface_plot%2Fattachment%2F5c05ba35cfe4a7645506948e%2FAS%253A699894335557644%25401543879221725%2Fdownload%2FWhy%2BShould%2BEngineers%2Band%2BScientists%2BBe%2BWorried%2BAbout%2BColor_.pdf&usg=AOvVaw1qwjjGMd7h_z6TLUjzu7Nb - - - - - From 804ce7eb665baeaaa2a7c04a027fa0e9f19d2d41 Mon Sep 17 00:00:00 2001 From: Grace Lawley Date: Thu, 21 Nov 2019 01:08:13 -0800 Subject: [PATCH 3/3] add contributing section & small link edits --- 34_workflows.Rmd | 78 ++++++++++---------- 36_api-wrappers.Rmd | 11 +-- 37_diy-web-data.Rmd | 40 +++++----- 38_shiny.Rmd | 4 +- 39_appendix.Rmd | 173 ++++++++++++++++++++++++++++++++------------ links.md | 31 ++++---- 6 files changed, 208 insertions(+), 129 deletions(-) diff --git a/34_workflows.Rmd b/34_workflows.Rmd index 4d9aace..05f5086 100644 --- a/34_workflows.Rmd +++ b/34_workflows.Rmd @@ -228,11 +228,11 @@ If you are not, are you getting the error message that's characteristic of a "sp The goal of this activity is to create a pipeline that will... -* Obtain a large file of English words. -* Calculate a histogram of word lengths. -* Determine the most common word length. -* Generate a figure of this histogram. -* Render a R Markdown report in HTML and PDF. +1. Obtain a large file of English words. +1. Calculate a histogram of word lengths. +1. Determine the most common word length. +1. 
Generate a figure of this histogram. +1. Render a R Markdown report in HTML and PDF. You will automate this pipeline using `make`! @@ -372,11 +372,11 @@ Create the R script `histogram.r` that reads the list of words from `words.txt` __Suggested workflow:__ -* Develop your `histogram.r` script interactively. Make sure it works when you step through it line-by-line. Debugging only gets harder once you're running entire scripts at arm's length via `make`! -* Remove `histogram.tsv`. Clean out the workspace and restart R. Run `histogram.r` via `source()` or using RStudio's Source button. Make sure it works! -* Add the `histogram.tsv` rule to your `Makefile`. -* Remove `histogram.tsv` and regenerate it via `make histogram.tsv` from the [shell]. -* *Git folks:* Commit. +1. Develop your `histogram.r` script interactively. Make sure it works when you step through it line-by-line. Debugging only gets harder once you're running entire scripts at arm's length via `make`! +1. Remove `histogram.tsv`. Clean out the workspace and restart R. Run `histogram.r` via `source()` or using RStudio's Source button. Make sure it works! +1. Add the `histogram.tsv` rule to your `Makefile`. +1. Remove `histogram.tsv` and regenerate it via `make histogram.tsv` from the [shell]. +1. *Git folks:* Commit. See the sample project at this point in [this commit](https://github.com/STAT545-UBC/make-activity/tree/889e01a3d610e900c7e58ebd32a0506c61543fd9). @@ -393,13 +393,13 @@ clean: __Suggested workflow:__ -* Use `make clean` from the shell and/or *RStudio > Build > More > Clean All*. - - Do `words.txt` and `histogram.tsv` go away? - - *Git folks:* does the deletion of these files show up in your Git tab? -* Use `make all` from the shell and/or *RStudio > Build > Build All* to get `words.txt` back. - - Does it come back? - - *Git folks:* does the restoration of the files cause them to drop off your radar as changed/deleted files? -* *Git folks:* Commit. +1. 
Use `make clean` from the shell and/or *RStudio > Build > More > Clean All*. + - Do `words.txt` and `histogram.tsv` go away? + - *Git folks:* does the deletion of these files show up in your Git tab? +1. Use `make all` from the shell and/or *RStudio > Build > Build All* to get `words.txt` back. + - Does it come back? + - *Git folks:* does the restoration of the files cause them to drop off your radar as changed/deleted files? +1. *Git folks:* Commit. See the sample project at this point in [this commit](https://github.com/STAT545-UBC/make-activity/tree/4f392d0e20bb7e4bfcdc00a812190e40e27ae3d4). @@ -414,15 +414,15 @@ histogram.png: histogram.tsv __Suggested workflow:__ -* Test the histogram-drawing code in the R Console to make sure it works. -* Inspect the resulting PNG to make sure it's good. -* Clean up after yourself. -* Add the above rule to your `Makefile`. -* Test that new rule works. -* If you get an unexpected empty plot `Rplots.pdf`, don't worry about it yet. -* Update the `all` and `clean` targets in light of this addition to the pipeline. -* Test the new definitions of `all` and `clean`. -* *Git folks:* commit. +1. Test the histogram-drawing code in the R Console to make sure it works. +1. Inspect the resulting PNG to make sure it's good. +1. Clean up after yourself. +1. Add the above rule to your `Makefile`. +1. Test that new rule works. +1. If you get an unexpected empty plot `Rplots.pdf`, don't worry about it yet. +1. Update the `all` and `clean` targets in light of this addition to the pipeline. +1. Test the new definitions of `all` and `clean`. +1. *Git folks:* commit. *NOTE: Why are we writing this PNG to file when, by the end of the activity, we are writing an R Markdown report? We could include this figure-making code in an R chunk there. We're doing it this way to demonstrate more about R and `make` workflows. 
Plus sometimes we do work this way in real life, if a figure has a life outside one specific R Markdown report.* @@ -444,11 +444,11 @@ histogram.png: histogram.tsv __Suggested workflow:__ -* Remove `Rplots.pdf` manually -* Add the `rm Rplots.pdf` command to the `histogram.png` rule. -* Test that new rule works. -* Test that behavior of `all` and `clean` still good. -* *Git folks:* commit. +1. Remove `Rplots.pdf` manually +1. Add the `rm Rplots.pdf` command to the `histogram.png` rule. +1. Test that new rule works. +1. Test that behavior of `all` and `clean` still good. +1. *Git folks:* commit. See the sample project at this point in [this commit](https://github.com/STAT545-UBC/make-activity/tree/3b75dac0d0cd8dd7e7cd3c2e66799a65d90b9fff). @@ -472,14 +472,14 @@ Create the R Markdown file `report.rmd` that reads the table of word lengths `hi __Suggested workflow:__ -* Develop `report.rmd`, running the R chunks often, from a clean workspace and fresh R session. Debugging only gets harder once you're rendering entire reports at arm's length via `make`! -* Render the report using `rmarkdown::render()` in the Console or RStudio's Preview HTML button. -* Clean up after yourself. -* Add the above rule for `report.html` to your `Makefile`. -* Test that new rule works. -* Update the `all` and `clean` targets in light of this addition to the pipeline. -* Test the new definitions of `all` and `clean`. -* *Git folks:* commit. +1. Develop `report.rmd`, running the R chunks often, from a clean workspace and fresh R session. Debugging only gets harder once you're rendering entire reports at arm's length via `make`! +1. Render the report using `rmarkdown::render()` in the Console or RStudio's Preview HTML button. +1. Clean up after yourself. +1. Add the above rule for `report.html` to your `Makefile`. +1. Test that new rule works. +1. Update the `all` and `clean` targets in light of this addition to the pipeline. +1. Test the new definitions of `all` and `clean`. +1. 
*Git folks:* commit. See the sample project at this point in [this commit](https://github.com/STAT545-UBC/make-activity/tree/91ebcfc7d25743ebd8d6c9684ed7923ad4758585). diff --git a/36_api-wrappers.Rmd b/36_api-wrappers.Rmd index 268ff28..2b2e562 100644 --- a/36_api-wrappers.Rmd +++ b/36_api-wrappers.Rmd @@ -23,7 +23,7 @@ There are many ways to obtain data from the internet; let's consider four catego In the simplest case, the data you need is already on the internet in a tabular format. There are a couple of strategies here: * Use `read.csv()` or `readr::read_csv()` to read the data straight into R. -* Use the command line program curl to do that work, and place it in a `Makefile` or shell script (see the [section on make](#automation-overview) for more on this). +* Use the command line program curl to do that work, and place it in a `Makefile` or shell script (see the [chapters on make](#automation-overview) for more on this). The second case is most useful when the data you want has been provided in a format that needs cleanup. For example, the World Value Survey makes several datasets available as Excel sheets. The safest option here is to download the `.xls` file, then read it into R with `readxl::read_excel()` or something similar. An exception to this is data provided as Google Spreadsheets, which can be read straight into R using the [googlesheets] package. @@ -59,6 +59,8 @@ Why would we want this? ### Load the tidyverse +We will be using the functions from the [tidyverse] throughout this chapter, so go ahead and load the tidyverse package now. + ```{r message = FALSE, warning = FALSE} library(tidyverse) ``` @@ -157,7 +159,7 @@ library(usethis) edit_r_profile() ``` -This will open up your `.Rprofile` file. Add `options(geonamesUsername="my_user_name")` on a new line (replace "my_user_name" with your GeoNames username). +This will open up your `.Rprofile` file. 
Add `options(geonamesUsername="my_user_name")` on a new line, replacing `"my_user_name"` with your GeoNames username. **Important**: Make sure your `.Rprofile` ends with a blank line! @@ -246,8 +248,7 @@ searchplos("materials_and_methods:study site", fl = "title, materials_and_method searchplos("*:*", fl = "id") ``` -Here is a list of [options for the search](http://api.plos.org/solr/search-fields/) or you can run `data(plosfields)` followed by `plosfields` in the R Console. - +You can see a list of the available search fields [here](http://api.plos.org/solr/search-fields/), or you can run `data(plosfields)` followed by `plosfields` in the R Console. #### Take a highbrow look! @@ -271,7 +272,7 @@ We can use the `plot_throughtime()` function to visualize the results of a searc plot_throughtime(terms = "phylogeny", limit = 200) ``` -### Is it a boy or a girl? gender-associated names throughout US history +### Gender-associated names throughout US history: gender The [gender] package (on [CRAN](https://cloud.r-project.org/package=gender); on [GitHub](https://github.com/ropensci/gender)) allows you access to data on the gender of names in the US. Because names change gender over the years, the probability of a name belonging to a man or a woman also depends on the *year*. diff --git a/37_diy-web-data.Rmd b/37_diy-web-data.Rmd index d0ca639..7bcac59 100644 --- a/37_diy-web-data.Rmd +++ b/37_diy-web-data.Rmd @@ -19,7 +19,7 @@ In Chapter \@ref(api-wrappers) we experimented with several packages that "wrapp ### Load the tidyverse -We will be using the functions from the [tidyverse] throughout this chapter, so go ahead and load tidyverse package now. +We will be using the functions from the [tidyverse] throughout this chapter, so go ahead and load the tidyverse package now. 
```{r message = FALSE, warning = FALSE} library(tidyverse) @@ -27,7 +27,7 @@ library(tidyverse) ### Examine the structure of API requests using the Open Movie Database -First we're going to examine the structure of API requests via the [Open Movie Database](http://www.omdbapi.com/) (OMDb). OMDb is very similar to IMDb, except it has a nice, simple API. We can go to the website, input some search parameters, and obtain both the XML query and the response from it. +First, we're going to examine the structure of API requests via the [Open Movie Database](http://www.omdbapi.com/) (OMDb). OMDb is very similar to IMDb, except it has a nice, simple API. We can go to the website, input some search parameters, and obtain both the XML query and the response from it. @@ -123,27 +123,27 @@ This tells us that we need an API key to access the OMDb API. We will store our Now we follow the rOpenSci [tutorial on API keys](https://github.com/ropensci/rOpenSci/wiki/Installation-and-use-of-API-keys): -* ___Add `.Rprofile` to your `.gitignore` !!___ -* Make a `.Rprofile` file ([windows tips](http://cran.r-project.org/bin/windows/rw-FAQ.html#What-are-HOME-and-working-directories_003f); [mac tips](http://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#The-R-Console)). -* Write the following in it: +1. __Add `.Rprofile` to your `.gitignore` !!__ +1. Make a `.Rprofile` file ([windows tips](http://cran.r-project.org/bin/windows/rw-FAQ.html#What-are-HOME-and-working-directories_003f); [mac tips](http://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#The-R-Console)). +1. Write the following in it: -```{r eval = FALSE} -options(OMBD_API_KEY = "YOUR_KEY") -``` + ```{r eval = FALSE} + options(OMBD_API_KEY = "YOUR_KEY") + ``` -* Restart R (i.e. reopen your RStudio project). +1. Restart R (i.e. reopen your RStudio project). This code adds another element to the list of options, which you can see by calling `options()`. 
Part of the work done by `rplos::searchplos()` and friends is to go and obtain the value of this option with the function `getOption("OMBD_API_KEY")`. This indicates two things: 1. Spelling is important when you set the option in your `.Rprofile` 2. You can do a similar process for an arbitrary package or key. For example: -```{r eval = FALSE} -## in .Rprofile -options("this_is_my_key" = XXXX) -## later, in the R script: -key <- getOption("this_is_my_key") -``` + ```{r eval = FALSE} + ## in .Rprofile + options("this_is_my_key" = XXXX) + ## later, in the R script: + key <- getOption("this_is_my_key") + ``` This is a simple means to keep your keys private, especially if you are sharing the same authentication across several projects. @@ -329,19 +329,19 @@ Not exactly the result we were hoping for! However, this does tell us about the * It has a `` node, which has a single child node, ``. * The information we want is all stored as attributes (e.g. title, year, etc.). -The xml2 package has various functions to assist in navigating through XML. We can use the `xml_children()` function to extract all of the children nodes (i.e. the single child, ``): +The xml2 package has various functions to assist in navigating through XML. We can use the `xml_children()` function to extract all of the children nodes (i.e. the single child, ``). ```{r} (contents <- xml_contents(xml_parsed)) ``` -The `xml_attrs()` function "retrieves all attribute values as a named character vector". Let's use this to extract the information that we want from the `` node: +The `xml_attrs()` function "retrieves all attribute values as a named character vector". Let's use this to extract the information that we want from the `` node. ```{r} (attrs <- xml_attrs(contents)[[1]]) ``` -We can transform this named character vector into a data frame with the help of `dplyr::bind_rows()`: +We can transform this named character vector into a data frame with the help of `dplyr::bind_rows()`. 
```{r} attrs %>% @@ -351,7 +351,7 @@ attrs %>% ## Introducing the easy way: httr -The [httr] package is yet another star in the [tidyverse]. It is designed to facilitate all things HTTP from within R. This includes the major HTTP verbs, which are: +The [httr] package is yet another star in the tidyverse. It is designed to facilitate all things HTTP from within R. This includes the major HTTP verbs, which are: * __`GET()`__ - Fetch an existing resource. The URL contains all the necessary information the server needs to locate and return the resource. @@ -536,7 +536,7 @@ glimpse(research) ### Airports -First, go to this website about [Airports](https://www.developer.aero/Airport-API). Follow the link to get your API key (you will need to click a confirmation email). +First, go to this website about [airports](https://www.developer.aero/Airport-API). Follow the link to get your API key (you will need to click a confirmation email). List of all the airports on the planet: diff --git a/38_shiny.Rmd b/38_shiny.Rmd index 7c60f38..e71ce7b 100644 --- a/38_shiny.Rmd +++ b/38_shiny.Rmd @@ -15,7 +15,7 @@ Many people have written packages that enhance Shiny in some way or add extra fu * [shinyjs] - Enhance user experience in Shiny apps using JavaScript functions without knowing JavaScript (on [CRAN](https://cloud.r-project.org/package=shinyjs); on [GitHub](https://github.com/daattali/shinyjs)). -* [leaflet][leaflet-web] - Add interactive maps to your apps (on [CRAN](https://cloud.r-project.org/package=leaflet); on [GitHub](https://github.com/rstudio/leaflet)). +* [leaflet][leaflet] - Add interactive maps to your apps (on [CRAN](https://cloud.r-project.org/package=leaflet); on [GitHub](https://github.com/rstudio/leaflet)). * [ggvis] - Similar to [ggplot2], but the plots are focused on being web-based and are more interactive (on [CRAN](https://cloud.r-project.org/package=ggvis); on [GitHub](https://github.com/rstudio/ggvis)). 
*Currently dormant* @@ -29,7 +29,7 @@ Shiny is a very popular package and has lots of resources on the web. Here's a c - Official [Shiny] website - Official [Shiny tutorial] - RStudio [Shiny Cheat Sheet] -- [Lots of short useful articles about different topics in Shiny - **highly recommended**](https://shiny.rstudio.com/articles/) +- [Lots of short useful articles about different topics in Shiny](https://shiny.rstudio.com/articles/) - **highly recommended** - [Shiny in R Markdown](http://rmarkdown.rstudio.com/authoring_shiny.html) - Get help from the [Shiny Google group](https://groups.google.com/forum/#!forum/shiny-discuss) or [StackOverflow](https://stackoverflow.com/questions/tagged/shiny) - Publish your apps for free with [shinyapps.io] diff --git a/39_appendix.Rmd b/39_appendix.Rmd index 0a297fb..cdf4ad1 100644 --- a/39_appendix.Rmd +++ b/39_appendix.Rmd @@ -48,7 +48,7 @@ oldies_formatted %>% set.seed(328) ``` -```{r draw-an-owl, echo = FALSE, fig.cap = '"How to draw an owl" from [imgur](http://imgur.com/gallery/RadSf)"', out.width = "50%"} +```{r draw-an-owl, echo = FALSE, fig.cap = '"How to draw an owl" from [imgur](http://imgur.com/gallery/RadSf)', out.width = "50%"} knitr::include_graphics("img/how-to-draw-an-own-imgur.jpg") ``` @@ -75,7 +75,7 @@ It turns out you can write R (or S) for ~20 years and not be very facile with th ### Start at the beginning -My reference is the section of Wickham's [Advanced R] [-@wickham2015a] that is about [closures](http://adv-r.had.co.nz/Functional-programming.html#closures), "functions written by functions". Here's one of the two main examples: a function that creates an exponentiation function. +My reference is the section of Wickham's [Advanced R] [-@wickham2015a] book that is about [closures](http://adv-r.had.co.nz/Functional-programming.html#closures), "functions written by functions". Here's one of the two main examples: a function that creates an exponentiation function. 
```{r} power <- function(exponent) { @@ -285,13 +285,6 @@ The final version of the function factory is about a dozen lines of fairly pedes *The __results__ of this effort are, however, pretty gratifying. I have had zero build/check failures locally and on Travis, since I implemented retries on `httr::GET()`. Or, to be honest, I've had failures, but for other reasons. So it was totally worth it! I also thank Konrad Rudolph and Kevin Ushey for [straightening me out](https://gist.github.com/jennybc/65c577f98c2bad7e2b3d0ccb773dfaf8) on the need to use `force()` inside the function factory.* -## How to obtain a bunch of GitHub issues or pull requests with R {#gh-package} - -[Using dplyr + purrr + tidyr](https://github.com/jennybc/analyze-github-stuff-with-r) to analyze data about GitHub repos via the [gh] package - -## How to tame XML with nested data frames and purrr {#tame-google-sheets} - -[Using dplyr + purrr + tidyr + xml2](https://github.com/jennybc/manipulate-xml-with-purrr-dplyr-tidyr) to tame the annoying XML from Google Sheets ## Make browsing your GitHub repos more rewarding {#github-browsability} @@ -364,7 +357,7 @@ or like [this](https://gist.github.com/jennybc/402761e30b9be8023af9#file-yaml_fr In RStudio, when editing `.Rmd`, click on the gear next to "Knit HTML" for YAML authoring help For a quick, stand-alone document that doesn't fit neatly into a repository or project (yet), make it a [Gist](https://gist.github.com). -__Example:__ Hadley Wickham's [advice on what you need to do to become a data scientist](https://gist.github.com/hadley/820f09ded347c62c2864). Gists can contain multiple files, so you can still provide the R script or R Markdown source __and__ the resulting Markdown, as I've done in this write-up of [Twitter-sourced tips for cross-tabulation](https://gist.github.com/jennybc/04b71bfaaf0f88d9d2eb). +Example: Hadley Wickham's [advice on what you need to do to become a data scientist](https://gist.github.com/hadley/820f09ded347c62c2864). 
Gists can contain multiple files, so you can still provide the R script or R Markdown source __and__ the resulting Markdown, as I've done in this write-up of [Twitter-sourced tips for cross-tabulation](https://gist.github.com/jennybc/04b71bfaaf0f88d9d2eb). ### `README.md` @@ -372,7 +365,7 @@ You probably already know that GitHub renders `README.md` at the top-level of yo Implication: for any logical group of files or mini project-within-your-project, create a sub-directory in your repository. And then create a `README.md` file to annotate these files, collect relevant links, etc. Now when you navigate to the sub-directory on GitHub the nicely rendered `README.md` will simply appear. -Some repositories consist solely of `README.md`. __Examples:__ Jeff Leek's write-ups on [How to share data with a statistician](https://github.com/jtleek/datasharing) or [Developing R packages](https://github.com/jtleek/rpackages). I am becoming a bigger fan of `README`-only repos than gists because repo issues trigger notifications, whereas comments on gists do not. +Some repositories consist solely of `README.md`. Examples: Jeff Leek's write-ups on [How to share data with a statistician](https://github.com/jtleek/datasharing) or [Developing R packages](https://github.com/jtleek/rpackages). I am becoming a bigger fan of `README`-only repos than gists because repo issues trigger notifications, whereas comments on gists do not. If you've got a directory full of web-friendly figures, such as PNGs, you can use [code like this](https://gist.github.com/jennybc/0239f65633e09df7e5f4) to generate a `README.md` for a quick DIY gallery, as Karl Broman has done with [his FruitSnacks](https://github.com/kbroman/FruitSnacks/blob/master/PhotoGallery.md). I have also used this device to share Keynote slides on GitHub (*mea culpa!*). 
Export them as PNGs images and throw 'em into a README gallery: slides on [file organization](https://github.com/Reproducible-Science-Curriculum/rr-organization1/tree/27883c8fc4cdd4dcc6a8232f1fe5c726e96708a0/slides/organization-slides) and some on [file naming](https://github.com/Reproducible-Science-Curriculum/rr-organization1/tree/27883c8fc4cdd4dcc6a8232f1fe5c726e96708a0/slides/naming-slides). @@ -473,16 +466,6 @@ and here it is with placeholders: AFAIK, to do that in a slick automatic way across an entire repo/site, you need to be using Jekyll or some other automated system. But you could easily handcode such links on a small scale. -## How to send a bunch of emails from R {#email-in-r} - -[Workflow](https://github.com/jennybc/send-email-with-r) for sending email with R and [gmailr](https://cloud.R-project.org/package=gmailr). - -## Store an API key as an environment variable {#store-api-key} - - - -This can be found [here](https://happygitwithr.com/credential-caching.html). - ## Data Carpentry lesson on tidy data {#data-carp-tidy-data} @@ -510,59 +493,155 @@ Instructor dependencies: * curl if you execute the code to grab the Lord of the Rings data used in examples from GitHub. Note that the files are also included in the `datacarpentry/data/tidy-data/` directory, so data download is avoidable. * rmarkdown, knitr, and xtable if you want to compile the `Rmd` to `md` and `html`. 
-# Contributing Guide {#contributing} +## How to obtain a bunch of GitHub issues or pull requests with R {#gh-package} + +[Using dplyr + purrr + tidyr](https://github.com/jennybc/analyze-github-stuff-with-r) to analyze data about GitHub repos via the [gh] package + +## How to tame XML with nested data frames and purrr {#tame-google-sheets} + +[Using dplyr + purrr + tidyr + xml2](https://github.com/jennybc/manipulate-xml-with-purrr-dplyr-tidyr) to tame the annoying XML from Google Sheets + +## How to send a bunch of emails from R {#email-in-r} + +[Workflow](https://github.com/jennybc/send-email-with-r) for sending email with R and [gmailr](https://cloud.R-project.org/package=gmailr). + +## Store an API key as an environment variable {#store-api-key} + + + +This can be found [here](https://happygitwithr.com/credential-caching.html). + + +# Contributing {#contributing} ## Link Reference Formatting -We have *a lot* of links in this bookdown. In an effort to keep things tidy, we have developed a method of organizing them. If you would like to contribute to this bookdown we ask that you please format any new links the same way. +There are *a lot* of links in this bookdown. In an effort to keep things organized, we have come up with a formatting system (see issue [#31](https://github.com/rstudio-education/stat545/issues/31) for some of the thoughts that went into this). +Some sections in the bookdown documentation on formatting links: -### Added to `links.md` +* https://bookdown.org/yihui/bookdown/markdown-syntax.html#inline-formatting +* https://bookdown.org/yihui/bookdown/cross-references.html -Any links that are used 2+ chapters (a.k.. `.Rmd` files) or any links that we *think* might be used in other chapters in the future are saved in `links.md` in the format `[ref-label]: link`. We try to have `ref-label` to be some text that we can usually live with as link text when possible. 
+### Popular links: `links.md` -For example, +The same links are frequently referenced in multiple chapters in this bookdown. To cut down on repetition, we've added a file called [`links.md`](https://raw.githubusercontent.com/rstudio-education/stat545/master/links.md), which contains a list of label-link pairs formatted as `[label]: link`. -> [Happy Git and GitHub for the useR]: https://happygitwithr.com +This file serves as a kind of "global" directory of link references for all of the chapters (i.e. Rmd files) in this bookdown. Each chapter has this code chunk at the end, +````markdown +`r ''````{r links, child="links.md"} +``` +```` -CHECK THIS -The reference label is not case-sensitive, +which tells it to treat `links.md` as a child document, giving it access to the label-link pairs in `links.md`. -### Added as a reference-style link +#### What types of links are already in `links.md`? -BOOKDOWN DOCS LINK? +* STAT 545 external resources/content + + Related materials that are not in this bookdown (e.g. [Tidy data using Lord of the Rings]) +* Packages: main link + + Where `[package-name]` should send us; what we consider to be the "homepage" for a package +* Packages: vignettes & CRAN/GitHub links + + Package vignettes that are mentioned in multiple chapters (e.g. the dplyr [Window functions] vignette) + + CRAN and GitHub links that are mentioned in multiple chapters (e.g. the dplyr [CRAN][dplyr-cran] page) +* Bookdowns: main link + + Where `[Bookdown Title]` should send us +* Bookdowns: specific chapters + + Specific bookdown chapters that are mentioned in multiple chapters (e.g. the [Data transformation][r4ds-transform] chapter from R for Data Science) +* Cheat sheets +* Blog posts & slides +* Papers/books cited +* Misc. - anything else -If a link appears in only one chapter (a.k.a. `.Rmd`) & is used more than once, format it as a reference-style link. Again, trying to use a reference label that is some text that we can usually live with as link text. 
+#### When should a link be added to `links.md`? -For example, +If a link is used in multiple chapters (i.e. Rmd files) - or you think it will be in the future - go ahead and add it to `links.md`. Format it as `[label]: link`, where `label` is some text that we can usually live with as the link text. + +For example, the link to the [Happy Git and GitHub for the useR] book is in `links.md` as + +``` +[Happy Git and GitHub for the useR]: https://happygitwithr.com +``` + +Before, we had to write out... + +`[Happy Git and GitHub for the useR](https://happygitwithr.com)` + +...every time to get [Happy Git and GitHub for the useR]. Now just writing `[Happy Git and GitHub for the useR]` will get us the same thing. -> It's not really fair to complain about the lack of visible alignment. Remember we are ["writing data for computers"]. -> ... -> Huh? Don't worry about it. Remember we are ["writing data for computers"]. -> ... -> -> ["writing data for computers"]: https://twitter.com/vsbuffalo/statuses/358699162679787521 +#### What if there isn't a useful `label`? +Occasionally there won't be a `label` that also makes for a convenient link text. For those cases, try to choose a `label` that is both descriptive and unique. +For example, the Data Transformation chapter of R for Data Science is mentioned in multiple chapters and is in `links.md` as -generally, if first time package is being mentioned in a chapter, include main link. e.g. [gapminder] -### Added as an inline-style link +``` +[r4ds-transform]: https://r4ds.had.co.nz/transform.html +``` + +Writing `[Data transformation][r4ds-transform]` will appear as [Data transformation][r4ds-transform]. + +### One-off links + +For links that only appear in one chapter (i.e. a single Rmd file), there are two options. +1. Inline-style links +1. Reference-style links +#### Inline-style + +If the link is a one-hit-wonder and will likely not be used again, include it as a regular inline-style link. + +This....
+ +``` +Hypothetical: a [zombie project](https://imgur.com/ewmBeQG) comes back to life... +``` +will appear as.... +> Hypothetical: a [zombie project](https://imgur.com/ewmBeQG) comes back to life... +*However*, if you think that it makes more sense to include it as a reference-style link or in `links.md`, that is okay too. -If you anticipating the link being used again in another chapter, go ahead and put it in `links.md`. -If it is a main package that you think will be used in multiple chapters, then include links in `links.md`. +#### Reference-style + +If the link is used more than once within the chapter, format it as a reference-style link. Add it at the end of the Rmd file formatted as `[label]: link`, where `label` is some text we can usually live with as the actual link text. + +This... + +``` +It's not really fair to complain about the lack of visible alignment. Remember we are ["writing data for computers"]. + +... + +Huh? Don't worry about it. Remember we are ["writing data for computers"]. + +... + + +["writing data for computers"]: https://twitter.com/vsbuffalo/statuses/358699162679787521 +``` + + +will appear as... + +> It's not really fair to complain about the lack of visible alignment. Remember we are ["writing data for computers"]. +> +> ... +> +> Huh? Don't worry about it. Remember we are ["writing data for computers"]. +> +> ... +> +> +> ["writing data for computers"]: https://twitter.com/vsbuffalo/statuses/358699162679787521 + +*However*, if you think that it makes more sense to include it as a reference-style link or in `links.md`, that is okay too. -"main link" for a package priority order - 1. package website - 2. github home - 3.
cran page ```{r links, child="links.md"} ``` \ No newline at end of file diff --git a/links.md b/links.md index 381c057..ef689bf 100644 --- a/links.md +++ b/links.md @@ -46,7 +46,7 @@ [shinythemes]: https://rstudio.github.io/shinythemes/ [shinyjs]: https://deanattali.com/shinyjs/ [leaflet]: https://rstudio.github.io/leaflet/ -[ggvis-web]: https://ggvis.rstudio.com +[ggvis]: https://ggvis.rstudio.com [shinydashboard]: https://rstudio.github.io/shinydashboard/ @@ -73,26 +73,11 @@ [r4ds-transform]: https://r4ds.had.co.nz/transform.html [r4ds-readr-strings]: https://r4ds.had.co.nz/data-import.html#readr-strings - -[rOpenSci]: https://ropensci.org -[wiki-snake-case]: https://en.wikipedia.org/wiki/Snake_case -[Janus]: https://en.wikipedia.org/wiki/Janus - [RStudio Data Transformation Cheat Sheet]: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf [Regular Expressions in R Cheat Sheet]: https://github.com/rstudio/cheatsheets/raw/master/regex.pdf [Shiny Cheat Sheet]: https://shiny.rstudio.com/articles/cheatsheet.html - - -["Dates and Times Made Easy with lubridate"]: https://www.jstatsoft.org/article/view/v040i03 -["testthat: Get Started with Testing"]: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf -["Let's Practice What We Preach"]: https://www.jstor.org/stable/3087382?seq=1#page_scan_tab_contents -[Creating More Effective Graphs]: https://www.amazon.com/Creating-Effective-Graphs-Naomi-Robbins/dp/0985911123 -["Escaping RGBland: Selecting Colors for Statistical Graphs"]: https://eeecon.uibk.ac.at/~zeileis/papers/Zeileis+Hornik+Murrell-2009.pdf -["A layered grammar of graphics"]: https://vita.had.co.nz/papers/layered-grammar.html -[Managing Projects with GNU Make, 3rd Edition]: http://shop.oreilly.com/product/9780596006105.do - ["minimal make: a minimal tutorial on make"]: https://kbroman.org/minimal_make/ ["Let the Data Flow: Pipelines in R with dplyr and magrittr"]: https://github.com/tjmahr/MadR_Pipelines @@ -102,4 
+87,18 @@ ["What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text"]: http://kunststube.net/encoding/ ["3 Steps to Fix Encoding Problems in Ruby"]: https://www.justinweiss.com/articles/3-steps-to-fix-encoding-problems-in-ruby/ ["My favorite RGB color"]: https://manyworldstheory.com/2013/01/15/my-favorite-rgb-color/ + + +["Dates and Times Made Easy with lubridate"]: https://www.jstatsoft.org/article/view/v040i03 +["testthat: Get Started with Testing"]: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf +["Let's Practice What We Preach"]: https://www.jstor.org/stable/3087382?seq=1#page_scan_tab_contents +[Creating More Effective Graphs]: https://www.amazon.com/Creating-Effective-Graphs-Naomi-Robbins/dp/0985911123 +["Escaping RGBland: Selecting Colors for Statistical Graphs"]: https://eeecon.uibk.ac.at/~zeileis/papers/Zeileis+Hornik+Murrell-2009.pdf +["A layered grammar of graphics"]: https://vita.had.co.nz/papers/layered-grammar.html +[Managing Projects with GNU Make, 3rd Edition]: http://shop.oreilly.com/product/9780596006105.do ["Why Should Engineers and Scientists Be Worried About Color?"]: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=2ahUKEwi0xYqJ8JbjAhWNvp4KHViYDxsQFjABegQIABAC&url=https%3A%2F%2Fwww.researchgate.net%2Fprofile%2FAhmed_Elhattab2%2Fpost%2FPlease_suggest_some_good_3D_plot_tool_Software_for_surface_plot%2Fattachment%2F5c05ba35cfe4a7645506948e%2FAS%253A699894335557644%25401543879221725%2Fdownload%2FWhy%2BShould%2BEngineers%2Band%2BScientists%2BBe%2BWorried%2BAbout%2BColor_.pdf&usg=AOvVaw1qwjjGMd7h_z6TLUjzu7Nb + + +[rOpenSci]: https://ropensci.org +[wiki-snake-case]: https://en.wikipedia.org/wiki/Snake_case +[Janus]: https://en.wikipedia.org/wiki/Janus \ No newline at end of file