diff --git a/_episodes_rmd/01-starting-with-data.Rmd b/_episodes_rmd/01-starting-with-data.Rmd
index a92254ea4..d51c2cc96 100644
--- a/_episodes_rmd/01-starting-with-data.Rmd
+++ b/_episodes_rmd/01-starting-with-data.Rmd
@@ -6,7 +6,7 @@ questions:
- "How do I read data into R?"
- "How do I assign variables?"
- "What is a data frame?"
-- "How do I access subsets a data frame?"
+- "How do I access subsets of a data frame?"
- "How do I calculate simple statistics like mean and median?"
- "Where can I get help?"
- "How can I plot my data?"
@@ -22,7 +22,7 @@ keypoints:
- "The function `dim` gives the dimensions of a data frame."
- "Use `object[x, y]` to select a single element from a data frame."
- "Use `from:to` to specify a sequence that includes the indices from `from` to `to`."
-- "All the indexing and slicing that works on data frames also works on vectors."
+- "All the indexing and subsetting that works on data frames also works on vectors."
- "Use `#` to add comments to programs."
- "Use `mean`, `max`, `min` and `sd` to calculate simple statistics."
- "Use `apply` to calculate statistics across the rows or columns of a data frame."
@@ -114,20 +114,25 @@ We can create a new variable and assign a value to it using `<-`
weight_kg <- 55
```
-Once a variable has a value, we can print it by typing the name of the variable and hitting `Enter` (or `return`).
+Once a variable is created, we can use the variable name to refer to the value it was assigned. The variable name now acts as a tag. Whenever R reads that tag (`weight_kg`), it substitutes the value (`55`).
+
+
+
+To see the value of a variable, we can print it by typing the name of the variable and hitting `Enter` (or `return`).
In general, R will print to the console any object returned by a function or operation *unless* we assign it to a variable.
```{r}
weight_kg
```
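+
+Assignment on its own prints nothing, because `<-` returns its value invisibly. In this small sketch, the variable `height_cm` is only an example and not part of the lesson data. Wrapping an assignment in parentheses makes R assign the value *and* print it:
+
+```{r}
+# assignment alone: nothing is printed
+height_cm <- 170
+# parentheses around the assignment: R assigns and then prints the value
+(height_cm <- 170)
+```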
-
-We can do arithmetics with the variable:
+We can treat our variable like a regular number and do arithmetic with it:
```{r}
# weight in pounds:
2.2 * weight_kg
```
+
+
> ## Commenting
>
> We can add comments to our code using the `#` character. It is useful to
@@ -152,13 +157,13 @@ weight_kg
> a [chapter](http://r-pkgs.had.co.nz/style.html) on this and other style considerations.
{: .callout}
-If we imagine the variable as a sticky note with a name written on it,
-assignment is like putting the sticky note on a particular value:
+
+
+Assigning a new value to a variable breaks the connection with the old value; R forgets that number and applies the variable name to the new value.
-
+When you assign a value to a variable, R only stores the value, not the calculation you used to create it. This is an important point if you're used to the way a spreadsheet program automatically updates linked cells. Let's look at an example.
-This means that assigning a value to one object does not change the values of other variables.
-For example, let's store the subject's weight in pounds in a variable:
+First, we'll convert `weight_kg` into pounds and store the new value in the variable `weight_lb`:
```{r}
weight_lb <- 2.2 * weight_kg
@@ -168,9 +173,12 @@ weight_kg
weight_lb
```
-
+In words, we're asking R to look up the value we tagged `weight_kg`,
+multiply it by 2.2, and tag the result with the name `weight_lb`.
+
+
-and then change `weight_kg`:
+If we now change the value of `weight_kg`:
```{r}
weight_kg <- 100.0
@@ -180,7 +188,7 @@ weight_kg
weight_lb
```
-
+
Since `weight_lb` doesn't "remember" where its value came from, it isn't automatically updated when `weight_kg` changes.
This is different from the way spreadsheets work.
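+
+If we do want `weight_lb` to reflect the new weight, we have to repeat the calculation ourselves. The following minimal sketch starts again from the values used above:
+
+```{r}
+weight_kg <- 55
+weight_lb <- 2.2 * weight_kg
+# change the weight in kilograms; weight_lb keeps its old value
+weight_kg <- 100.0
+weight_lb
+# re-run the calculation to bring weight_lb up to date
+weight_lb <- 2.2 * weight_kg
+weight_lb
+```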
@@ -229,8 +237,9 @@ First, let's ask what type of thing `dat` is:
class(dat)
```
-The output tells us that it is a data frame. We can think of this as a spreadsheet in MS Excel, which many of us are familiar with.
-Data frames are very useful for organizing data and you will find them elsewhere when programming in R. A typical data frame of experimental data contains individual observations in rows and variables in columns.
+The output tells us that it is a data frame. Think of this structure as a spreadsheet in MS Excel, which many of us are familiar with.
+Data frames are very useful for storing data, and you will use them frequently when programming in R.
+A typical data frame of experimental data contains individual observations in rows and variables in columns.
We can see the shape, or [dimensions]({{ page.root }}/reference/#dimensions-of-an-array), of the data frame with the function `dim`:
@@ -242,42 +251,51 @@ This tells us that our data frame, `dat`, has `r nrow(dat)` rows and `r ncol(dat
If we want to get a single value from the data frame, we can provide an [index]({{ page.root }}/reference/#index) in square brackets. The first number specifies the row and the second the column:
-```{r}
-# The first value in dat is indexed at row 1 column 1
+```{r selecting-data-frame-elements}
+# first value in dat, row 1, column 1
dat[1, 1]
-# The middle value in dat is indexed at row 30 column 20
+# middle value in dat, row 30, column 20
dat[30, 20]
```
-An index like `[30, 20]` selects a single element of a data frame, but we can select whole sections as well.
-For example, we can select values for the first four patients (rows) during the first ten days of treatment (columns) like this:
+The first value in a data frame index is the row; the second value is the column.
+If we want to select more than one row or column, we can use the function `c`, which stands for **c**ombine.
+For example, to pick columns 10 and 20 from rows 1, 3, and 5, we can do this:
-```{r}
-dat[1:4, 1:10]
+```{r selecting-with-c}
+dat[c(1, 3, 5), c(10, 20)]
+```
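+
+If you want to experiment with these rules without the inflammation data, a small made-up data frame works the same way. In this sketch, `toy` and its column names are hypothetical:
+
+```{r}
+# five rows (patients) and three columns (days) of invented values
+toy <- data.frame(day1 = c(0, 1, 3, 1, 2),
+                  day2 = c(2, 4, 1, 0, 5),
+                  day3 = c(1, 2, 6, 3, 4))
+# a single element: row 2, column 3
+toy[2, 3]
+# rows 1, 3 and 5 from columns 1 and 3, giving a 3 x 2 data frame
+toy[c(1, 3, 5), c(1, 3)]
+dim(toy[c(1, 3, 5), c(1, 3)])
+```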
+
+We frequently want to select contiguous rows or columns, such as the first ten rows or columns 3 through 7. You can use `c` for this, but it's more convenient to use the `:` operator, which generates sequences of numbers:
+
+```{r sequences}
+1:5
+3:12
```
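+
+To compare the two approaches, this short sketch writes the same sequence with `:` and with `c`. It then checks that every element matches and shows that `:` also counts down:
+
+```{r}
+# the same five numbers written two ways
+3:7
+c(3, 4, 5, 6, 7)
+# compare element by element; all() is TRUE only if every comparison is TRUE
+all(3:7 == c(3, 4, 5, 6, 7))
+# when the first number is larger, `:` counts down
+5:1
+```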
-The slice does not need to start at 1, e.g. the line below selects rows 5 through 10, and columns 3 through 10 :
+For example, we can select the first ten columns of values for the first four rows like this:
```{r}
-dat[5:10, 3:10]
+dat[1:4, 1:10]
```
-We can use the function `c`, which stands for **c**ombine, to select non-contiguous values:
+
+We can also select the first ten columns of rows 5 to 10 like this:
```{r}
-dat[c(3, 8, 37, 56), c(10, 14, 29)]
+dat[5:10, 1:10]
```
-We can also provide a slice for the rows but not for the columns, or for the columns but not for the rows.
-If we don't include a slice for the rows, R returns all the rows; if we don't include a slice for the columns, R returns all the columns.
-If we don't provide a slice for either rows or columns, e.g. `dat[, ]`, R returns the full data frame.
+If you want to select all rows or all columns, leave that index value empty.
```{r}
# All columns from row 5
dat[5, ]
-# All rows from column 16
-dat[, 16]
+# All rows from columns 16 to 18
+dat[, 16:18]
```
+If you leave both index values empty (i.e., `dat[, ]`), you get the entire data frame.
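+
+This sketch uses a small made-up data frame (`toy` is hypothetical) to show what each form returns. Selecting a single column with an empty row index gives a plain vector. A single row or several columns still give a data frame:
+
+```{r}
+toy <- data.frame(day1 = c(0, 1, 3, 1, 2),
+                  day2 = c(2, 4, 1, 0, 5),
+                  day3 = c(1, 2, 6, 3, 4))
+# all columns from row 5: a one-row data frame
+class(toy[5, ])
+# all rows from column 2: a numeric vector
+class(toy[, 2])
+# all rows from columns 2 to 3: a data frame
+class(toy[, 2:3])
+# both indices empty: the whole data frame
+all.equal(toy[, ], toy)
+```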
+
> ## Addressing Columns by Name
>
> Columns can also be addressed by name, with either the `$` operator (ie. `dat$Age`) or square brackets (ie. `dat[,'Age']`).
@@ -294,6 +312,42 @@ patient_1 <- dat[1, ]
# max inflammation for patient 1
max(patient_1)
```
+
+
+
We don't actually need to store the row in a variable of its own.
Instead, we can combine the selection and the function call:
@@ -373,10 +427,10 @@ We'll learn why this is so in the next lesson.
> `colMeans`, respectively.
{: .callout}
-> ## Slicing (Subsetting) Data
+
+> ## Subsetting Data
>
-> A subsection of a data frame is called a [slice]({{ page.root }}/reference/#slice).
-> We can take slices of character vectors as well:
+> We can take subsets of character vectors as well:
>
> ```{r}
> animal <- c("m", "o", "n", "k", "e", "y")
@@ -399,7 +453,7 @@ We'll learn why this is so in the next lesson.
> ## Subsetting More Data
>
> Suppose you want to determine the maximum inflammation for patient 5 across days three to seven.
-> To do this you would extract the relevant slice from the data frame and calculate the maximum value.
+> To do this you would extract the relevant subset from the data frame and calculate the maximum value.
> Which of the following lines of R code gives the correct answer?
>
> 1. `max(dat[5, ])`
@@ -416,7 +470,7 @@ We'll learn why this is so in the next lesson.
> {: .solution}
{: .challenge}
-> ## Slicing and Re-Assignment
+> ## Subsetting and Re-Assignment
>
> Using the inflammation data frame `dat` from above:
> Let's pretend there was something wrong with the instrument on the first five days for every second patient (#2, 4, 6, etc.), which resulted in the measurements being twice as large as they should be.
diff --git a/fig/arithmetic-variables.svg b/fig/arithmetic-variables.svg
new file mode 100644
index 000000000..cfe9fb8e1
--- /dev/null
+++ b/fig/arithmetic-variables.svg
@@ -0,0 +1,253 @@
+
+
+
+
diff --git a/fig/memory-variables.svg b/fig/memory-variables.svg
new file mode 100644
index 000000000..076a4794a
--- /dev/null
+++ b/fig/memory-variables.svg
@@ -0,0 +1,298 @@
+
+
+
+
diff --git a/fig/new-variables.svg b/fig/new-variables.svg
new file mode 100644
index 000000000..53cfa9f7b
--- /dev/null
+++ b/fig/new-variables.svg
@@ -0,0 +1,334 @@
+
+
+
+
diff --git a/fig/reassign-variables.svg b/fig/reassign-variables.svg
new file mode 100644
index 000000000..f44f30ca8
--- /dev/null
+++ b/fig/reassign-variables.svg
@@ -0,0 +1,244 @@
+
+
+
+
diff --git a/fig/tag-variables.svg b/fig/tag-variables.svg
new file mode 100644
index 000000000..f2af6f0a3
--- /dev/null
+++ b/fig/tag-variables.svg
@@ -0,0 +1,141 @@
+
+
+
+