From 2065ac76cf2c7b34da8ca4b481a309eae849db6a Mon Sep 17 00:00:00 2001
From: ytakemon <yuka.takemon@gmail.com>
Date: Wed, 2 Oct 2024 08:55:24 -0700
Subject: [PATCH 1/5] fixes #285

---
 episodes/03-basics-factors-dataframes.Rmd | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
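
Note for reviewers: a minimal sketch of what the reworded bullet describes,
assuming a small hypothetical data frame (`demo` and its values are made up
for illustration and are not taken from the lesson's `variants` data). It
shows how `str()` reports `chr`, `int`, and `num` columns.

```r
# Hypothetical three-column data frame; stringsAsFactors = FALSE keeps
# sample_id as character on R versions older than 4.0 as well
demo <- data.frame(
  sample_id = c("s1", "s2"),
  POS = c(9972L, 263235L),   # the L suffix makes these integers
  QUAL = c(91.5, 85.2),      # decimal values are stored as doubles (num)
  stringsAsFactors = FALSE
)

# Each column is listed after a `$`, followed by its data type
str(demo)

# The same information as a named character vector
col_types <- sapply(demo, class)
```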

diff --git a/episodes/03-basics-factors-dataframes.Rmd b/episodes/03-basics-factors-dataframes.Rmd
index dff912a6..6a709b19 100644
--- a/episodes/03-basics-factors-dataframes.Rmd
+++ b/episodes/03-basics-factors-dataframes.Rmd
@@ -228,12 +228,13 @@ str(subset)
 
 Ok, thats a lot up unpack! Some things to notice.
 
-- the object type `data.frame` is displayed in the first row along with its
+- The object type `data.frame` is displayed in the first row along with its
   dimensions, in this case 801 observations (rows) and 4 variables (columns)
-- Each variable (column) has a name (e.g. `sample_id`). This is followed
-  by the object mode (e.g. chr, int, etc.). Notice that before each
+- Each variable (column) has a name (e.g. `sample_id`). Notice that before each
   variable name there is a `$` - this will be important later.
-
+- Each variable name is followed by the data type it contains (e.g. `chr`, `int`).
+  `int` denotes integer data, a numeric type that can only store whole numbers
+  (i.e. no decimal points).
 
 
   :::::::::::::::::::::::::::::::::::::::  challenge

From f11772a87baa315c5b846918d94c3a568efa930c Mon Sep 17 00:00:00 2001
From: ytakemon <yuka.takemon@gmail.com>
Date: Wed, 2 Oct 2024 09:02:06 -0700
Subject: [PATCH 2/5] fixes #284

---
 episodes/03-basics-factors-dataframes.Rmd | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)
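
Note for reviewers: a minimal sketch of the indexing steps added in this
patch, assuming a short hypothetical `alt_alleles` vector in place of the
lesson's 801-element one. The `%in%` line at the end is an alternative not
used in the lesson, included only for comparison.

```r
# Hypothetical stand-in for the lesson's alt_alleles vector
alt_alleles <- c("A", "T", "GCC", "G", "C", "A", "TTA")

# Logical vector: TRUE where the allele is exactly "A"
is_a <- alt_alleles == "A"

# Using the logical vector as an index keeps only the matching elements
only_a <- alt_alleles[is_a]

# The lesson's approach: one comparison per nucleotide, joined with c()
snps <- c(alt_alleles[alt_alleles == "A"],
          alt_alleles[alt_alleles == "T"],
          alt_alleles[alt_alleles == "G"],
          alt_alleles[alt_alleles == "C"])

# Alternative: %in% tests all four nucleotides at once and keeps the
# elements in their original order instead of grouping them by nucleotide
snps_in <- alt_alleles[alt_alleles %in% c("A", "T", "G", "C")]
```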

diff --git a/episodes/03-basics-factors-dataframes.Rmd b/episodes/03-basics-factors-dataframes.Rmd
index 6a709b19..3d5512d8 100644
--- a/episodes/03-basics-factors-dataframes.Rmd
+++ b/episodes/03-basics-factors-dataframes.Rmd
@@ -298,10 +298,19 @@ head(alt_alleles)
 ```
 
 There are 801 alleles (one for each row). To simplify, lets look at just the
-single-nucleotide alleles (SNPs). We can use some of the vector indexing skills
-from the last episode.
+single-nucleotide alleles (SNPs). 
+
+Let's review some of the vector indexing skills from the last episode that can help:
 
 ```{r, purl=FALSE}
+# This compares every allele with the single nucleotide "A" and returns a TRUE/FALSE vector
+alt_alleles == "A"
+
+# Then, we use that TRUE/FALSE vector as an index to pull out all the elements that match.
+alt_alleles[alt_alleles == "A"]
+
+# If we repeat this for each nucleotide A, T, G, and C, and connect them using `c()`,
+# we can index all the single nucleotide changes.
 snps <- c(alt_alleles[alt_alleles == "A"],
   alt_alleles[alt_alleles=="T"],
   alt_alleles[alt_alleles=="G"],

From 4a9e7bf54156794054546d80786817717d49cb5e Mon Sep 17 00:00:00 2001
From: ytakemon <yuka.takemon@gmail.com>
Date: Wed, 2 Oct 2024 09:07:21 -0700
Subject: [PATCH 3/5] fixes #283

---
 episodes/03-basics-factors-dataframes.Rmd | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)
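
Note for reviewers: a minimal sketch of the character-versus-factor
comparison added in this patch, assuming a short hypothetical `snps` vector
in place of the one built in the lesson.

```r
# Hypothetical stand-in for the lesson's snps character vector
snps <- c("A", "T", "G", "A", "C", "A")

# str() shows that snps is a character (chr) vector
str(snps)

# For a character vector, summary() only reports length, class, and mode
char_summary <- summary(snps)

# Converting to a factor turns the distinct values into levels
factor_snps <- factor(snps)

# For a factor, summary() returns the number of items in each level
tally <- summary(factor_snps)
```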

diff --git a/episodes/03-basics-factors-dataframes.Rmd b/episodes/03-basics-factors-dataframes.Rmd
index 3d5512d8..9ae712c0 100644
--- a/episodes/03-basics-factors-dataframes.Rmd
+++ b/episodes/03-basics-factors-dataframes.Rmd
@@ -328,7 +328,13 @@ plot(snps)
 ```
 
 Whoops! Though the `plot()` function will do its best to give us a quick plot,
-it is unable to do so here. One way to fix this it to tell R to treat the SNPs
+it is unable to do so here. Let's use `str()` to see why this might be:
+
+```{r, purl=FALSE}
+str(snps)
+```
+
+R does not know how to plot a character vector! One way to fix this is to tell R to treat the SNPs
 as categories (i.e. a factor vector); we will create a new object to avoid
 confusion using the `factor()` function:
 
@@ -359,9 +365,12 @@ We can see how many items in our vector fall into each category:
 
 ```{r, purl=FALSE}
 summary(factor_snps)
+
+# Compare with the summary of the character vector
+summary(snps)
 ```
 
-As you can imagine, this is already useful when you want to generate a tally.
+As you can imagine, factors are already useful when you want to generate a tally.
 
 :::::::::::::::::::::::::::::::::::::::::  callout
 

From ff44a81b7bccced4462f8aa74cc16f590e91b2b5 Mon Sep 17 00:00:00 2001
From: ytakemon <yuka.takemon@gmail.com>
Date: Wed, 2 Oct 2024 09:32:55 -0700
Subject: [PATCH 4/5] fixes #291

---
 episodes/03-basics-factors-dataframes.Rmd | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)
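
Note for reviewers: a minimal sketch of `dim()` and of `[rows, columns]`
indexing as described in this patch, assuming a small hypothetical data frame
(`variants_demo` and its column names and values are made up for illustration).

```r
# Hypothetical six-column data frame standing in for `variants`
variants_demo <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  CHROM = c("chr1", "chr1", "chr2"),
  POS = c(100L, 250L, 75L),
  ID = c(NA, NA, NA),
  REF = c("T", "G", "A"),
  ALT = c("G", "T", "C"),
  stringsAsFactors = FALSE
)

# dim() returns the number of rows first, then the number of columns
dims <- dim(variants_demo)

# Left of the comma is empty, so all rows are kept; right of the comma,
# c(1:3, 6) selects columns 1, 2, 3, and 6
subset_demo <- data.frame(variants_demo[, c(1:3, 6)])

# Both sides can be filled in: rows 1 and 2 of column 3
first_positions <- variants_demo[1:2, 3]
```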

diff --git a/episodes/03-basics-factors-dataframes.Rmd b/episodes/03-basics-factors-dataframes.Rmd
index 9ae712c0..83ec274e 100644
--- a/episodes/03-basics-factors-dataframes.Rmd
+++ b/episodes/03-basics-factors-dataframes.Rmd
@@ -185,7 +185,15 @@ you have the `variants` object, listed as 801 obs. (observations/rows)
 of 29 variables (columns). Double-clicking on the name of the object will open
 a view of the data in a new tab.
 
-![RStudio data frame view]("fig/rstudio_dataframeview.png")
+![RStudio data frame view]("episodes/fig/rstudio_dataframeview.png")
+
+We can also quickly query the dimensions of the data frame using `dim()`. You'll see that the first number, `801`, is the number of rows and the second, `29`, is the number of columns.
+
+```{r, purl=FALSE}
+## get the dimensions (number of rows, number of columns) of a data frame
+
+dim(variants)
+```
 
 ## Summarizing, subsetting, and determining the structure of a data frame.
 
@@ -209,11 +217,15 @@ other variables (e.g. `sample_id`) are treated as characters data (more on this
 in a bit).
 
 There is a lot to work with, so we will subset the first three columns into a
-new data frame using the `data.frame()` function.
+new data frame using the `data.frame()` function. To subset/index a two dimensional
+variable, we need to define them on the appropriate side of the brackets. The left
+hand side of the comma indicates the rows you want to subset, and the right is the
+column position (e.g. ["row index", "column index"]).
 
 ```{r, purl=FALSE}
 ## put the first three columns of variants into a new data frame called subset
-
+## Notice that we are wrapping the numbers in a c() function, to indicate a vector
+## in the right hand side of the comma. 
 subset <- data.frame(variants[, c(1:3, 6)])
 ```
 

From 4752cd1596f8df4ea24682c03b8de31dfb787074 Mon Sep 17 00:00:00 2001
From: ytakemon <yuka.takemon@gmail.com>
Date: Wed, 2 Oct 2024 09:35:51 -0700
Subject: [PATCH 5/5] fixes #292

---
 episodes/03-basics-factors-dataframes.Rmd | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/episodes/03-basics-factors-dataframes.Rmd b/episodes/03-basics-factors-dataframes.Rmd
index 83ec274e..20544a80 100644
--- a/episodes/03-basics-factors-dataframes.Rmd
+++ b/episodes/03-basics-factors-dataframes.Rmd
@@ -216,14 +216,15 @@ these columns, as well as mean, median, and interquartile ranges. Many of the
 other variables (e.g. `sample_id`) are treated as characters data (more on this
 in a bit).
 
-There is a lot to work with, so we will subset the first three columns into a
-new data frame using the `data.frame()` function. To subset/index a two dimensional
-variable, we need to define them on the appropriate side of the brackets. The left
-hand side of the comma indicates the rows you want to subset, and the right is the
-column position (e.g. ["row index", "column index"]).
+There is a lot to work with, so we will subset a few columns into a new data frame using
+the `data.frame()` function. To subset (index) a two-dimensional object, we specify rows
+and columns inside the square brackets, separated by a comma. The left-hand side of the
+comma selects the rows, and the right-hand side selects the columns
+(e.g. `[row index, column index]`). Leaving a side empty selects all rows or all columns.
+
+Let's put columns 1, 2, 3, and 6 into a new data frame called `subset`:
 
 ```{r, purl=FALSE}
-## put the first three columns of variants into a new data frame called subset
 ## Notice that we are wrapping the numbers in a c() function, to indicate a vector
 ## in the right hand side of the comma. 
 subset <- data.frame(variants[, c(1:3, 6)])