sticking to summarise() syntax

mgimond · Feb 23, 2024 · 77a7de3
1 parent b623ed0

Showing 17 changed files with 10 additions and 10 deletions.
diff --git a/docs/group_by.html b/docs/group_by.html
@@ -336,7 +336,7 @@
 </li>
           <li class="sidebar-item">
   <div class="sidebar-item-container"> 
-  <a href="./robustness.html" class="sidebar-item-text sidebar-link">
+  <a href="./case_study.html" class="sidebar-item-text sidebar-link">
  <span class="menu-text"><span class="chapter-number">23</span>&nbsp; <span class="chapter-title">A working example: t-tests and re-expression</span></span></a>
   </div>
 </li>
@@ -512,13 +512,13 @@ <h2 data-number="10.1" class="anchored" data-anchor-id="summarizing-data-by-grou
 <p>The goal will be to summarize the table by <code>Weekday</code> as shown in the following graphic.</p>
 <p><img src="img/Summarize_by_one_variable.png" style="width: 73%; height: auto;"></p>
 <p>The data table has three variables: <code>Weekday</code>, <code>Quarter</code> and <code>Delay</code>. <code>Delay</code> is the value we will summarize which leaves us with one variable to <em>collapse</em>: <code>Quarter</code>. In doing so, we will compute the <code>Delay</code> statistics for all quarters associated with a unique <code>Weekday</code> value.</p>
-<p>This workflow requires two operations: a grouping operation using the <code>group_by</code> function and a summary operation using the <code>summarise</code> function. Here, we’ll compute two summary statistics: minimum delay time and maximum delay time.</p>
-<div class="cell" data-hash="group_by_cache/html/unnamed-chunk-4_d81779338be38082a9993debd3a81a92">
+<p>This workflow requires two operations: a grouping operation using the <code>group_by</code> function and a summary operation using the <code>summarise</code>/<code>summarize</code> function. Here, we’ll compute two summary statistics: minimum delay time and maximum delay time.</p>
+<div class="cell" data-hash="group_by_cache/html/unnamed-chunk-4_db15eeb225ba52d8d0d6c4ebaf271c32">
 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(dplyr)</span>
 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span>
 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>df <span class="sc">%&gt;%</span> </span>
 <span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>  <span class="fu">group_by</span>(Weekday) <span class="sc">%&gt;%</span> </span>
-<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>  <span class="fu">summarize</span>(<span class="at">min_delay =</span> <span class="fu">min</span>(Delay), <span class="at">max_delay =</span> <span class="fu">max</span>(Delay))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>  <span class="fu">summarise</span>(<span class="at">min_delay =</span> <span class="fu">min</span>(Delay), <span class="at">max_delay =</span> <span class="fu">max</span>(Delay))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-stdout">
 <pre><code># A tibble: 5 × 3
   Weekday min_delay max_delay
@@ -546,10 +546,10 @@ <h3 data-number="10.1.1" class="anchored" data-anchor-id="grouping-by-multiple-v
 <p>The goal will be to summarize the delay time by <code>Quarter</code> and by <code>Week</code> type as shown in the following graphic.</p>
 <p><img src="img/Summarize_by_two_variable.png" style="width: 73%; height: auto;"></p>
 <p>This time, the data table has four variables. We are wanting to summarize by <code>Quater</code> and <code>Week</code> which leaves one variable, <code>Direction</code>, that needs to be collapsed.</p>
-<div class="cell" data-hash="group_by_cache/html/unnamed-chunk-6_f31d24f5b42ab4b84a1c2090a1293f36">
+<div class="cell" data-hash="group_by_cache/html/unnamed-chunk-6_6de698b8961425ae8b983a1493c2e376">
 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>df2 <span class="sc">%&gt;%</span> </span>
 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">group_by</span>(Quarter, Week) <span class="sc">%&gt;%</span> </span>
-<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>  <span class="fu">summarize</span>(<span class="at">min_delay =</span> <span class="fu">min</span>(Delay), <span class="at">max_delay =</span> <span class="fu">max</span>(Delay))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>  <span class="fu">summarise</span>(<span class="at">min_delay =</span> <span class="fu">min</span>(Delay), <span class="at">max_delay =</span> <span class="fu">max</span>(Delay))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-stdout">
 <pre><code># A tibble: 8 × 4
 # Groups:   Quarter [4]

diff --git a/docs/search.json b/docs/search.json
@@ -326,7 +326,7 @@
     "href": "group_by.html#summarizing-data-by-group",
     "title": "10  Grouping and summarizing",
     "section": "10.1 Summarizing data by group",
-    "text": "10.1 Summarizing data by group\nLet’s first create a dataframe listing the average delay time in minutes, by day of the week and by quarter, for Logan airport’s 2014 outbound flights.\n\ndf &lt;- data.frame(\n  Weekday = factor(rep(c(\"Mon\", \"Tues\", \"Wed\", \"Thurs\", \"Fri\"), each = 4), \n                   levels = c(\"Mon\", \"Tues\", \"Wed\", \"Thurs\", \"Fri\")),\n  Quarter = paste0(\"Q\", rep(1:4, each = 5)), \n  Delay = c(9.9, 5.4, 8.8, 6.9, 4.9, 9.7, 7.9, 5, 8.8, 11.1, 10.2, 9.3, 12.2,\n            10.2, 9.2, 9.7, 12.2, 8.1, 7.9, 5.6))\n\nThe goal will be to summarize the table by Weekday as shown in the following graphic.\n\nThe data table has three variables: Weekday, Quarter and Delay. Delay is the value we will summarize which leaves us with one variable to collapse: Quarter. In doing so, we will compute the Delay statistics for all quarters associated with a unique Weekday value.\nThis workflow requires two operations: a grouping operation using the group_by function and a summary operation using the summarise function. Here, we’ll compute two summary statistics: minimum delay time and maximum delay time.\n\nlibrary(dplyr)\n\ndf %&gt;% \n  group_by(Weekday) %&gt;% \n  summarize(min_delay = min(Delay), max_delay = max(Delay))\n\n# A tibble: 5 × 3\n  Weekday min_delay max_delay\n  &lt;fct&gt;       &lt;dbl&gt;     &lt;dbl&gt;\n1 Mon           5.4       9.9\n2 Tues          4.9       9.7\n3 Wed           8.8      11.1\n4 Thurs         9.2      12.2\n5 Fri           5.6      12.2\n\n\nNote that the weekday follows the chronological order as defined in the Weekday factor.\nYou’ll also note that the output is a tibble. This data class is discussed at the end of this page.\n\n10.1.1 Grouping by multiple variables\nYou can group by more than one variable. 
For example, let’s build another dataframe listing the average delay time in minutes, by quarter, by weekend/weekday and by inbound/outbound status for Logan airport’s 2014 outbound flights.\n\ndf2 &lt;- data.frame(\n  Quarter = paste0(\"Q\", rep(1:4, each = 4)), \n  Week = rep(c(\"Weekday\", \"Weekend\"), each=2, times=4),\n  Direction = rep(c(\"Inbound\", \"Outbound\"), times=8),\n  Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5, \n            3.3, 10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4))\n\nThe goal will be to summarize the delay time by Quarter and by Week type as shown in the following graphic.\n\nThis time, the data table has four variables. We are wanting to summarize by Quater and Week which leaves one variable, Direction, that needs to be collapsed.\n\ndf2 %&gt;% \n  group_by(Quarter, Week) %&gt;% \n  summarize(min_delay = min(Delay), max_delay = max(Delay))\n\n# A tibble: 8 × 4\n# Groups:   Quarter [4]\n  Quarter Week    min_delay max_delay\n  &lt;chr&gt;   &lt;chr&gt;       &lt;dbl&gt;     &lt;dbl&gt;\n1 Q1      Weekday       9.7      10.8\n2 Q1      Weekend      10.4      15.5\n3 Q2      Weekday       8.9      11.8\n4 Q2      Weekend       3.3       5.5\n5 Q3      Weekday       8.8      10.6\n6 Q3      Weekend       5.2       6.6\n7 Q4      Weekday       7.3       9.1\n8 Q4      Weekend       4.4       5.3\n\n\nThe following section demonstrates other grouping/summarizing operations on a larger dataset."
+    "text": "10.1 Summarizing data by group\nLet’s first create a dataframe listing the average delay time in minutes, by day of the week and by quarter, for Logan airport’s 2014 outbound flights.\n\ndf &lt;- data.frame(\n  Weekday = factor(rep(c(\"Mon\", \"Tues\", \"Wed\", \"Thurs\", \"Fri\"), each = 4), \n                   levels = c(\"Mon\", \"Tues\", \"Wed\", \"Thurs\", \"Fri\")),\n  Quarter = paste0(\"Q\", rep(1:4, each = 5)), \n  Delay = c(9.9, 5.4, 8.8, 6.9, 4.9, 9.7, 7.9, 5, 8.8, 11.1, 10.2, 9.3, 12.2,\n            10.2, 9.2, 9.7, 12.2, 8.1, 7.9, 5.6))\n\nThe goal will be to summarize the table by Weekday as shown in the following graphic.\n\nThe data table has three variables: Weekday, Quarter and Delay. Delay is the value we will summarize which leaves us with one variable to collapse: Quarter. In doing so, we will compute the Delay statistics for all quarters associated with a unique Weekday value.\nThis workflow requires two operations: a grouping operation using the group_by function and a summary operation using the summarise/summarize function. Here, we’ll compute two summary statistics: minimum delay time and maximum delay time.\n\nlibrary(dplyr)\n\ndf %&gt;% \n  group_by(Weekday) %&gt;% \n  summarise(min_delay = min(Delay), max_delay = max(Delay))\n\n# A tibble: 5 × 3\n  Weekday min_delay max_delay\n  &lt;fct&gt;       &lt;dbl&gt;     &lt;dbl&gt;\n1 Mon           5.4       9.9\n2 Tues          4.9       9.7\n3 Wed           8.8      11.1\n4 Thurs         9.2      12.2\n5 Fri           5.6      12.2\n\n\nNote that the weekday follows the chronological order as defined in the Weekday factor.\nYou’ll also note that the output is a tibble. This data class is discussed at the end of this page.\n\n10.1.1 Grouping by multiple variables\nYou can group by more than one variable. 
For example, let’s build another dataframe listing the average delay time in minutes, by quarter, by weekend/weekday and by inbound/outbound status for Logan airport’s 2014 outbound flights.\n\ndf2 &lt;- data.frame(\n  Quarter = paste0(\"Q\", rep(1:4, each = 4)), \n  Week = rep(c(\"Weekday\", \"Weekend\"), each=2, times=4),\n  Direction = rep(c(\"Inbound\", \"Outbound\"), times=8),\n  Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5, \n            3.3, 10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4))\n\nThe goal will be to summarize the delay time by Quarter and by Week type as shown in the following graphic.\n\nThis time, the data table has four variables. We are wanting to summarize by Quater and Week which leaves one variable, Direction, that needs to be collapsed.\n\ndf2 %&gt;% \n  group_by(Quarter, Week) %&gt;% \n  summarise(min_delay = min(Delay), max_delay = max(Delay))\n\n# A tibble: 8 × 4\n# Groups:   Quarter [4]\n  Quarter Week    min_delay max_delay\n  &lt;chr&gt;   &lt;chr&gt;       &lt;dbl&gt;     &lt;dbl&gt;\n1 Q1      Weekday       9.7      10.8\n2 Q1      Weekend      10.4      15.5\n3 Q2      Weekday       8.9      11.8\n4 Q2      Weekend       3.3       5.5\n5 Q3      Weekday       8.8      10.6\n6 Q3      Weekend       5.2       6.6\n7 Q4      Weekday       7.3       9.1\n8 Q4      Weekend       4.4       5.3\n\n\nThe following section demonstrates other grouping/summarizing operations on a larger dataset."
   },
   {
     "objectID": "group_by.html#a-working-example",

diff --git a/group_by.qmd b/group_by.qmd
@@ -30,14 +30,14 @@ The goal will be to summarize the table by `Weekday` as shown in the following g
 
 The data table has three variables: `Weekday`, `Quarter` and `Delay`. `Delay` is the value we will summarize which leaves us with one variable to *collapse*: `Quarter`. In doing so, we will compute the `Delay` statistics for all quarters associated with a unique `Weekday` value.
 
-This workflow requires two operations: a grouping operation using the `group_by` function and a summary operation using the `summarise` function. Here, we'll compute two summary statistics: minimum delay time and maximum delay time.
+This workflow requires two operations: a grouping operation using the `group_by` function and a summary operation using the `summarise`/`summarize` function. Here, we'll compute two summary statistics: minimum delay time and maximum delay time.
 
 ```{r}
 library(dplyr)
 
 df %>% 
   group_by(Weekday) %>% 
-  summarize(min_delay = min(Delay), max_delay = max(Delay))
+  summarise(min_delay = min(Delay), max_delay = max(Delay))
 ```
 
 Note that the weekday follows the chronological order as defined in the `Weekday` factor.
@@ -66,7 +66,7 @@ This time, the data table has four variables. We are wanting to summarize by `Qu
 ```{r}
 df2 %>% 
   group_by(Quarter, Week) %>% 
-  summarize(min_delay = min(Delay), max_delay = max(Delay))
+  summarise(min_delay = min(Delay), max_delay = max(Delay))
 ```
 
 The following section demonstrates other  grouping/summarizing operations on a larger dataset.
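
Review note on the `group_by.qmd` hunks above: the change swaps `summarize` for `summarise` in the code chunks, while the prose now mentions both spellings. The sketch below is an illustrative check, not part of the commit, that the two spellings give the same result in dplyr. It reuses the `df` definition quoted in the `search.json` text; the object names `by_day`, `british` and `american` are assumptions introduced here for illustration.

```r
library(dplyr)

# Data frame copied from the chapter text quoted in this diff
df <- data.frame(
  Weekday = factor(rep(c("Mon", "Tues", "Wed", "Thurs", "Fri"), each = 4),
                   levels = c("Mon", "Tues", "Wed", "Thurs", "Fri")),
  Quarter = paste0("Q", rep(1:4, each = 5)),
  Delay = c(9.9, 5.4, 8.8, 6.9, 4.9, 9.7, 7.9, 5, 8.8, 11.1, 10.2, 9.3, 12.2,
            10.2, 9.2, 9.7, 12.2, 8.1, 7.9, 5.6))

# Group once, then summarise with each spelling (hypothetical object names)
by_day <- df %>% group_by(Weekday)

british  <- by_day %>%
  summarise(min_delay = min(Delay), max_delay = max(Delay))
american <- by_day %>%
  summarize(min_delay = min(Delay), max_delay = max(Delay))

# Compare the two results; they are expected to match exactly
identical(british, american)
```

If the comparison holds, the edit is purely stylistic: the rendered tables are unchanged, and only the chunk source (and therefore the knitr cache hashes listed further down) differs.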

diff --git a/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.RData b/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.RData
diff --git a/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.rdb b/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.rdb
diff --git a/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.rdx b/group_by_cache/html/unnamed-chunk-1_9fb98a5adc80a23c31eedbe88b93a06a.rdx
diff --git a/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.RData b/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.RData
diff --git a/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.rdb b/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.rdb
diff --git a/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.rdx b/group_by_cache/html/unnamed-chunk-2_cd2338c52672295a1db060b6e3202081.rdx
diff --git a/group_by_cache/html/unnamed-chunk-4_d81779338be38082a9993debd3a81a92.RData b/group_by_cache/html/unnamed-chunk-4_d81779338be38082a9993debd3a81a92.RData
diff --git a/group_by_cache/html/unnamed-chunk-4_db15eeb225ba52d8d0d6c4ebaf271c32.RData b/group_by_cache/html/unnamed-chunk-4_db15eeb225ba52d8d0d6c4ebaf271c32.RData
diff --git a/...nk-4_d81779338be38082a9993debd3a81a92.rdb → ...nk-4_db15eeb225ba52d8d0d6c4ebaf271c32.rdb b/...nk-4_d81779338be38082a9993debd3a81a92.rdb → ...nk-4_db15eeb225ba52d8d0d6c4ebaf271c32.rdb
diff --git a/...nk-4_d81779338be38082a9993debd3a81a92.rdx → ...nk-4_db15eeb225ba52d8d0d6c4ebaf271c32.rdx b/...nk-4_d81779338be38082a9993debd3a81a92.rdx → ...nk-4_db15eeb225ba52d8d0d6c4ebaf271c32.rdx
diff --git a/group_by_cache/html/unnamed-chunk-6_6de698b8961425ae8b983a1493c2e376.RData b/group_by_cache/html/unnamed-chunk-6_6de698b8961425ae8b983a1493c2e376.RData
diff --git a/...nk-6_f31d24f5b42ab4b84a1c2090a1293f36.rdb → ...nk-6_6de698b8961425ae8b983a1493c2e376.rdb b/...nk-6_f31d24f5b42ab4b84a1c2090a1293f36.rdb → ...nk-6_6de698b8961425ae8b983a1493c2e376.rdb
diff --git a/...nk-6_f31d24f5b42ab4b84a1c2090a1293f36.rdx → ...nk-6_6de698b8961425ae8b983a1493c2e376.rdx b/...nk-6_f31d24f5b42ab4b84a1c2090a1293f36.rdx → ...nk-6_6de698b8961425ae8b983a1493c2e376.rdx
diff --git a/group_by_cache/html/unnamed-chunk-6_f31d24f5b42ab4b84a1c2090a1293f36.RData b/group_by_cache/html/unnamed-chunk-6_f31d24f5b42ab4b84a1c2090a1293f36.RData