Commit

sticking to summarise() syntax
mgimond committed Feb 23, 2024
1 parent b623ed0 commit 77a7de3
Showing 17 changed files with 10 additions and 10 deletions.
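
For context on this change: dplyr exports `summarize()` as an alias of `summarise()`, so switching spellings should leave the rendered output unchanged. Below is a minimal sketch of that check, assuming dplyr is installed. The `df` definition is copied from the chapter text shown in the diff; the names `british` and `american` are illustrative only and do not appear in the book.

```r
library(dplyr)

# Data frame copied verbatim from the chapter text in this diff
df <- data.frame(
  Weekday = factor(rep(c("Mon", "Tues", "Wed", "Thurs", "Fri"), each = 4),
                   levels = c("Mon", "Tues", "Wed", "Thurs", "Fri")),
  Quarter = paste0("Q", rep(1:4, each = 5)),
  Delay = c(9.9, 5.4, 8.8, 6.9, 4.9, 9.7, 7.9, 5, 8.8, 11.1, 10.2, 9.3, 12.2,
            10.2, 9.2, 9.7, 12.2, 8.1, 7.9, 5.6))

# Same pipeline written with each spelling (object names are hypothetical)
british <- df %>%
  group_by(Weekday) %>%
  summarise(min_delay = min(Delay), max_delay = max(Delay))

american <- df %>%
  group_by(Weekday) %>%
  summarize(min_delay = min(Delay), max_delay = max(Delay))

# Stops with an error if the two spellings ever disagree
stopifnot(isTRUE(all.equal(british, american)))
```

If the final check passes silently, the spelling change is cosmetic as far as the computed summaries are concerned.
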
12 changes: 6 additions & 6 deletions docs/group_by.html
@@ -336,7 +336,7 @@
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
-<a href="./robustness.html" class="sidebar-item-text sidebar-link">
+<a href="./case_study.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">23</span>&nbsp; <span class="chapter-title">A working example: t-tests and re-expression</span></span></a>
</div>
</li>
@@ -512,13 +512,13 @@ <h2 data-number="10.1" class="anchored" data-anchor-id="summarizing-data-by-grou
<p>The goal will be to summarize the table by <code>Weekday</code> as shown in the following graphic.</p>
<p><img src="img/Summarize_by_one_variable.png" style="width: 73%; height: auto;"></p>
<p>The data table has three variables: <code>Weekday</code>, <code>Quarter</code> and <code>Delay</code>. <code>Delay</code> is the value we will summarize which leaves us with one variable to <em>collapse</em>: <code>Quarter</code>. In doing so, we will compute the <code>Delay</code> statistics for all quarters associated with a unique <code>Weekday</code> value.</p>
-<p>This workflow requires two operations: a grouping operation using the <code>group_by</code> function and a summary operation using the <code>summarise</code> function. Here, we’ll compute two summary statistics: minimum delay time and maximum delay time.</p>
-<div class="cell" data-hash="group_by_cache/html/unnamed-chunk-4_d81779338be38082a9993debd3a81a92">
+<p>This workflow requires two operations: a grouping operation using the <code>group_by</code> function and a summary operation using the <code>summarise</code>/<code>summarize</code> function. Here, we’ll compute two summary statistics: minimum delay time and maximum delay time.</p>
+<div class="cell" data-hash="group_by_cache/html/unnamed-chunk-4_db15eeb225ba52d8d0d6c4ebaf271c32">
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(dplyr)</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>df <span class="sc">%&gt;%</span> </span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">group_by</span>(Weekday) <span class="sc">%&gt;%</span> </span>
-<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">summarize</span>(<span class="at">min_delay =</span> <span class="fu">min</span>(Delay), <span class="at">max_delay =</span> <span class="fu">max</span>(Delay))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">summarise</span>(<span class="at">min_delay =</span> <span class="fu">min</span>(Delay), <span class="at">max_delay =</span> <span class="fu">max</span>(Delay))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5 × 3
Weekday min_delay max_delay
@@ -546,10 +546,10 @@ <h3 data-number="10.1.1" class="anchored" data-anchor-id="grouping-by-multiple-v
<p>The goal will be to summarize the delay time by <code>Quarter</code> and by <code>Week</code> type as shown in the following graphic.</p>
<p><img src="img/Summarize_by_two_variable.png" style="width: 73%; height: auto;"></p>
<p>This time, the data table has four variables. We are wanting to summarize by <code>Quater</code> and <code>Week</code> which leaves one variable, <code>Direction</code>, that needs to be collapsed.</p>
-<div class="cell" data-hash="group_by_cache/html/unnamed-chunk-6_f31d24f5b42ab4b84a1c2090a1293f36">
+<div class="cell" data-hash="group_by_cache/html/unnamed-chunk-6_6de698b8961425ae8b983a1493c2e376">
<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>df2 <span class="sc">%&gt;%</span> </span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">group_by</span>(Quarter, Week) <span class="sc">%&gt;%</span> </span>
-<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">summarize</span>(<span class="at">min_delay =</span> <span class="fu">min</span>(Delay), <span class="at">max_delay =</span> <span class="fu">max</span>(Delay))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">summarise</span>(<span class="at">min_delay =</span> <span class="fu">min</span>(Delay), <span class="at">max_delay =</span> <span class="fu">max</span>(Delay))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 8 × 4
# Groups: Quarter [4]
2 changes: 1 addition & 1 deletion docs/search.json
@@ -326,7 +326,7 @@
"href": "group_by.html#summarizing-data-by-group",
"title": "10  Grouping and summarizing",
"section": "10.1 Summarizing data by group",
"text": "10.1 Summarizing data by group\nLet’s first create a dataframe listing the average delay time in minutes, by day of the week and by quarter, for Logan airport’s 2014 outbound flights.\n\ndf &lt;- data.frame(\n Weekday = factor(rep(c(\"Mon\", \"Tues\", \"Wed\", \"Thurs\", \"Fri\"), each = 4), \n levels = c(\"Mon\", \"Tues\", \"Wed\", \"Thurs\", \"Fri\")),\n Quarter = paste0(\"Q\", rep(1:4, each = 5)), \n Delay = c(9.9, 5.4, 8.8, 6.9, 4.9, 9.7, 7.9, 5, 8.8, 11.1, 10.2, 9.3, 12.2,\n 10.2, 9.2, 9.7, 12.2, 8.1, 7.9, 5.6))\n\nThe goal will be to summarize the table by Weekday as shown in the following graphic.\n\nThe data table has three variables: Weekday, Quarter and Delay. Delay is the value we will summarize which leaves us with one variable to collapse: Quarter. In doing so, we will compute the Delay statistics for all quarters associated with a unique Weekday value.\nThis workflow requires two operations: a grouping operation using the group_by function and a summary operation using the summarise function. Here, we’ll compute two summary statistics: minimum delay time and maximum delay time.\n\nlibrary(dplyr)\n\ndf %&gt;% \n group_by(Weekday) %&gt;% \n summarize(min_delay = min(Delay), max_delay = max(Delay))\n\n# A tibble: 5 × 3\n Weekday min_delay max_delay\n &lt;fct&gt; &lt;dbl&gt; &lt;dbl&gt;\n1 Mon 5.4 9.9\n2 Tues 4.9 9.7\n3 Wed 8.8 11.1\n4 Thurs 9.2 12.2\n5 Fri 5.6 12.2\n\n\nNote that the weekday follows the chronological order as defined in the Weekday factor.\nYou’ll also note that the output is a tibble. This data class is discussed at the end of this page.\n\n10.1.1 Grouping by multiple variables\nYou can group by more than one variable. 
For example, let’s build another dataframe listing the average delay time in minutes, by quarter, by weekend/weekday and by inbound/outbound status for Logan airport’s 2014 outbound flights.\n\ndf2 &lt;- data.frame(\n Quarter = paste0(\"Q\", rep(1:4, each = 4)), \n Week = rep(c(\"Weekday\", \"Weekend\"), each=2, times=4),\n Direction = rep(c(\"Inbound\", \"Outbound\"), times=8),\n Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5, \n 3.3, 10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4))\n\nThe goal will be to summarize the delay time by Quarter and by Week type as shown in the following graphic.\n\nThis time, the data table has four variables. We are wanting to summarize by Quater and Week which leaves one variable, Direction, that needs to be collapsed.\n\ndf2 %&gt;% \n group_by(Quarter, Week) %&gt;% \n summarize(min_delay = min(Delay), max_delay = max(Delay))\n\n# A tibble: 8 × 4\n# Groups: Quarter [4]\n Quarter Week min_delay max_delay\n &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt;\n1 Q1 Weekday 9.7 10.8\n2 Q1 Weekend 10.4 15.5\n3 Q2 Weekday 8.9 11.8\n4 Q2 Weekend 3.3 5.5\n5 Q3 Weekday 8.8 10.6\n6 Q3 Weekend 5.2 6.6\n7 Q4 Weekday 7.3 9.1\n8 Q4 Weekend 4.4 5.3\n\n\nThe following section demonstrates other grouping/summarizing operations on a larger dataset."
"text": "10.1 Summarizing data by group\nLet’s first create a dataframe listing the average delay time in minutes, by day of the week and by quarter, for Logan airport’s 2014 outbound flights.\n\ndf &lt;- data.frame(\n Weekday = factor(rep(c(\"Mon\", \"Tues\", \"Wed\", \"Thurs\", \"Fri\"), each = 4), \n levels = c(\"Mon\", \"Tues\", \"Wed\", \"Thurs\", \"Fri\")),\n Quarter = paste0(\"Q\", rep(1:4, each = 5)), \n Delay = c(9.9, 5.4, 8.8, 6.9, 4.9, 9.7, 7.9, 5, 8.8, 11.1, 10.2, 9.3, 12.2,\n 10.2, 9.2, 9.7, 12.2, 8.1, 7.9, 5.6))\n\nThe goal will be to summarize the table by Weekday as shown in the following graphic.\n\nThe data table has three variables: Weekday, Quarter and Delay. Delay is the value we will summarize which leaves us with one variable to collapse: Quarter. In doing so, we will compute the Delay statistics for all quarters associated with a unique Weekday value.\nThis workflow requires two operations: a grouping operation using the group_by function and a summary operation using the summarise/summarize function. Here, we’ll compute two summary statistics: minimum delay time and maximum delay time.\n\nlibrary(dplyr)\n\ndf %&gt;% \n group_by(Weekday) %&gt;% \n summarise(min_delay = min(Delay), max_delay = max(Delay))\n\n# A tibble: 5 × 3\n Weekday min_delay max_delay\n &lt;fct&gt; &lt;dbl&gt; &lt;dbl&gt;\n1 Mon 5.4 9.9\n2 Tues 4.9 9.7\n3 Wed 8.8 11.1\n4 Thurs 9.2 12.2\n5 Fri 5.6 12.2\n\n\nNote that the weekday follows the chronological order as defined in the Weekday factor.\nYou’ll also note that the output is a tibble. This data class is discussed at the end of this page.\n\n10.1.1 Grouping by multiple variables\nYou can group by more than one variable. 
For example, let’s build another dataframe listing the average delay time in minutes, by quarter, by weekend/weekday and by inbound/outbound status for Logan airport’s 2014 outbound flights.\n\ndf2 &lt;- data.frame(\n Quarter = paste0(\"Q\", rep(1:4, each = 4)), \n Week = rep(c(\"Weekday\", \"Weekend\"), each=2, times=4),\n Direction = rep(c(\"Inbound\", \"Outbound\"), times=8),\n Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5, \n 3.3, 10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4))\n\nThe goal will be to summarize the delay time by Quarter and by Week type as shown in the following graphic.\n\nThis time, the data table has four variables. We are wanting to summarize by Quater and Week which leaves one variable, Direction, that needs to be collapsed.\n\ndf2 %&gt;% \n group_by(Quarter, Week) %&gt;% \n summarise(min_delay = min(Delay), max_delay = max(Delay))\n\n# A tibble: 8 × 4\n# Groups: Quarter [4]\n Quarter Week min_delay max_delay\n &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt;\n1 Q1 Weekday 9.7 10.8\n2 Q1 Weekend 10.4 15.5\n3 Q2 Weekday 8.9 11.8\n4 Q2 Weekend 3.3 5.5\n5 Q3 Weekday 8.8 10.6\n6 Q3 Weekend 5.2 6.6\n7 Q4 Weekday 7.3 9.1\n8 Q4 Weekend 4.4 5.3\n\n\nThe following section demonstrates other grouping/summarizing operations on a larger dataset."
},
{
"objectID": "group_by.html#a-working-example",
6 changes: 3 additions & 3 deletions group_by.qmd
@@ -30,14 +30,14 @@ The goal will be to summarize the table by `Weekday` as shown in the following g

The data table has three variables: `Weekday`, `Quarter` and `Delay`. `Delay` is the value we will summarize which leaves us with one variable to *collapse*: `Quarter`. In doing so, we will compute the `Delay` statistics for all quarters associated with a unique `Weekday` value.

-This workflow requires two operations: a grouping operation using the `group_by` function and a summary operation using the `summarise` function. Here, we'll compute two summary statistics: minimum delay time and maximum delay time.
+This workflow requires two operations: a grouping operation using the `group_by` function and a summary operation using the `summarise`/`summarize` function. Here, we'll compute two summary statistics: minimum delay time and maximum delay time.

```{r}
library(dplyr)
df %>%
group_by(Weekday) %>%
-summarize(min_delay = min(Delay), max_delay = max(Delay))
+summarise(min_delay = min(Delay), max_delay = max(Delay))
```

Note that the weekday follows the chronological order as defined in the `Weekday` factor.
@@ -66,7 +66,7 @@ This time, the data table has four variables. We are wanting to summarize by `Qu
```{r}
df2 %>%
group_by(Quarter, Week) %>%
-summarize(min_delay = min(Delay), max_delay = max(Delay))
+summarise(min_delay = min(Delay), max_delay = max(Delay))
```

The following section demonstrates other grouping/summarizing operations on a larger dataset.
Binary files not shown (10 files).