Commit

Create automated build
RRC_GHA committed Sep 14, 2023
1 parent c47cc1a commit ff91ddf
Showing 2 changed files with 20 additions and 20 deletions.
10 changes: 5 additions & 5 deletions public/2023-09-ucsb-faculty/search.json
@@ -550,14 +550,14 @@
"href": "session_10.html#data-cleaning-basics",
"title": "10  Cleaning & Wrangling Data",
"section": "10.2 Data cleaning basics",
"text": "10.2 Data cleaning basics\nTo demonstrate, we’ll be working with a tidied up version of a data set from Alaska Department of Fish & Game containing commercial catch data from 1878-1997. The data set and reference to the original source can be found at its public archive.\n\n\n\n\n\n\nSetup\n\n\n\nFirst, open a new Quarto document. Delete everything below the setup chunk, and add a library chunk that calls dplyr, tidyr, and readr\n\nlibrary(dplyr)\nlibrary(tidyr)\nlibrary(readr)\n\n\n\n\n\n\n\n\n\nA note on loading packages\n\n\n\nYou may have noticed the following warning messages pop up when you ran your library chunk.\nAttaching package: ‘dplyr’\n\nThe following objects are masked from ‘package:stats’:\n\n filter, lag\n\nThe following objects are masked from ‘package:base’:\n\n intersect, setdiff, setequal, union\nThese are important warnings. They are letting you know that certain functions from the stats and base packages (which are loaded by default when you start R) are masked by different functions with the same name in the dplyr package. It turns out, the order that you load the packages in matters. Since we loaded dplyr after stats, R will assume that if you call filter(), you mean the dplyr version unless you specify otherwise.\nBeing specific about which version of filter(), for example, you call is easy. To explicitly call a function by its unambiguous name, we use the syntax package_name::function_name(...). So, if we wanted to call the stats version of filter() in this Rmarkdown document, I would use the syntax stats::filter(...).\n\n\n\n\n\n\n\n\nNote\n\n\n\nWarnings are important, but we might not want them in our final document. 
After you have read the packages in, adjust the chunk settings in your library chunk to suppress warnings and messages by adding #| warning: false.\n\n\nNow that we have introduced some data wrangling libraries, let’s get the data that we are going to use for this lesson.\n\n\n\n\n\n\nSetup\n\n\n\n\nGo to KNB Data Package Alaska commercial salmon catches by management region (1886- 1997)\nFind the data file byerlySalmonByRegion.csv. Right click the “Download” button and select “Copy Link Address”\nPaste the copied URL into the read_csv() function\n\nThe code chunk you use to read in the data should look something like this:\n\ncatch_original <- read_csv(\"https://knb.ecoinformatics.org/knb/d1/mn/v2/object/df35b.302.1\")\n\nNote for Windows users: Keep in mind, if you want to replicate this workflow in your local computer you also need to use the url() function here with the argument method = \"libcurl\".\nIt would look like this:\n\ncatch_original <- read.csv(url(\"https://knb.ecoinformatics.org/knb/d1/mn/v2/object/df35b.302.1\", method = \"libcurl\"))\n\n\n\nThis data set is relatively clean and easy to interpret as-is. While it may be clean, it’s in a shape that makes it hard to use for some types of analyses so we’ll want to fix that first.\n\n\n\n\n\n\nExercise\n\n\n\nBefore we get too much further, spend a minute or two outlining your RMarkdown document so that it includes the following sections and steps:\n\nData Sources\n\nRead in the data\nExplore data\n\nClean and Reshape data\n\nRemove unnecessary columns\nCheck column typing\nReshape data"
"text": "10.2 Data cleaning basics\nTo demonstrate, we’ll be working with a tidied up version of a data set from Alaska Department of Fish & Game containing commercial catch data from 1878-1997. The data set and reference to the original source can be found at its public archive.\n\n\n\n\n\n\nSetup\n\n\n\nFirst, open a new Quarto document. Delete everything below the setup chunk, and add a library chunk that calls dplyr, tidyr, and readr\n\nlibrary(dplyr)\nlibrary(tidyr)\nlibrary(readr)\n\n\n\n\n\n\n\n\n\nA note on loading packages\n\n\n\nYou may have noticed the following messages pop up when you ran your library chunk.\nAttaching package: ‘dplyr’\n\nThe following objects are masked from ‘package:stats’:\n\n filter, lag\n\nThe following objects are masked from ‘package:base’:\n\n intersect, setdiff, setequal, union\nThese are important messages. They are letting you know that certain functions from the stats and base packages (which are loaded by default when you start R) are masked by different functions with the same name in the dplyr package. It turns out, the order that you load the packages in matters. Since we loaded dplyr after stats, R will assume that if you call filter(), you mean the dplyr version unless you specify otherwise.\nBeing specific about which version of filter(), for example, you call is easy. To explicitly call a function by its unambiguous name, we use the syntax package_name::function_name(...). So, if we wanted to call the stats version of filter() in this Rmarkdown document, I would use the syntax stats::filter(...).\n\n\n\n\n\n\n\n\nNote\n\n\n\nMessages and warnings are important, but we might not want them in our final document. After you have read the packages in, adjust the chunk settings in your library chunk to suppress warnings and messages by adding #| message: false or #| warning: false. 
Both of these chunk options, when set to false, prevent messages or warnings from appearing in the rendered file.\n\n\nNow that we have introduced some data wrangling libraries, let’s get the data that we are going to use for this lesson.\n\n\n\n\n\n\nSetup\n\n\n\n\nGo to KNB Data Package Alaska commercial salmon catches by management region (1886- 1997)\nFind the data file byerlySalmonByRegion.csv. Right-click the “Download” button and select “Copy Link Address”\nPaste the copied URL into the read_csv() function\n\nThe code chunk you use to read in the data should look something like this:\n\ncatch_original <- read_csv(\"https://knb.ecoinformatics.org/knb/d1/mn/v2/object/df35b.302.1\")\n\nNote for Windows users: Keep in mind, if you want to replicate this workflow on your local computer, you also need to use the url() function here with the argument method = \"libcurl\".\nIt would look like this:\n\ncatch_original <- read.csv(url(\"https://knb.ecoinformatics.org/knb/d1/mn/v2/object/df35b.302.1\", method = \"libcurl\"))\n\n\n\nThis data set is relatively clean and easy to interpret as-is. While it may be clean, it’s in a shape that makes it hard to use for some types of analyses, so we’ll want to fix that first.\n\n\n\n\n\n\nExercise\n\n\n\nBefore we get too much further, spend a minute or two outlining your Quarto document so that it includes the following sections and steps:\n\nData Sources\n\nRead in the data\nExplore data\n\nClean and Reshape data\n\nRemove unnecessary columns\nCheck column typing\nReshape data"
},
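Editor's note: the masking behavior and `package_name::function_name()` syntax discussed in the revised section above can be illustrated with a short R sketch. The small data frame here is invented for illustration; only the `::` syntax itself comes from the lesson text.

```r
library(dplyr)

# A small made-up data frame for illustration
df <- data.frame(Region = c("SSE", "BRB", "SSE"),
                 catch = c(10, 20, 30))

# dplyr's filter(): subset rows matching a condition
dplyr::filter(df, Region == "SSE")

# stats' filter(): apply a linear filter to a series --
# a completely different operation that shares the name
stats::filter(1:5, rep(1 / 2, 2))
```

Because dplyr was attached after the default packages, a bare `filter()` call resolves to `dplyr::filter()`; the explicit prefix removes any ambiguity.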
{
"objectID": "session_10.html#data-exploration",
"href": "session_10.html#data-exploration",
"title": "10  Cleaning & Wrangling Data",
"section": "10.3 Data exploration",
"text": "10.3 Data exploration\nSimilar to what we did in our Intro to Quarto lesson, it is good practice to skim through the data you just read in. Doing so is important to make sure the data is read as you were expecting and to familiarize yourself with the data.\nSome of the basic ways to explore your data are:\n\n## Prints the column names of my data frame\ncolnames(catch_original)\n\n## First 6 lines of the data frame\nhead(catch_original)\n\n## Summary of each column of data\nsummary(catch_original)\n\n## Prints unique values in a column (in this case, the region)\nunique(catch_original$Region)\n\n## Opens data frame in its own tab to see each row and column of the data\nView(catch_original)"
"text": "10.3 Data exploration\nSimilar to what we did in our Intro to Literate Analysis lesson, it is good practice to skim through the data you just read in. Doing so is important to make sure the data is read as you were expecting and to familiarize yourself with the data.\nSome of the basic ways to explore your data are:\n\n## Prints the column names of my data frame\ncolnames(catch_original)\n\n## First 6 lines of the data frame\nhead(catch_original)\n\n## Summary of each column of data\nsummary(catch_original)\n\n## Prints unique values in a column (in this case, the region)\nunique(catch_original$Region)\n\n## Opens data frame in its own tab to see each row and column of the data\nView(catch_original)"
},
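Editor's note: alongside colnames(), head(), summary(), unique(), and View() listed in the section above, dplyr's glimpse() is another common exploration helper worth knowing; a minimal sketch with a made-up data frame:

```r
library(dplyr)

# Made-up data frame standing in for catch_original
catch_df <- data.frame(Region = c("SSE", "BRB"),
                       catch = c(10, 20))

# glimpse() prints one line per column: name, type, and first values,
# which is handy for wide data frames where head() wraps awkwardly
glimpse(catch_df)
```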
{
"objectID": "session_10.html#about-the-pipe-operator",
@@ -627,21 +627,21 @@
"href": "session_10.html#sorting-your-data-using-arrange",
"title": "10  Cleaning & Wrangling Data",
"section": "10.13 Sorting your data using arrange()",
"text": "10.13 Sorting your data using arrange()\nThe arrange() function is used to sort the rows of a data.frame. Two common case to use arrange() are:\n\nTo calculate a cumulative sum (with cumsum()) so row order matters\nTo display a table (like in an .Rmd document) in sorted order\n\nLet’s re-calculate mean catch by region, and then arrange() the output by mean catch:\n\nmean_region <- catch_long %>%\n group_by(Region) %>%\n summarize(mean_catch = mean(catch)) %>%\n arrange(mean_catch)\n\nhead(mean_region)\n\n# A tibble: 6 × 2\n Region mean_catch\n <chr> <dbl>\n1 BER 16373.\n2 KTZ 18836.\n3 ALU 40384.\n4 NRS 51503.\n5 KSK 67642.\n6 YUK 68646.\n\n\nThe default sorting order of arrange() is to sort in ascending order. To reverse the sort order, wrap the column name inside the desc() function:\n\nmean_region <- catch_long %>%\n group_by(Region) %>%\n summarize(mean_catch = mean(catch)) %>%\n arrange(desc(mean_catch))\n\nhead(mean_region)\n\n# A tibble: 6 × 2\n Region mean_catch\n <chr> <dbl>\n1 SSE 3184661.\n2 BRB 2709796.\n3 NSE 1825021.\n4 KOD 1528350 \n5 PWS 1419237.\n6 SOP 1110942."
"text": "10.13 Sorting your data using arrange()\nThe arrange() function is used to sort the rows of a data.frame. Two common cases to use arrange() are:\n\nTo calculate a cumulative sum (with cumsum()) so row order matters\nTo display a table (like in an .qmd document) in sorted order\n\nLet’s re-calculate mean catch by region, and then arrange() the output by mean catch:\n\nmean_region <- catch_long %>%\n group_by(Region) %>%\n summarize(mean_catch = mean(catch)) %>%\n arrange(mean_catch)\n\nhead(mean_region)\n\n# A tibble: 6 × 2\n Region mean_catch\n <chr> <dbl>\n1 BER 16373.\n2 KTZ 18836.\n3 ALU 40384.\n4 NRS 51503.\n5 KSK 67642.\n6 YUK 68646.\n\n\nThe default sorting order of arrange() is to sort in ascending order. To reverse the sort order, wrap the column name inside the desc() function:\n\nmean_region <- catch_long %>%\n group_by(Region) %>%\n summarize(mean_catch = mean(catch)) %>%\n arrange(desc(mean_catch))\n\nhead(mean_region)\n\n# A tibble: 6 × 2\n Region mean_catch\n <chr> <dbl>\n1 SSE 3184661.\n2 BRB 2709796.\n3 NSE 1825021.\n4 KOD 1528350 \n5 PWS 1419237.\n6 SOP 1110942."
},
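Editor's note: the first bullet in the 10.13 text (cumulative sums with cumsum(), where row order matters) is stated but never shown; a minimal sketch with invented data:

```r
library(dplyr)

# Made-up yearly catches, deliberately out of order
catch_df <- data.frame(Year  = c(1995, 1993, 1994),
                       catch = c(300, 100, 200))

# arrange() first so the running total accumulates in time order
catch_df %>%
  arrange(Year) %>%
  mutate(cumulative_catch = cumsum(catch))
```

Without the arrange() step, cumsum() would accumulate in the original (wrong) row order.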
{
"objectID": "session_10.html#splitting-a-column-using-separate-and-unite",
"href": "session_10.html#splitting-a-column-using-separate-and-unite",
"title": "10  Cleaning & Wrangling Data",
"section": "10.14 Splitting a column using separate() and unite()",
"text": "10.14 Splitting a column using separate() and unite()\nThe separate() function allow us to easily split a single column into numerous. Its complement, the unite() function, allows ys to combine multiple columns into a single one.\nThis can come in really handy when we need to split a column into two pieces by a consistent separator (like a dash).\nLet’s make a new data.frame with fake data to illustrate this. Here we have a set of site identification codes with information about the island where the site is (the first 3 letters) and a site number (the 3 numbers). If we want to group and summarize by island, we need a column with just the island information.\n\nsites_df <- data.frame(site = c(\"HAW-101\",\n \"HAW-103\",\n \"OAH-320\",\n \"OAH-219\",\n \"MAI-039\"))\n\nsites_df %>%\n separate(site, c(\"island\", \"site_number\"), \"-\")\n\n island site_number\n1 HAW 101\n2 HAW 103\n3 OAH 320\n4 OAH 219\n5 MAI 039\n\n\n\n\n\n\n\n\nExercise\n\n\n\nSplit the city column in the data frame cities_df into city and state_code columns\n\n## create `cities_df`\ncities_df <- data.frame(city = c(\"Juneau AK\",\n \"Sitka AK\",\n \"Anchorage AK\"))\n\n\n\n\n\nAnswer\ncolnames(cities_df)\n\ncities_clean <- cities_df %>%\n separate(city, c(\"city\", \"state_code\"), \" \")\n\n\nThe unite() function does just the reverse of separate(). If we have a data.frame that contains columns for year, month, and day, we might want to unite these into a single date column.\n\ndates_df <- data.frame(\n year = c(\"1930\",\n \"1930\",\n \"1930\"),\n month = c(\"12\",\n \"12\",\n \"12\"),\n day = c(\"14\",\n \"15\",\n \"16\")\n)\n\ndates_df %>%\n unite(date, year, month, day, sep = \"-\")\n\n date\n1 1930-12-14\n2 1930-12-15\n3 1930-12-16"
"text": "10.14 Splitting a column using separate() and unite()\nThe separate() function allow us to easily split a single column into numerous. Its complement, the unite() function, allows us to combine multiple columns into a single one.\nThis can come in really handy when we need to split a column into two pieces by a consistent separator (like a dash).\nLet’s make a new data.frame with fake data to illustrate this. Here we have a set of site identification codes with information about the island where the site is (the first 3 letters) and a site number (the 3 numbers). If we want to group and summarize by island, we need a column with just the island information.\n\nsites_df <- data.frame(site = c(\"HAW-101\",\n \"HAW-103\",\n \"OAH-320\",\n \"OAH-219\",\n \"MAU-039\"))\n\nsites_df %>%\n separate(site, c(\"island\", \"site_number\"), \"-\")\n\n island site_number\n1 HAW 101\n2 HAW 103\n3 OAH 320\n4 OAH 219\n5 MAU 039\n\n\n\n\n\n\n\n\nExercise\n\n\n\nSplit the city column in the data frame cities_df into city and state_code columns\n\n## create `cities_df`\ncities_df <- data.frame(city = c(\"Juneau AK\",\n \"Sitka AK\",\n \"Anchorage AK\"))\n\n\n\n\n\nAnswer\ncolnames(cities_df)\n\ncities_clean <- cities_df %>%\n separate(city, c(\"city\", \"state_code\"), \" \")\n\n\nThe unite() function does just the reverse of separate(). If we have a data.frame that contains columns for year, month, and day, we might want to unite these into a single date column.\n\ndates_df <- data.frame(\n year = c(\"1930\",\n \"1930\",\n \"1930\"),\n month = c(\"12\",\n \"12\",\n \"12\"),\n day = c(\"14\",\n \"15\",\n \"16\")\n)\n\ndates_df %>%\n unite(date, year, month, day, sep = \"-\")\n\n date\n1 1930-12-14\n2 1930-12-15\n3 1930-12-16"
},
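Editor's note: one detail the 10.14 text doesn't mention is that separate() leaves the new columns as character; tidyr's convert argument re-types them. A sketch reusing the section's site codes:

```r
library(tidyr)

sites_df <- data.frame(site = c("HAW-101", "OAH-320"))

# convert = TRUE re-types the split columns, so site_number
# comes back as an integer rather than a character
separate(sites_df, site, c("island", "site_number"),
         sep = "-", convert = TRUE)
```

Note that conversion would turn a code like "039" into 39, so keep the default character type when leading zeros matter.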
{
"objectID": "session_10.html#now-all-together",
"href": "session_10.html#now-all-together",
"title": "10  Cleaning & Wrangling Data",
"section": "10.15 Now, all together!",
"text": "10.15 Now, all together!\nWe just ran through the various things we can do with dplyr and tidyr but if you’re wondering how this might look in a real analysis. Let’s look at that now:\n\ncatch_original <- read_csv(url(\"https://knb.ecoinformatics.org/knb/d1/mn/v2/object/df35b.302.1\", \n method = \"libcurl\"))\n\nmean_region <- catch_original %>%\n select(-All, -notesRegCode) %>% \n mutate(Chinook = ifelse(Chinook == \"I\", 1, Chinook)) %>% \n mutate(Chinook = as.numeric(Chinook)) %>% \n pivot_longer(-c(Region, Year), \n names_to = \"species\", \n values_to = \"catch\") %>%\n mutate(catch = catch*1000) %>% \n group_by(Region) %>% \n summarize(mean_catch = mean(catch)) %>% \n arrange(desc(mean_catch))\n\nhead(mean_region)\n\n# A tibble: 6 × 2\n Region mean_catch\n <chr> <dbl>\n1 SSE 3184661.\n2 BRB 2709796.\n3 NSE 1825021.\n4 KOD 1528350 \n5 PWS 1419237.\n6 SOP 1110942.\n\n\nWe have completed our lesson on Cleaning and Wrangling data. Before we break, let’s practice our Github workflow.\n\n\n\n\n\n\nSteps\n\n\n\n\nSave the .Rmd you have been working on for this lesson.\nKnit the R Markdown file. This is a way to test everything in your code is working.\nStage > Commit > Pull > Push"
"text": "10.15 Now, all together!\nWe just ran through the various things we can do with dplyr and tidyr but if you’re wondering how this might look in a real analysis. Let’s look at that now:\n\ncatch_original <- read_csv(url(\"https://knb.ecoinformatics.org/knb/d1/mn/v2/object/df35b.302.1\", \n method = \"libcurl\"))\n\nmean_region <- catch_original %>%\n select(-All, -notesRegCode) %>% \n mutate(Chinook = ifelse(Chinook == \"I\", 1, Chinook)) %>% \n mutate(Chinook = as.numeric(Chinook)) %>% \n pivot_longer(-c(Region, Year), \n names_to = \"species\", \n values_to = \"catch\") %>%\n mutate(catch = catch*1000) %>% \n group_by(Region) %>% \n summarize(mean_catch = mean(catch)) %>% \n arrange(desc(mean_catch))\n\nhead(mean_region)\n\n# A tibble: 6 × 2\n Region mean_catch\n <chr> <dbl>\n1 SSE 3184661.\n2 BRB 2709796.\n3 NSE 1825021.\n4 KOD 1528350 \n5 PWS 1419237.\n6 SOP 1110942.\n\n\nWe have completed our lesson on Cleaning and Wrangling data. Before we break, let’s practice our Git workflow.\n\n\n\n\n\n\nSteps\n\n\n\n\nSave the .qmd you have been working on for this lesson.\nRender the Quarto file. This is a way to test everything in your code is working.\nStage > Commit > Pull > Push"
},
{
"objectID": "session_11.html#learning-objectives",
