diff --git a/public/2023-10-coreR/images/git-repo.png b/public/2023-10-coreR/images/git-repo.png new file mode 100644 index 00000000..e34d0817 Binary files /dev/null and b/public/2023-10-coreR/images/git-repo.png differ diff --git a/public/2023-10-coreR/images/git_commit.png b/public/2023-10-coreR/images/git_commit.png new file mode 100644 index 00000000..d3732b50 Binary files /dev/null and b/public/2023-10-coreR/images/git_commit.png differ diff --git a/public/2023-10-coreR/images/git_status_staged.png b/public/2023-10-coreR/images/git_status_staged.png new file mode 100644 index 00000000..89848a59 Binary files /dev/null and b/public/2023-10-coreR/images/git_status_staged.png differ diff --git a/public/2023-10-coreR/images/git_status_unstaged.png b/public/2023-10-coreR/images/git_status_unstaged.png new file mode 100644 index 00000000..adb6445c Binary files /dev/null and b/public/2023-10-coreR/images/git_status_unstaged.png differ diff --git a/public/2023-10-coreR/images/schedule-coreR-2023-10.png b/public/2023-10-coreR/images/schedule-coreR-2023-10.png new file mode 100644 index 00000000..cd7d208f Binary files /dev/null and b/public/2023-10-coreR/images/schedule-coreR-2023-10.png differ diff --git a/public/2023-10-coreR/images/use_git.png b/public/2023-10-coreR/images/use_git.png new file mode 100644 index 00000000..79eb61e9 Binary files /dev/null and b/public/2023-10-coreR/images/use_git.png differ diff --git a/public/2023-10-coreR/images/use_github.png b/public/2023-10-coreR/images/use_github.png new file mode 100644 index 00000000..73d8fb31 Binary files /dev/null and b/public/2023-10-coreR/images/use_github.png differ diff --git a/public/2023-10-coreR/index.html b/public/2023-10-coreR/index.html index e36f8196..da69f66d 100644 --- a/public/2023-10-coreR/index.html +++ b/public/2023-10-coreR/index.html @@ -143,7 +143,7 @@
These written materials are the result of a continuous and collaborative effort at NCEAS to help researchers make their work more transparent and reproducible. This work began in the early 2000’s, and reflects the expertise and diligence of many, many individuals. The primary authors are listed in the citation below, with additional contributors recognized for their role in developing previous iterations of these or similar materials.
This work is licensed under a Creative Commons Attribution 4.0 International License.
-Citation: Halina Do-Linh, Camila Vargas Poulsen, Samantha Csik, Daphne Virlar-Knight. 2023. coreR Course. NCEAS Learning Hub.
-Additional contributors: Ben Bolker, Amber E. Budden, Julien Brun, Natasha Haycock-Chavez, S. Jeanette Clark, Julie Lowndes, Stephanie Hampton, Matthew B. Jones, Samanta Katz, Erin McLean, Bryce Mecum, Deanna Pennington, Karthik Ram, Jim Regetz, Tracy Teal, Leah Wasser.
+Citation: Halina Do-Linh and Camila Vargas Poulsen (2023). coreR Course. NCEAS Learning Hub. URL learning.nceas.ucsb.edu/2023-10-coreR.
+Additional contributors: Ben Bolker, Amber E. Budden, Julien Brun, Samantha Csik, Natasha Haycock-Chavez, S. Jeanette Clark, Julie Lowndes, Stephanie Hampton, Matthew B. Jones, Samanta Katz, Erin McLean, Bryce Mecum, Deanna Pennington, Karthik Ram, Jim Regetz, Tracy Teal, Daphne Virlar-Knight, Leah Wasser.
This is a Quarto book. To learn more about Quarto books visit https://quarto.org/docs/books.
diff --git a/public/2023-10-coreR/search.json b/public/2023-10-coreR/search.json index ba8a2ac8..2dcedbaa 100644 --- a/public/2023-10-coreR/search.json +++ b/public/2023-10-coreR/search.json @@ -32,7 +32,7 @@ "href": "index.html#about-this-book", "title": "NCEAS Learning Hub’s coreR Course", "section": "About this book", - "text": "About this book\nThese written materials are the result of a continuous and collaborative effort at NCEAS to help researchers make their work more transparent and reproducible. This work began in the early 2000’s, and reflects the expertise and diligence of many, many individuals. The primary authors are listed in the citation below, with additional contributors recognized for their role in developing previous iterations of these or similar materials.\nThis work is licensed under a Creative Commons Attribution 4.0 International License.\nCitation: Halina Do-Linh, Camila Vargas Poulsen, Samantha Csik, Daphne Virlar-Knight. 2023. coreR Course. NCEAS Learning Hub.\nAdditional contributors: Ben Bolker, Amber E. Budden, Julien Brun, Natasha Haycock-Chavez, S. Jeanette Clark, Julie Lowndes, Stephanie Hampton, Matthew B. Jones, Samanta Katz, Erin McLean, Bryce Mecum, Deanna Pennington, Karthik Ram, Jim Regetz, Tracy Teal, Leah Wasser.\nThis is a Quarto book. To learn more about Quarto books visit https://quarto.org/docs/books." + "text": "About this book\nThese written materials are the result of a continuous and collaborative effort at NCEAS to help researchers make their work more transparent and reproducible. This work began in the early 2000’s, and reflects the expertise and diligence of many, many individuals. The primary authors are listed in the citation below, with additional contributors recognized for their role in developing previous iterations of these or similar materials.\nThis work is licensed under a Creative Commons Attribution 4.0 International License.\nCitation: Halina Do-Linh and Camila Vargas Poulsen (2023). coreR Course. NCEAS Learning Hub. URL learning.nceas.ucsb.edu/2023-10-coreR.\nAdditional contributors: Ben Bolker, Amber E. Budden, Julien Brun, Samantha Csik, Natasha Haycock-Chavez, S. Jeanette Clark, Julie Lowndes, Stephanie Hampton, Matthew B. Jones, Samanta Katz, Erin McLean, Bryce Mecum, Deanna Pennington, Karthik Ram, Jim Regetz, Tracy Teal, Daphne Virlar-Knight, Leah Wasser.\nThis is a Quarto book. To learn more about Quarto books visit https://quarto.org/docs/books." }, { "objectID": "session_01.html#learning-objectives", @@ -333,35 +333,35 @@ "href": "session_06.html#introduction-to-version-control", "title": "6 Intro to Git and GitHub", "section": "6.1 Introduction to Version Control", - "text": "6.1 Introduction to Version Control\n\n\n\n\n\nEvery file in the scientific process changes. Manuscripts are edited. Figures get revised. Code gets fixed when bugs are discovered. Sometimes those fixes lead to even more bugs, leading to more changes in the codebase. Data files get combined together. Sometimes those same files are split and combined again. All that to say - in just one research project, we can expect thousands of changes to occur.\nThese changes are important to track, and yet, we often use simplistic filenames to track them. Many of us have experienced renaming a document or script multiple times with the ingenuine addition of “final” to the filename (like the comic above demonstrates).\nYou might think there is a better way, and you’d be right: version control. Version control provides an organized and transparent way to track changes in code and additional files. This practice was designed for software development, but is easily applicable to scientific programming.\nThere are many benefits to using a version control software including:\n\nMaintain a history of your research project’s development while keeping your workspace clean\nFacilitate collaboration and transparency when working on teams\nExplore bugs or new features without disrupting your team members’ work\nand more!\n\nThe version control system we’ll be diving into is Git, the most widely used modern version control system in the world." + "text": "6.1 Introduction to Version Control\n\n\n\n\n\nEvery file in the scientific process changes. Manuscripts are edited. Figures get revised. Code gets fixed when bugs are discovered. Sometimes those fixes lead to even more bugs, leading to more changes in the code base. Data files get combined together. Sometimes those same files are split and combined again. In just one research project, we can expect thousands of changes to occur.\nThese changes are important to track, and yet, we often use simplistic file names to do so. Many of us have experienced renaming a document or script multiple times with the ingenuine addition of “final” to the file name (like the comic above demonstrates).\nYou might think there is a better way, and you’d be right: version control. Version control provides an organized and transparent way to track changes in code and additional files. This practice was designed for software development, but is easily applicable to scientific programming.\nThere are many benefits to using a version control software including:\n\nMaintain a history of your research project’s development while keeping your workspace clean\nFacilitate collaboration and transparency when working on teams\nExplore bugs or new features without disrupting your team members’ work\nand more!\n\nThe version control system we’ll be diving into is Git, the most widely used modern version control system in the world." }, { "objectID": "session_06.html#introduction-to-git-github", "href": "session_06.html#introduction-to-git-github", "title": "6 Intro to Git and GitHub", "section": "6.2 Introduction to Git + GitHub", - "text": "6.2 Introduction to Git + GitHub\nBefore diving into the details of Git and how to use it, let’s start with a motivating example that’s representative of the types of problems Git can help us solve.\n\n6.2.1 A Motivating Example\nSay, for example, you’re working on an analysis in R and you’ve got it into a state you’re pretty happy with. We’ll call this version 1:\n\n\n\nYou come into the office the following day and you have an email from your boss, “Hey, you know what this model needs?”\n\n\n\nYou’re not entirely sure what she means but you figure there’s only one thing she could be talking about: more cowbell. So you add it to the model in order to really explore the space.\nBut you’re worried about losing track of the old model so, instead of editing the code in place, you comment out the old code and put as serious a warning as you can muster in a comment above it.\n\n\n\nCommenting out code you don’t want to lose is something probably all of us have done at one point or another but it’s really hard to understand why you did this when you come back years later or you when you send your script to a colleague. Luckily, there’s a better way: Version control. Instead of commenting out the old code, we can change the code in place and tell Git to commit our change. So now we have two distinct versions of our analysis and we can always see what the previous version(s) look like.\n\n\n\nYou may have noticed something else in the diagram above: Not only can we save a new version of our analysis, we can also write as much text as we like about the change in the commit message. In addition to the commit message, Git also tracks who, when, and where the change was made.\nImagine that some time has gone by and you’ve committed a third version of your analysis, version 3, and a colleague emails with an idea: What if you used machine learning instead?\n\n\n\nMaybe you’re not so sure the idea will work out and this is where a tool like Git shines. Without a tool like Git, we might copy analysis.R to another file called analysis-ml.R which might end up having mostly the same code except for a few lines. This isn’t particularly problematic until you want to make a change to a bit of shared code and now you have to make changes in two files, if you even remember to.\nInstead, with Git, we can start a branch. Branches allow us to confidently experiment on our code, all while leaving the old code in tact and recoverable.\n\n\n\nSo you’ve been working in a branch and have made a few commits on it and your boss emails again asking you to update the model in some way. If you weren’t using a tool like Git, you might panic at this point because you’ve rewritten much of your analysis to use a different method but your boss wants change to the old method.\n\n\n\nBut with Git and branches, we can continue developing our main analysis at the same time as we are working on any experimental branches. Branches are great for experiments but also great for organizing your work generally.\n\n\n\nAfter all that hard work on the machine learning experiment, you and your colleague could decide to scrap it. It’s perfectly fine to leave branches around and switch back to the main line of development but we can also delete them to tidy up.\n\n\n\nIf, instead, you and your colleague had decided you liked the machine learning experiment, you could also merge the branch with your main development line. Merging branches is analogous to accepting a change in Word’s Track Changes feature but way more powerful and useful.\n\n\n\nA key takeaway here is that Git can drastically increase your confidence and willingness to make changes to your code and help you avoid problems down the road. Analysis rarely follows a linear path and we need a tool that respects this.\n\n\n\nFinally, imagine that, years later, your colleague asks you to make sure the model you reported in a paper you published together was actually the one you used. Another really powerful feature of Git is tags which allow us to record a particular state of our analysis with a meaningful name. In this case, we are lucky because we tagged the version of our code we used to run the analysis. Even if we continued to develop beyond commit 5 (above) after we submitted our manuscript, we can always go back and run the analysis as it was in the past.\n\nWith Git we can enhance our workflow:\n\nEliminate the need for cryptic filenames and comments to track our work.\nProvide detailed descriptions of our changes through commits, making it easier to understand the reasons behind code modifications.\nWork on multiple branches simultaneously, allowing for parallel development, and optionally merge them together.\nUse commits to access and even execute older versions of our code.\nAssign meaningful tags to specific versions of our code.\nAdditionally, Git offers a powerful distributed feature. Multiple individuals can work on the same analysis concurrently on their own computers, with the ability to merge everyone’s changes together.\n\n\n\n\n6.2.2 What exactly are Git and GitHub?\n\nGit:\n\nan open-source distributed version control software\ndesigned to manage the versioning and tracking of source code files and project history\noperates locally on your computer, allowing you to create repositories, track changes, and collaborate with others\nprovides features such as committing changes, branching and merging code, reverting to previous versions, and managing project history\nworks directly with the files on your computer and does not require a network connection to perform most operations\nprimarily used through the command-line interface (CLI, e.g. Terminal), but also has various GUI tools available (e.g. RStudio IDE)\n\n\n\n\n\n\nGitHub:\n\nonline platform and service built around Git\nprovides a centralized hosting platform for Git repositories\nallows us to store, manage, and collaborate on their Git repositories in the cloud\noffers additional features on top of Git, such as a web-based interface, issue tracking, project management tools, pull requests, code review, and collaboration features\nenables easy sharing of code with others, facilitating collaboration and contribution to open source projects\nprovides a social aspect, allowing users to follow projects, star repositories, and discover new code\n\n\n\n\n\n\n\n6.2.3 The Git Life cycle\nAs a Git user, you’ll need to understand the basic concepts associated with versioned sets of changes, and how they are stored and moved across repositories. Any given Git repository can be cloned so that it exists both locally, and remotely. But each of these cloned repositories is simply a copy of all of the files and change history for those files, stored in Git’s particular format. For our purposes, we can consider a Git repository as a folder with a bunch of additional version-related metadata.\nIn a local Git-enabled folder, the folder contains a workspace containing the current version of all files in the repository. These working files are linked to a hidden folder containing the ‘Local repository’, which contains all of the other changes made to the files, along with the version metadata.\nSo, when working with files using Git, you can use Git commands to indicate specifically which changes to the local working files should be staged for versioning (using the git add command), and when to record those changes as a version in the local repository (using the command git commit).\nThe remaining concepts are involved in synchronizing the changes in your local repository with changes in a remote repository. The git push command is used to send local changes up to a remote repository (possibly on GitHub), and the git pull command is used to fetch changes from a remote repository and merge them into the local repository.\nA basic git workflow represented as two islands, one with “local repo” and “working directory”, and another with “remote repo.” Bunnies move file boxes from the working directory to the staging area, then with Commit move them to the local repo. Bunnies in rowboats move changes from the local repo to the remote repo (labeled “PUSH”) and from the remote repo to the working directory (labeled “PULL”).\n\n\n\n\nArtwork by Allison Horst\n\n\n\n\n6.2.4 Let’s Look at a GitHub Repository\nThis screen shows the copy of a repository stored on GitHub, with its list of files, when the files and directories were last modified, and some information on who made the most recent changes.\n\n\n\nIf we drill into the “commits” for the repository, we can see the history of changes made to all of the files. Looks like kellijohnson was working on the project and fixing errors in December:\n\n\n\nAnd finally, if we drill into one of the changes made on December 20, we can see exactly what was changed in each file:\n\n\n\nTracking these changes, how they relate to released versions of software and files is exactly what Git and GitHub are good for. And we will show how they can really be effective for tracking versions of scientific code, figures, and manuscripts to accomplish a reproducible workflow.\n\n\n6.2.5 Git Vocabulary & Commands\nWe know the world of Git and GitHub can be daunting. Use these tables as references while you use Git and GitHub, and we encourage you to build upon this list as you become more comfortable with these tools.\nThis table contains essential terms and commands that complement intro to Git skills. They will get you far on personal and individual projects.\n\nEssential Git Commands\n\n\n\n\n\n\n\nTerm\nGit Command(s)\nDefinition\n\n\n\n\nAdd\ngit add [file]\nStages or adds file changes to the next commit. git add . will stage or add all files.\n\n\nCommit\ngit commit\nRecords changes to the repository with a descriptive message.\n\n\nCommit Message\ngit commit -m\nA descriptive message explaining the changes made in a commit. The message must be within quotes (e.g. “This is my commit message.”).\n\n\nFetch\ngit fetch\nRetrieves changes from a remote repository but does not merge them.\n\n\nPull\ngit pull\nRetrieves and merges changes from a remote repository to the current branch.\n\n\nPush\ngit push\nSends local commits to a remote repository.\n\n\nStage\n-\nThe process of preparing and selecting changes to be included in the next commit.\n\n\nStatus\ngit status\nShows the current status of the repository, including changes and branch information.\n\n\n\nThis table includes more advanced Git terms and commands that are commonly used in both individual and collaborative projects.\n\nAdvanced Git Commands\n\n\n\n\n\n\n\nTerm\nGit Command(s)\nDefinition\n\n\n\n\nBranch\ngit branch\nLists existing branches or creates a new branch.\n\n\nCheckout\ngit checkout [branch]\nSwitches to a different branch or restores files from a specific commit.\n\n\nClone\ngit clone [repository]\nCreates a local copy of a remote repository.\n\n\nDiff\ngit diff\nShows differences between files, commits, or branches.\n\n\nFork\n-\nCreates a personal copy of a repository under your GitHub account for independent development.\n\n\nLog\ngit log\nDisplays the commit history of the repository.\n\n\nMerge\ngit merge [branch]\nIntegrates changes from one branch into another branch.\n\n\nMerge Conflict\n-\nOccurs when Git cannot automatically merge changes from different branches, requiring manual resolution.\n\n\nPull Request (PR)\n-\nA request to merge changes from a branch into another branch, typically in a collaborative project.\n\n\nRebase\ngit rebase\nIntegrates changes from one branch onto another by modifying commit history.\n\n\nRemote\ngit remote\nManages remote repositories linked to the local repository.\n\n\nRepository\ngit init\nA directory where Git tracks and manages files and their versions.\n\n\nStash\ngit stash\nTemporarily saves changes that are not ready to be committed.\n\n\nTag\ngit tag\nAssigns a label or tag to a specific commit.\n\n\n\nGit has a rich set of commands and features, and there are many more terms beyond either table." + "text": "6.2 Introduction to Git + GitHub\nBefore diving into the details of Git and how to use it, let’s start with a motivating example that’s representative of the types of problems Git can help us solve.\n\n6.2.1 A Motivating Example\nSay, for example, you’re working on an analysis in R and you’ve got it into a state you’re pretty happy with. We’ll call this version 1:\n\n\n\nYou come into the office the following day and you have an email from your boss, “Hey, you know what this model needs?”\n\n\n\nYou’re not entirely sure what she means but you figure there’s only one thing she could be talking about: more cowbell. So you add it to the model in order to really explore the space.\nBut you’re worried about losing track of the old model so, instead of editing the code in place, you comment out the old code and put as serious a warning as you can muster in a comment above it.\n\n\n\nCommenting out code you don’t want to lose is something probably all of us have done at one point or another but it’s really hard to understand why you did this when you come back years later or you when you send your script to a colleague. Luckily, there’s a better way: Version control. Instead of commenting out the old code, we can change the code in place and tell Git to commit our change. So now we have two distinct versions of our analysis and we can always see what the previous version(s) look like.\n\n\n\nYou may have noticed something else in the diagram above: Not only can we save a new version of our analysis, we can also write as much text as we like about the change in the commit message. In addition to the commit message, Git also tracks who, when, and where the change was made.\nImagine that some time has gone by and you’ve committed a third version of your analysis, version 3, and a colleague emails with an idea: What if you used machine learning instead?\n\n\n\nMaybe you’re not so sure the idea will work out and this is where a tool like Git shines. Without a tool like Git, we might copy analysis.R to another file called analysis-ml.R which might end up having mostly the same code except for a few lines. This isn’t particularly problematic until you want to make a change to a bit of shared code and now you have to make changes in two files, if you even remember to.\nInstead, with Git, we can start a branch. Branches allow us to confidently experiment on our code, all while leaving the old code in tact and recoverable.\n\n\n\nSo you’ve been working in a branch and have made a few commits on it and your boss emails again asking you to update the model in some way. If you weren’t using a tool like Git, you might panic at this point because you’ve rewritten much of your analysis to use a different method but your boss wants change to the old method.\n\n\n\nBut with Git and branches, we can continue developing our main analysis at the same time as we are working on any experimental branches. Branches are great for experiments but also great for organizing your work generally.\n\n\n\nAfter all that hard work on the machine learning experiment, you and your colleague could decide to scrap it. It’s perfectly fine to leave branches around and switch back to the main line of development but we can also delete them to tidy up.\n\n\n\nIf, instead, you and your colleague had decided you liked the machine learning experiment, you could also merge the branch with your main development line. Merging branches is analogous to accepting a change in Word’s Track Changes feature but way more powerful and useful.\n\n\n\nA key takeaway here is that Git can drastically increase your confidence and willingness to make changes to your code and help you avoid problems down the road. Analysis rarely follows a linear path and we need a tool that respects this.\n\n\n\nFinally, imagine that, years later, your colleague asks you to make sure the model you reported in a paper you published together was actually the one you used. Another really powerful feature of Git is tags which allow us to record a particular state of our analysis with a meaningful name. In this case, we are lucky because we tagged the version of our code we used to run the analysis. Even if we continued to develop beyond commit 5 (above) after we submitted our manuscript, we can always go back and run the analysis as it was in the past.\n\nWith Git we can enhance our workflow:\n\nEliminate the need for cryptic filenames and comments to track our work.\nProvide detailed descriptions of our changes through commits, making it easier to understand the reasons behind code modifications.\nWork on multiple branches simultaneously, allowing for parallel development, and optionally merge them together.\nUse commits to access and even execute older versions of our code.\nAssign meaningful tags to specific versions of our code.\nAdditionally, Git offers a powerful distributed feature. Multiple individuals can work on the same analysis concurrently on their own computers, with the ability to merge everyone’s changes together.\n\n\n\n\n6.2.2 What exactly are Git and GitHub?\n\nGit:\n\nan open-source distributed version control software\ndesigned to manage the versioning and tracking of source code files and project history\noperates locally on your computer, allowing you to create repositories, and track changes\nprovides features such as committing changes, branching and merging code, reverting to previous versions, and managing project history\nworks directly with the files on your computer and does not require a network connection to perform most operations\nprimarily used through the command-line interface (CLI, e.g. Terminal), but also has various GUI tools available (e.g. RStudio IDE)\n\n\n\n\n\n\nGitHub:\n\nonline platform and service built around Git\nprovides a centralized hosting platform for Git repositories\nallows us to store, manage, and collaborate on their Git repositories in the cloud\noffers additional features on top of Git, such as a web-based interface, issue tracking, project management tools, pull requests, code review, and collaboration features\nenables easy sharing of code with others, facilitating collaboration and contribution to open source projects\nprovides a social aspect, allowing users to follow projects, star repositories, and discover new code\n\n\n\n\n\n\n\n6.2.3 Understanding how local working files, Git, and GitHub all work together\nIt can be a bit daunting to understand all the moving parts of the Git / GitHub life cycle (i.e. how file changes are tracked locally within repositories, then stored for safe-keeping and collaboration on remote repositories, then brought back down to a local machine(s) for continued development). It gets easier with practice, but we’ll explain (first in words, then with an illustration) at a high-level how things work:\n\n6.2.3.1 What is the difference between a “normal” folder vs. a Git repository\nWhether you’re a Mac or a PC user, you’ll likely have created a folder at some point in time for organizing files. Let’s pretend that we create a folder, called myFolder/, and add two files: myData.csv and myAnalysis.R. The contents of this folder are not currently version controlled – meaning, for example, that if we make some changes to myAnalysis.R that don’t quite work out, we have no way of accessing or reverting back to a previous version of myAnalysis.R (without remembering/rewriting things, of course).\nGit allows you to turn any “normal” folder, like myFolder/, into a Git repository – you’ll often see/hear this referenced as “initializing a Git repository”. When you initialize a folder on your local computer as a Git repository, a hidden .git/ folder is created within that folder (e.g. myFolder/.git/) – this .git/ folder is the Git repository. As you use Git commands to capture versions or “snapshots” of your work, those versions (and their associated metadata) get stored within the .git/ folder. This allows you to access and/or recover any previous versions of your work. If you delete .git/, you delete your project’s history.\nHere is our example folder / Git repository represented visually:\n\n\n\n\n\n\n\n\n\n\n\n\n\n6.2.3.2 How do I actually tell Git to preserve versions of my local working files?\nGit was built as a command-line tool, meaning we can use Git commands in the command line (e.g. Terminal, Git Bash, etc.) to take “snapshots” of our local working files through time. Alternatively, RStudio provides buttons that help to easily execute these Git commands.\nGenerally, that workflow looks something like this:\n\nMake changes to a file(s) (e.g. myAnalysis.R) in your working directory.\nStage the file(s) using git add myAnalysis.R (or git add . to stage multiple changed files at once). This lets Git know that you’d like to include the file(s) in your next commit.\nCommit the file(s) using git commit -m \"a message describing my changes\". This records those changes (along with a descriptive message) as a “snapshot” or version in the local repository (i.e. the .git/ folder).\n\n\n\n6.2.3.3 My versioned work is on my local computer, but I want to send it to GitHub. How?\nThe last step is synchronizing the changes made to our local repository with a remote repository (oftentimes, this remote repository is stored on GitHub). The git push command is used to send local commits up to a remote repository. The git pull command is used to fetch changes from a remote repository and merge them into the local repository – pulling will become a regular part of your workflow when collaborating with others, or even when working alone but on different machines (e.g. a laptop at home and a desktop at the office).\nThe processes described in the above sections (i.e. making changes to local working files, recording “snapshots” of them to create a versioned history of changes in a local Git repository, and sending those versions from our local Git repository to a remote repository (which is oftentimes on GitHub)) is illustrated using islands, buildings, bunnies, and packages in the artwork, below:\nA basic git workflow represented as two islands, one with “local repo” and “working directory”, and another with “remote repo.” Bunnies move file boxes from the working directory to the staging area, then with Commit move them to the local repo. Bunnies in rowboats move changes from the local repo to the remote repo (labeled “PUSH”) and from the remote repo to the working directory (labeled “PULL”).\n\n\n\n\nArtwork by Allison Horst\n\n\n\n\n\n6.2.4 Let’s Look at a GitHub Repository\nThis screen shows the copy of a repository stored on GitHub, with its list of files, when the files and directories were last modified, and some information on who made the most recent changes.\n\n\n\nIf we drill into the “commits” for the repository, we can see the history of changes made to all of the files. Looks like kellijohnson was working on the project and fixing errors in December:\n\n\n\nAnd finally, if we drill into one of the changes made on December 20, we can see exactly what was changed in each file:\n\n\n\nTracking these changes, how they relate to released versions of software and files is exactly what Git and GitHub are good for. And we will show how they can really be effective for tracking versions of scientific code, figures, and manuscripts to accomplish a reproducible workflow.\n\n\n6.2.5 Git Vocabulary & Commands\nWe know the world of Git and GitHub can be daunting. Use these tables as references while you use Git and GitHub, and we encourage you to build upon this list as you become more comfortable with these tools.\nThis table contains essential terms and commands that complement intro to Git skills. They will get you far on personal and individual projects.\n\nEssential Git Commands\n\n\n\n\n\n\n\nTerm\nGit Command(s)\nDefinition\n\n\n\n\nAdd/Stage\ngit add [file]\nStaging marks a modified file in its current version to go into your next commit snapshot. You can also stage all modified files at the same time using git add .\n\n\nCommit\ngit commit\nRecords changes to the repository.\n\n\nCommit Message\ngit commit -m \"my commit message\"\nRecords changes to the repository and include a descriptive message (you should always include a commit message!).\n\n\nFetch\ngit fetch\nRetrieves changes from a remote repository but does not merge them into your local working file(s).\n\n\nPull\ngit pull\nRetrieves changes from a remote repository and merges them into your local working file(s).\n\n\nPush\ngit push\nSends local commits to a remote repository.\n\n\nStatus\ngit status\nShows the current status of the repository, including (un)staged files and branch information.\n\n\n\nThis table includes more advanced Git terms and commands that are commonly used in both individual and collaborative projects.\n\nAdvanced Git Commands\n\n\n\n\n\n\n\nTerm\nGit Command(s)\nDefinition\n\n\n\n\nBranch\ngit branch\nLists existing branches or creates a new branch.\n\n\nCheckout\ngit checkout [branch]\nSwitches to a different branch or restores files from a specific commit.\n\n\nClone\ngit clone [repository]\nCreates a local copy of a remote repository.\n\n\nDiff\ngit diff\nShows differences between files, commits, or branches.\n\n\nFork\n-\nCreates a personal copy of a repository under your GitHub account for independent development.\n\n\nLog\ngit log\nDisplays the commit history of the repository.\n\n\nMerge\ngit merge [branch]\nIntegrates changes from one branch into another branch.\n\n\nMerge Conflict\n-\nOccurs when Git cannot automatically merge changes from different branches, requiring manual resolution.\n\n\nPull Request (PR)\n-\nA request to merge changes from a branch into another branch, typically in a collaborative project.\n\n\nRebase\ngit rebase\nIntegrates changes from one branch onto another by modifying commit history.\n\n\nRemote\ngit remote\nManages remote repositories linked to the local repository.\n\n\nRepository\ngit init\nA directory where Git tracks and manages files and their versions.\n\n\nStash\ngit stash\nTemporarily saves changes that are not ready to be committed.\n\n\nTag\ngit tag\nAssigns a label or tag to a specific commit.\n\n\n\nGit has a rich set of commands and features, and there are many more terms beyond either table. Learn more by visiting the git documentation." }, { "objectID": "session_06.html#exercise-1-create-a-remote-repository-on-github", "href": "session_06.html#exercise-1-create-a-remote-repository-on-github", "title": "6 Intro to Git and GitHub", "section": "6.3 Exercise 1: Create a remote repository on GitHub", - "text": "6.3 Exercise 1: Create a remote repository on GitHub\n\n\n\n\n\n\nSetup\n\n\n\n\nLog into GitHub\nClick the New repository button\nName it {FIRSTNAME}_test\nAdd a short description\nCheck the box to add a README.md file\nAdd a .gitignore file using the R template\nSet the LICENSE to Apache 2.0\n\n\n\nIf you were successful, it should look something like this:\n\n\n\nYou’ve now created your first repository! It has a couple of files that GitHub created for you, like the README.md file, and the LICENSE file, and the .gitignore file.\n\n\n\nFor simple changes to text files, you can make edits right in the GitHub web interface.\n\n\n\n\n\n\nChallenge\n\n\n\nNavigate to the README.md file in the file listing, and edit it by clicking on the pencil icon. This is a regular Markdown file, so you can just add markdown text. Add a new level 2 header called “Purpose” and add some bullet points describing the purpose of the repo. When done, add a commit message, and hit the “Commit changes” button.\n\n\n\n\n\nCongratulations, you’ve now authored your first versioned commit! If you navigate back to the GitHub page for the repository, you’ll see your commit listed there, as well as the rendered README.md file.\n\n\n\nLet’s point out a few things about this window. It represents a view of the repository that you created, showing all of the files in the repository so far. For each file, it shows when the file was last modified, and the commit message that was used to last change each file. This is why it is important to write good, descriptive commit messages. In addition, the header above the file listing shows the most recent commit, along with its commit message, and its SHA identifier. That SHA identifier is the key to this set of versioned changes. If you click on the SHA identifier (6c18e0a), it will display the set of changes made in that particular commit.\n\n\n\n\n\n\nWhat should I write in my commit message?\n\n\n\nWriting effective Git commit messages is essential for creating a meaningful and helpful version history in your repository. It is crucial to avoid skipping commit messages or resorting to generic phrases like “Updates.” When it comes to following best practices, there are several guidelines to enhance the readability and maintainability of the codebase.\nHere are some guidelines for writing effective Git commit messages:\n\nBe descriptive and concise: Provide a clear and concise summary of the changes made in the commit. Aim to convey the purpose and impact of the commit in a few words.\nUse imperative tense: Write commit messages in the imperative tense, as if giving a command. For example, use “Add feature” instead of “Added feature” or “Adding feature.” This convention aligns with other Git commands and makes the messages more actionable.\nSeparate subject and body: Start with a subject line, followed by a blank line, and then provide a more detailed explanation in the body if necessary. The subject line should be a short, one-line summary, while the body can provide additional context, motivation, or details about the changes.\nLimit the subject line length: Keep the subject line within 50 characters or less. This ensures that the commit messages are easily scannable and fit well in tools like Git logs.\nCapitalize and punctuate properly: Begin the subject line with a capital letter and use proper punctuation. This adds clarity and consistency to the commit messages.\nFocus on the “what” and “why”: Explain what changes were made and why they were made. Understanding the motivation behind a commit helps future researchers and collaborators (including you!) comprehend its purpose.\nUse present tense for subject, past tense for body: Write the subject line in present tense as it represents the current state of the codebase. Use past tense in the body to describe what has been done.\nReference relevant issues: If the commit is related to a specific issue or task, include a reference to it. For example, you can mention the issue number or use keywords like “Fixes,” “Closes,” or “Resolves” followed by the issue number." + "text": "6.3 Exercise 1: Create a remote repository on GitHub\n\n\n\n\n\n\nSetup\n\n\n\n\nLogin to GitHub\nClick the New repository button\nName it {FIRSTNAME}_test\nAdd a short description\nCheck the box to add a README.md file\nAdd a .gitignore file using the R template\nSet the LICENSE to Apache 2.0\n\n\n\nIf you were successful, it should look something like this:\n\n\n\n\n\nYou’ve now created your first repository! It has a couple of files that GitHub created for you: README.md, LICENSE, and .gitignore.\n\n\n\n\n\n\nREADME.md files are used to share important information about your repository\n\n\n\nYou should always add a README.md to the root directory of your repository – it is a markdown file that is rendered as HTML and displayed on the landing page of your repository. This is a common place to include any pertinent information about what your repository contains, how to use it, etc.\n\n\n\n\n\n\nFor simple changes to text files, such as the README.md, you can make edits directly in the GitHub web interface.\n\n\n\n\n\n\nChallenge\n\n\n\nNavigate to the README.md file in the file listing, and edit it by clicking on the pencil icon (top right of file). This is a regular Markdown file, so you can add markdown text. Add a new level-2 header called “Purpose” and add some bullet points describing the purpose of the repo. When done, add a commit message, and hit the Commit changes button.\n\n\n\n\n \n\nCongratulations, you’ve now authored your first versioned commit! If you navigate back to the GitHub page for the repository, you’ll see your commit listed there, as well as the rendered README.md file.\n\n\n \n\nThe GitHub repository landing page provides us with lots of useful information. To start, we see:\n\nall of the files in the remote repository\nwhen each file was last edited\nthe commit message that was included with each file’s most recent commit (which is why it’s important to write good, descriptive commit messages!)\n\nAdditionally, the header above the file listing shows the most recent commit, along with its commit message, and a unique ID (assigned by Git) called a SHA. The SHA (aka hash) identifies the specific changes made, when they were made, and by who. If you click on the SHA, it will display the set of changes made in that particular commit.\n\n\n\n\n\n\nWhat should I write in my commit message?\n\n\n\nWriting effective Git commit messages is essential for creating a meaningful and helpful version history in your repository. It is crucial to avoid skipping commit messages or resorting to generic phrases like “Updates.” When it comes to following best practices, there are several guidelines to enhance the readability and maintainability of the codebase.\nHere are some guidelines for writing effective Git commit messages:\n\nBe descriptive and concise: Provide a clear and concise summary of the changes made in the commit. Aim to convey the purpose and impact of the commit in a few words.\nUse imperative tense: Write commit messages in the imperative tense, as if giving a command. For example, use “Add feature” instead of “Added feature” or “Adding feature.” This convention aligns with other Git commands and makes the messages more actionable.\nSeparate subject and body: Start with a subject line, followed by a blank line, and then provide a more detailed explanation in the body if necessary. The subject line should be a short, one-line summary, while the body can provide additional context, motivation, or details about the changes.\nLimit the subject line length: Keep the subject line within 50 characters or less. This ensures that the commit messages are easily scannable and fit well in tools like Git logs.\nCapitalize and punctuate properly: Begin the subject line with a capital letter and use proper punctuation. This adds clarity and consistency to the commit messages.\nFocus on the “what” and “why”: Explain what changes were made and why they were made. Understanding the motivation behind a commit helps future researchers and collaborators (including you!) comprehend its purpose.\nUse present tense for subject, past tense for body: Write the subject line in present tense as it represents the current state of the codebase. Use past tense in the body to describe what has been done.\nReference relevant issues: If the commit is related to a specific issue or task, include a reference to it. For example, you can mention the issue number or use keywords like “Fixes,” “Closes,” or “Resolves” followed by the issue number." }, { "objectID": "session_06.html#exercise-2-clone-your-repository-and-use-git-locally-in-rstudio", "href": "session_06.html#exercise-2-clone-your-repository-and-use-git-locally-in-rstudio", "title": "6 Intro to Git and GitHub", "section": "6.4 Exercise 2: clone your repository and use Git locally in RStudio", - "text": "6.4 Exercise 2: clone your repository and use Git locally in RStudio\nIn this exercise, we’ll use the GitHub URL for the GitHub repository you created to clone the repository onto your local machine so that you can edit the files in RStudio.\nStart by copying the GitHub URL, which represents the repository location:\n\n\n\n\n\n\n\nRStudio knows how to work with files under version control with Git, but only if you are working within an R project folder.\nNext, let’s clone the repository created on GitHub so we have it accessible as an R project in RStudio.\n\n\n\n\n\n\nAn important distinction\n\n\n\nWe refer to the remote copy of the repository that is on GitHub as the origin repository (the one that we cloned from), and the copy on our local computer as the local repository.\n\n\n\n\n\n\n\n\nSetup\n\n\n\n\nIn the File menu, select “New Project”\nIn the dialog that pops up, select the “Version Control” option, and paste the GitHub URL that you copied into the field for the remote repository Repository URL\nWhile you can name the local copy of the repository anything, it’s typical to use the same name as the GitHub repository to maintain the correspondence\n\n\n\n\n\n\n\n\nOnce you hit “Create Project”, a new RStudio window will open with all of the files from the remote repository copied locally. Depending on how your version of RStudio is configured, the location and size of the panes may differ, but they should all be present, including a Git tab and the normal Files tab listing the files that had been created in the remote repository.\n\n\n\nYou’ll note that there is one new file halina_test.Rproj, and three files that we created earlier on GitHub (.gitignore, LICENSE, and README.md).\nIn the Git tab, you’ll note that two files are listed. This is the status pane that shows the current modification status of all of the files in the repository. In this case, the .gitignore file is listed as M for Modified, and halina_test.Rproj is listed with a ?? to indicate that the file is untracked. This means that Git has not stored any versions of this file, and knows nothing about the file. As you make version control decisions in RStudio, these icons will change to reflect the current version status of each of the files.\nInspect the history. For now, let’s click on the History button in the Git tab, which will show the log of changes that occurred, and will be identical to what we viewed on GitHub. By clicking on each row of the history, you can see exactly what was added and changed in each of the two commits in this repository.\n\n\n\n\n\n\n\nChallenge\n\n\n\n\nLet’s make a change to the README.md file, this time from RStudio, then commit the README.md change\nAdd a new section to your README.md called “Creator” using a level 2 header, and under it include some information about yourself. Bonus: Add some contact information and link your email using Markdown syntax\n\n\n\nOnce you save, you’ll immediately see the README.md file show up in the Git tab, marked as a modification. You can select the file in the Git tab, and click Diff to see the differences that you saved (but which are not yet committed to your local repository).\n\nAnd here’s what the newly made changes look like compared to the original file. New lines are highlighted in green, while removed lines are in red.\n\nCommit the RStudio changes. To commit the changes you made to the README.md file, check the Staged checkbox next to the file (which tells Git which changes you want included in the commit), then provide a descriptive commit message, and then click “Commit”.\n\nNote that some of the changes in the repository, namely halina_test.Rproj are still listed as having not been committed. This means there are still pending changes to the repository. You can also see the note that says:\nYour branch is ahead of ‘origin/main’ by 1 commit.\nThis means that we have committed 1 change in the local repository, but that commit has not yet been pushed up to the origin repository, where origin is the typical name for our remote repository on GitHub. So, let’s commit the remaining project files by staging them and adding a commit message.\n\nWhen finished, you’ll see that no changes remain in the Git tab, and the repository is clean.\nInspect the history. Note that the message now says:\nYour branch is ahead of ‘origin/main’ by 2 commits.\nThese 2 commits are the two we just made, and have not yet been pushed to GitHub. By clicking on the “History” button, we can see that there are now a total of four commits in the local repository (while there had only been two on GitHub).\n\nPush these changes to GitHub. Now that everything has been changed as desired locally, you can push the changes to GitHub using the Push button. This will prompt you for your GitHub username and password, and upload the changes, leaving your repository in a totally clean and synchronized state. When finished, looking at the history shows all four commits, including the two that were done on GitHub and the two that were done locally on RStudio.\n\n\n\nAnd note that the labels indicate that both the local repository (HEAD) and the remote repository (origin/HEAD) are pointing at the same version in the history. So, if we go look at the commit history on GitHub, all the commits will be shown there as well.\n\n\n\n\n\n\n\n\n\nLast thing, some Git configuration\n\n\n\nWhen Git released version 2.27, a new feature they incorporated allows users to specify how to pull (essentially), otherwise a warning will appear. To suppress this warning we need to configure our Git with this line of code:\ngit config pull.rebase false\npull.rebase false is a default strategy for pulling where it will try to auto-merge the files if possible, and if it can’t it will show a merge conflict" + "text": "6.4 Exercise 2: clone your repository and use Git locally in RStudio\nCurrently, our repository just exists on GitHub as a remote repository. It’s easy enough to make changes to things like our README.md file (as demonstrated above), from the web browser, but that becomes a lot harder (and discouraged) for scripts and other code files. In this exercise, we’ll bring a copy of this remote repository down to our local computer (aka clone this repository) so that we can work comfortably in RStudio.\n\n\n\n\n\n\nAn important distinction\n\n\n\nWe refer to the remote copy of the repository that is on GitHub as the origin repository (the one that we cloned from), and the copy on our local computer as the local repository.\n\n\nStart by clicking the green Code button (top right of your file listing) and copying the URL to your clipboard (this URL represents the repository location):\n\n\n\n\n\n\n\nRStudio makes working with Git and version controlled files easy – to do so, you’ll need to be working within an R project folder. The following steps will look similar to those you followed when first creating an R Project (see Appendix), with a slight difference. Follow the instructions in the Setup box below to clone your remote repository to your local computer in RStudio:\n\n\n\n\n\n\nSetup\n\n\n\n\nClick File > New Project\nSelect Version Control and paste the remote repository URL (which should be copied to your clipboard) in the Repository ULR field\nPress Tab, which will auto-fill the Project directory name field with the same name as that of your remote repo – while you can name the local copy of the repository anything, it’s typical (and highly recommended) to use the same name as the GitHub repository to maintain the correspondence\n\n\n\n\n\n\n\n\nOnce you click Create Project, a new RStudio window will open with all of the files from the remote repository copied locally. Depending on how your version of RStudio is configured, the location and size of the panes may differ, but they should all be present – you should see a Git tab, as well as the Files tab, where you can view all of the files copied from the remote repo to this local repo.\n\n\n\nYou’ll note that there is one new file sam_test.Rproj, and three files that we created earlier on GitHub (.gitignore, LICENSE, and README.md).\nIn the Git tab, you’ll note that the one new file, sam_test.Rproj, is listed. This Git tab is the status pane that shows the current modification status of all of the files in the repository. Here, we see sam_test.Rproj is preceded by a ?? symbol to indicate that the file is currently untracked by Git. This means that we have not yet committed this file using Git (i.e. Git knows nothing about the file; hang tight, we’ll commit this file soon so that it’s tracked by Git). As you make version control decisions in RStudio, these icons will change to reflect the current version status of each of the files.\nInspect the history. Click on the History button in the Git tab to show the log of changes that have occurred – these changes will be identical to what we viewed on GitHub. By clicking on each row of the history, you can see exactly what was added and changed in each of the two commits in this repository.\n\n\n\n\n\n\n\nChallenge\n\n\n\n\nMake a change to the README.md file – this time from RStudio – then commit the README.md change\nAdd a new section to your README.md called “Creator” using a level-2 header. Under it include some information about yourself. Bonus: Add some contact information and link your email using Markdown syntax.\n\n\n\nOnce you save, you’ll immediately see the README.md file show up in the Git tab, marked as a modification. Select the file in the Git tab, and click Diff to see the changes that you saved (but which are not yet committed to your local repository). Newly made changes are highlighted in green.\n\n\n\nCommit the changes. To commit the changes you made to the README.md file using RStudio’s GUI (Graphical User Interface), rather than the command line:\n\nStage (aka add) README.md by clicking the check box next to the file name – this tells Git which changes you want included in the commit and is analogous to using the git command, git add README.md, in the command line\nCommit README.md by clicking the Commit button and providing a descriptive commit message in the dialog box. Press the Commit button once you’re satisfied with your message. This is analogous to using the git command, git commit -m \"my commit message\", in the command line.\n\n\nA few notes about our local repository’s state:\n\nWe still have a file, sam_test.Rproj, that is listed as untracked (denoted by ?? in the Git tab).\nYou should see a message at the top of the Git tab that says, Your branch is ahead of ‘origin/main’ by 1 commit., which tells us that we have 1 commit in the local repository, but that commit has not yet been pushed up to the origin repository (aka remote repository on GitHub).\n\nCommit the remaining project file by staging/adding and committing it with an informative commit message.\n\nWhen finished, you’ll see that no changes remain in the Git tab, and the repository is clean.\nInspect the history. Note that under Changes, the message now says:\nYour branch is ahead of ‘origin/main’ by 2 commits.\nThese are the two commits that we just made, but have not yet been pushed to GitHub.\nClick on the History button to see a total of four commits in the local repository (the two we made directly to GitHub via the web browser and the two we made in RStudio).\n\nPush these changes to GitHub. Now that we’ve made and committed changes locally, we can push those changes to GitHub using the Push button. This sends your changes to the remote repository (on GitHub) leaving your repository in a totally clean and synchronized state (meaning your local repository and remote repository should look the same).\n\n\n\n\n\n\nIf you are prompted to provide your GitHub username and password when Pushing…\n\n\n\nit’s a good indicator that you did not set your GitHub Personal Access Token (PAT) correctly. You can redo the steps outlined in the GitHub Authentication section of the Appendix to (re)set your PAT, then Push again.\n\n\n\n\n\nIf you look at the History pane again, you’ll notice that the labels next to the most recent commit indicate that both the local repository (HEAD) and the remote repository (origin/HEAD) are pointing at the same version in the history. If we look at the commit history on GitHub, all the commits will be shown there as well.\n\n \n\n\n\n\n\n\nLast thing, some Git configuration to surpress warning messages\n\n\n\nGit version 2.27 includes a new feature that allows users to specify the default method for integrating changes from a remote repository into a local repository, without receiving a warning (this warning is informative, but can get annoying). To suppress this warning for this repository only we need to configure Git by running this line of code in the Terminal:\n\ngit config pull.rebase false\n\npull.rebase false is a default strategy for pulling where Git will first try to auto-merge the files. If auto-merging is not possible, it will indicate a merge conflict (more on resolving merge conflicts in Chapter 11).\nNote: Unlike when we first configured Git (see Appendix), we do not include the --global flag here (e.g. git config --global pull.rebase false). This sets this default strategy for this repository only (rather than globally for all your repositories). We do this because your chosen/default method of grabbing changes from a remote repository (e.g. pulling vs. rebasing) may change depending on collaborator/workflow preference." }, { "objectID": "session_06.html#exercise-3-setting-up-git-on-an-existing-project", "href": "session_06.html#exercise-3-setting-up-git-on-an-existing-project", "title": "6 Intro to Git and GitHub", "section": "6.5 Exercise 3: Setting up Git on an existing project", - "text": "6.5 Exercise 3: Setting up Git on an existing project\nNow you have two projects set up in your RStudio environment, training_{USERNAME} and {FIRSTNAME}_test. We set you up with the {FIRSTNAME}_test project since we think it is an easy way to introduce you to Git, but more commonly researchers will have an existing directory of code that they then want to make a Git repository out of. For the last exercise of this session, we will do this with your training_{USERNAME} project.\nFirst, switch to your training_{USERNAME} project using the RStudio project dropdown menu. The project dropdown menu is in the upper right corner of your RStudio pane. Click the dropdown next to your project name ({FIRSTNAME}_test), and then select the training_{USERNAME} project from the “recent projects” list.\n\n\n\n\n\nNext, from the Tools menu, select “Project Options.” In the dialog that pops up, select “Git/SVN” from the menu on the left. In the dropdown at the top of this page, select Git and click “Yes” in the confirmation box. Click “Yes” again to restart RStudio.\nWhen RStudio restarts, you should have a Git tab, with two untracked files (.gitignore and training_{USERNAME}.Rproj).\n\n\n\n\n\n\nChallenge\n\n\n\nAdd and commit the .gitignore and training_{USERNAME}.Rproj files to your Git repository.\n\n\nNow we have your local repository all set up. You can make as many commits as you want on this repository, and it will likely still be helpful to you, but the power in Git and GitHub is really in collaboration. As discussed, GitHub facilitates this, so let’s get this repository on GitHub.\n\n\n\n\n\n\nSetup\n\n\n\n\nGo to GitHub, and click on the “New Repository” button.\nIn the repository name field, enter the same name as your R Project. So for me, this would be training_dolinh.\nAdd a description, keep the repository public, and, most importantly: DO NOT INITIALIZE THE REPOSITORY WITH ANY FILES. We already have the repository set up locally so we don’t need to do this. Initializing the repository will only cause merge issues.\n\nHere is what your page should look like:\n\n\n\n\n\n\nClick the “Create repository” button.\n\n\n\nThis will open your empty repository with a page that conveniently gives you exactly the instructions you need. In our case, we are going to “push an existing repository from the command line.”\n\n\n\nClick the clipboard icon to copy the code for the middle option of the three on this page. It should have three lines and look like this:\ngit remote add origin https://github.com/hdolinh/training_dolinh.git\ngit branch -M main\ngit push -u origin main\nBack in RStudio, open the terminal by clicking the Terminal tab next to the Console tab. The prompt should look something like this:\ndolinh@included-crab:~/training_dolinh$\nIn the prompt, paste the code that you copied from the GitHub page and press return.\nThe code that you copied and pasted did three things:\n\nAdded the GitHub repository as the remote repository\nRenamed the default branch to main\nPushed the main branch to the remote GitHub repository\n\nIf you go back to your browser and refresh your GitHub repository page, you should now see your files appear.\n\n\n\n\n\n\nChallenge\n\n\n\nOn your repository page, GitHub has a button that will help you add a README.md file. Click the “Add a README” button and use markdown syntax to create a README.md Commit the changes to your repository.\nGo to your local repository (in RStudio) and pull the changes you made." + "text": "6.5 Exercise 3: Setting up Git on an existing project\nThere are a number of different workflows for creating version-controlled repositories that are stored on GitHub. We started with Exercise 1 and Exercise 2 using one common approach: creating a remote repository on GitHub first, then cloning that repository to your local computer (you used your {FIRSTNAME}_test repo).\nHowever, you may find yourself in the situation where you have an existing directory (i.e. a “normal” folder) of code that you want to make a Git repository out of, and then send it to GitHub. In this last exercise, we will practice this workflow using your training_{USERNAME} project.\nFirst, switch to your training_{USERNAME} project using the RStudio project dropdown menu. The project drop down menu is in the upper right corner of your RStudio pane. Click the drop down next to your project name ({FIRSTNAME}_test), and then select the training_{USERNAME} project from the RECENT PROJECTS list.\n\n\n\n\n\nThere are a few approaches for turning an existing project folder into a Git repository, then sending it to GitHub – if you’re an R-user, the simplest way is to use the {usethis} package, which is built to automate tasks involved with project setup and development. However, you can also initialize a local git repository and set the remote repository from the command line (a language-agnostic workflow). Steps for both approaches are included below (demonstrated using your training_{USERNAME} project):\n\nUsing R & {usethis}Using the command line\n\n\n\nInstall the {usethis} package (if you haven’t done so already) by running the following in your Console:\n\n\ninstall.packages(\"usethis\")\n\n\nInitialize training_{USERNAME} as a Git repository by running usethis::use_git() in the Console. Choose yes when asked if it’s okay to commit any uncommitted files. Choose yes again if asked to restart R. Once complete, you should see the Git tab appear in your top left pane in RStudio and a .gitignore file appear in your Files tab.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n.gitignore files allow you to specify which files/folders you don’t want Git to track\n\n\n\nA .gitignore file is automatically created in the root directory of your project when you initialize it as a Git repository. You’ll notice that there are already some R / R Project-specific files that have been added by default.\nWhy is this useful? For many reasons, but possibly the greatest use-case is adding large files (GitHub has a file size limit of 2 GB) or files with sensitive information (e.g. keys, tokens) that you don’t want to accidentally push to GitHub.\nHow do I do this? Let’s say I create a file with sensitive information that I don’t want to push to GitHub. I can add a line to my .gitignore file:\n\n# added by default when I initalized my RProj as a Git Repository\n.Rproj.user\n.Rhistory\n.Rdata\n.httr-oauth\n.DS_Store\n.quarto\n\n# add file so that it doesn't get pushed to the remote repo (on GitHub); \ncontains_sensitive_info.R\n\nIf this file is currently untracked by Git, it should appear in my Git tab. Once I add it to the .gitignore and save the modified .gitignore file, you should see contains_sensitive_info.R disappear from the Git tab, and a modified .gitignore (denoted by a blue M) appear. Stage/commit/push this modified .gitignore file.\n\n\n\nCreate an upstream remote repository on GitHub by running usethis::use_github() in the Console. Your web browser should open up to your new GitHub repository, with the same name as your local Git repo/R Project.\n\n\n\n\n\n\n\n\n\n\n\nEnsure that your default branch is named main rather than master by:\n\nrunning git branch in the Terminal to list all your branches (you should currently only have one, which is your default)\nif it’s named master, run the following line in the Console to update it\n\n\n\nusethis::git_default_branch_rename(from = \"master\", to = \"main\")\n\nYou can verify that your update worked by running git branch once more in the Terminal.\n\n\n\n\n\n\nWhy are we doing this?\n\n\n\nThe racist “master” terminology for git branches motivates us to update our default branch to “main” instead.\nThere is a push across platforms and software to update this historical default branch name from master to main. GitHub has already done so – you may have noticed that creating a remote repository first (like we did in Exercises 1 & 2) results in a default branch named main. Depending on your version of Git, however, you may need to set update the name manually when creating a local git repository first (as we’re doing here).\n\n\n\nYou’re now ready to edit, stage/add, commit, and push files to GitHub as practiced earlier!\n\n\n\n\n\n\n\nChallenge: add a README.md file to training_{USERNAME}\n\n\n\nGitHub provides a button on your repo’s landing page for quickly adding a README.md file. Click the Add a README button and use markdown syntax to create a README.md. Commit the changes to your repository.\nGo to your local repository (in RStudio) and pull the changes you made.\n\n\n\n\nWhile we’ll be using the RStudio Terminal here, you can use any command-line interface (e.g. Mac Terminal, Git Bash, etc.) that allows for git interactions (if you plan to use a command-line interface that is not the RStudio Terminal, make sure to navigate to your project directory (e.g. using cd file/path/to/project/directory) before initializing your repository.\n\nInitialize training_{USERNAME} as a Git repository by running git init in the Terminal. You should get a message that says something like:\n\n\nInitialized empty Git repository in /home/username/training_username/.git/\n\n\n\n\n\n\n\nYou may have to quit and reopen your RStudio session on the server for the Git tab to appear\n\n\n\nYou’ll likely need to help included-crab along in recognizing that this R Project has been initialized as a git repository – click Session > Quit Session… > New Session > choose training_{USERNAME} to reopen your project.\n\n\nOnce complete, you should see the Git tab appear in your top left pane in RStudio and a .gitignore file appear in your Files tab.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n.gitignore files allow you to specify which files/folders you don’t want Git to track\n\n\n\nA .gitignore file is automatically created in the root directory of your project when you initialize it as a Git repository. You’ll notice that there are already some R / R Project-specific files that have been added by default.\nWhy is this useful? For many reasons, but possibly the greatest use-case is adding large files (GitHub has a file size limit of 2 GB) or files with sensitive information (e.g. keys, tokens) that you don’t want to accidentally push to GitHub.\nHow do I do this? Let’s say I create a file with sensitive information that I don’t want to push to GitHub. I can add a line to my .gitignore file:\n\n# added by default when I initalized my RProj as a Git Repository\n.Rproj.user\n.Rhistory\n.Rdata\n.httr-oauth\n.DS_Store\n.quarto\n\n# add file so that it doesn't get pushed to the remote repo (on GitHub); \ncontains_sensitive_info.R\n\nIf this file is currently untracked by Git, it should appear in my Git tab. Once I add it to the .gitignore and save the modified .gitignore file, you should see contains_sensitive_info.R disappear from the Git tab, and a modified .gitignore (denoted by a blue M) appear. Stage/commit/push this modified .gitignore file.\n\n\n\nEnsure that your default branch is named main rather than master by:\n\nrunning git branch in the Terminal to list all your branches (you should currently only have one, which is your default)\nif it’s named master, run the following line in the Terminal to update it\n\n\n\n# for Git version 2.28+ (check by running `git --version`)\n# this sets the default branch name to `main` for any new repos moving forward\ngit config --global init.defaultBranch main\n\n# for older versions of Git\n# this sets the default branch name to `main` ONLY for this repo \ngit branch -m master main\n\nYou can verify that your update worked by running git branch once more in the Terminal.\n\n\n\n\n\n\nWhy are we doing this?\n\n\n\nThe racist “master” terminology for git branches motivates us to update our default branch to “main” instead.\nThere is a push across platforms and software to update this historical default branch name from master to main. GitHub has already done so – you may have noticed that creating a remote repository first (like we did in Exercises 1 & 2) results in a default branch named main. Depending on your version of Git, however, you may need to set update the name manually when creating a local git repository first (as we’re doing here).\n\n\n\nStage/Add your files. It’s helpful to first run git status to check the state of your local repository (particularly if you aren’t using RStudio / have access to a GUI with a Git tab-esque feature) – this will tell you which files have been modified or are untracked and that are currently unstaged (in red). What appears here should look just like what appears in the Git tab:\n\n\n\n\n\n\n\n\n\n\nRun git add . in the Terminal to stage all files at once (or git add {FILENAME} to stage individual files). Running git status again will show you which files have been staged (in green). You may have to refresh your Git tab to see the change in state reflected in the GUI.\n\n\n\n\n\n\n\n\n\n\nCommit your files by running git commit -m \"an informative commit message\" in the Terminal. Refreshing your Git tab will cause them to disappear (just as they do when you commit using RStudio’s GUI buttons). You can run git log in the Terminal to see a history of your past commits (currently, we only have this one).\n\n\n\n\n\n\n\n\n\n\n\nCreate an empty remote repository by logging into GitHub and creating a new repository, following the same steps as in Exercise 1. IMPORTANTLY, DO NOT initialize your remote repo with a README license, or .gitignore file – doing so now can lead to merge conflicts. We can add them after our local and remote repos are linked. Name your remote repository the same as your local repository (i.e. training_{USERNAME}).\nLink your remote (GitHub) repository to your local Git repository. Your empty GitHub repo conveniently includes instructions for doing so. Copy the code under push an existing repository from the command line to your clipboard, paste into your RStudio Terminal, and press return/enter.\n\n\n\n\n\n\n\n\n\n\nThese commands do three things:\n\nAdds the GitHub repository as the remote repository (i.e. links your local repo to the remote repo)\nRenames the default branch to main\nPushes the main branch to the remote GitHub repository\n\nHead back to your browser and refresh your GitHub repository page to see your files appear!\n\nYou’re now ready to edit, stage/add, commit, and push files to GitHub as practiced earlier!\n\n\n\n\n\n\n\nChallenge: add a README.md file to training_{USERNAME}\n\n\n\nGitHub provides a button on your repo’s landing page for quickly adding a README.md file. Click the Add a README button and use markdown syntax to create a README.md. Commit the changes to your repository.\nGo to your local repository (in RStudio) and pull the changes you made." }, { "objectID": "session_06.html#go-further-with-git", diff --git a/public/2023-10-coreR/session_01.html b/public/2023-10-coreR/session_01.html index ecf6401b..ad6bbacd 100644 --- a/public/2023-10-coreR/session_01.html +++ b/public/2023-10-coreR/session_01.html @@ -194,7 +194,7 @@Git
+ GitHub
+ Git
and GitHub?Git
Life cycleGit
Vocabulary & Commandsclone
your repository and use Git
locally in RStudioGit
on an existing projectGit
Git
resourcesclone
your repository and use Git locally in RStudioGit
to track and manage changes of a projectGit
workflow including pulling changes, staging modified files, committing changes, pulling again to incorporate remote changes, and pushing changes to a remote repositoryGit
repositories using different workflowsEvery file in the scientific process changes. Manuscripts are edited. Figures get revised. Code gets fixed when bugs are discovered. Sometimes those fixes lead to even more bugs, leading to more changes in the codebase. Data files get combined together. Sometimes those same files are split and combined again. All that to say - in just one research project, we can expect thousands of changes to occur.
-These changes are important to track, and yet, we often use simplistic filenames to track them. Many of us have experienced renaming a document or script multiple times with the ingenuine addition of “final” to the filename (like the comic above demonstrates).
+Every file in the scientific process changes. Manuscripts are edited. Figures get revised. Code gets fixed when bugs are discovered. Sometimes those fixes lead to even more bugs, leading to more changes in the code base. Data files get combined together. Sometimes those same files are split and combined again. In just one research project, we can expect thousands of changes to occur.
+These changes are important to track, and yet, we often use simplistic file names to do so. Many of us have experienced renaming a document or script multiple times with the ingenuine addition of “final” to the file name (like the comic above demonstrates).
You might think there is a better way, and you’d be right: version control. Version control provides an organized and transparent way to track changes in code and additional files. This practice was designed for software development, but is easily applicable to scientific programming.
There are many benefits to using a version control software including:
The version control system we’ll be diving into is Git
, the most widely used modern version control system in the world.
The version control system we’ll be diving into is Git, the most widely used modern version control system in the world.
Git
+ GitHubBefore diving into the details of Git
and how to use it, let’s start with a motivating example that’s representative of the types of problems Git
can help us solve.
Before diving into the details of Git and how to use it, let’s start with a motivating example that’s representative of the types of problems Git can help us solve.
Say, for example, you’re working on an analysis in R and you’ve got it into a state you’re pretty happy with. We’ll call this version 1:
@@ -348,25 +348,25 @@Commenting out code you don’t want to lose is something probably all of us have done at one point or another but it’s really hard to understand why you did this when you come back years later or you when you send your script to a colleague. Luckily, there’s a better way: Version control. Instead of commenting out the old code, we can change the code in place and tell Git
to commit our change. So now we have two distinct versions of our analysis and we can always see what the previous version(s) look like.
Commenting out code you don’t want to lose is something probably all of us have done at one point or another but it’s really hard to understand why you did this when you come back years later or you when you send your script to a colleague. Luckily, there’s a better way: Version control. Instead of commenting out the old code, we can change the code in place and tell Git to commit our change. So now we have two distinct versions of our analysis and we can always see what the previous version(s) look like.
You may have noticed something else in the diagram above: Not only can we save a new version of our analysis, we can also write as much text as we like about the change in the commit message. In addition to the commit message, Git
also tracks who, when, and where the change was made.
You may have noticed something else in the diagram above: Not only can we save a new version of our analysis, we can also write as much text as we like about the change in the commit message. In addition to the commit message, Git also tracks who, when, and where the change was made.
Imagine that some time has gone by and you’ve committed a third version of your analysis, version 3, and a colleague emails with an idea: What if you used machine learning instead?
Maybe you’re not so sure the idea will work out and this is where a tool like Git
shines. Without a tool like Git
, we might copy analysis.R to another file called analysis-ml.R which might end up having mostly the same code except for a few lines. This isn’t particularly problematic until you want to make a change to a bit of shared code and now you have to make changes in two files, if you even remember to.
Instead, with Git
, we can start a branch. Branches allow us to confidently experiment on our code, all while leaving the old code in tact and recoverable.
Maybe you’re not so sure the idea will work out and this is where a tool like Git shines. Without a tool like Git, we might copy analysis.R to another file called analysis-ml.R which might end up having mostly the same code except for a few lines. This isn’t particularly problematic until you want to make a change to a bit of shared code and now you have to make changes in two files, if you even remember to.
+Instead, with Git, we can start a branch. Branches allow us to confidently experiment on our code, all while leaving the old code in tact and recoverable.
So you’ve been working in a branch and have made a few commits on it and your boss emails again asking you to update the model in some way. If you weren’t using a tool like Git
, you might panic at this point because you’ve rewritten much of your analysis to use a different method but your boss wants change to the old method.
So you’ve been working in a branch and have made a few commits on it and your boss emails again asking you to update the model in some way. If you weren’t using a tool like Git, you might panic at this point because you’ve rewritten much of your analysis to use a different method but your boss wants change to the old method.
But with Git
and branches, we can continue developing our main analysis at the same time as we are working on any experimental branches. Branches are great for experiments but also great for organizing your work generally.
But with Git and branches, we can continue developing our main analysis at the same time as we are working on any experimental branches. Branches are great for experiments but also great for organizing your work generally.
A key takeaway here is that Git
can drastically increase your confidence and willingness to make changes to your code and help you avoid problems down the road. Analysis rarely follows a linear path and we need a tool that respects this.
A key takeaway here is that Git can drastically increase your confidence and willingness to make changes to your code and help you avoid problems down the road. Analysis rarely follows a linear path and we need a tool that respects this.
Finally, imagine that, years later, your colleague asks you to make sure the model you reported in a paper you published together was actually the one you used. Another really powerful feature of Git
is tags which allow us to record a particular state of our analysis with a meaningful name. In this case, we are lucky because we tagged the version of our code we used to run the analysis. Even if we continued to develop beyond commit 5 (above) after we submitted our manuscript, we can always go back and run the analysis as it was in the past.
Finally, imagine that, years later, your colleague asks you to make sure the model you reported in a paper you published together was actually the one you used. Another really powerful feature of Git is tags which allow us to record a particular state of our analysis with a meaningful name. In this case, we are lucky because we tagged the version of our code we used to run the analysis. Even if we continued to develop beyond commit 5 (above) after we submitted our manuscript, we can always go back and run the analysis as it was in the past.
Git
we can enhance our workflow:Git
offers a powerful distributed feature. Multiple individuals can work on the same analysis concurrently on their own computers, with the ability to merge everyone’s changes together.Git
and GitHub?Git
:Git
:Git
Git
repositories in the cloudGit
, such as a web-based interface, issue tracking, project management tools, pull requests, code review, and collaboration featuresGit
Life cycleAs a Git
user, you’ll need to understand the basic concepts associated with versioned sets of changes, and how they are stored and moved across repositories. Any given Git
repository can be cloned so that it exists both locally, and remotely. But each of these cloned repositories is simply a copy of all of the files and change history for those files, stored in Git
’s particular format. For our purposes, we can consider a Git
repository as a folder with a bunch of additional version-related metadata.
In a local Git
-enabled folder, the folder contains a workspace containing the current version of all files in the repository. These working files are linked to a hidden folder containing the ‘Local repository’, which contains all of the other changes made to the files, along with the version metadata.
So, when working with files using Git
, you can use Git
commands to indicate specifically which changes to the local working files should be staged for versioning (using the git add
command), and when to record those changes as a version in the local repository (using the command git commit
).
The remaining concepts are involved in synchronizing the changes in your local repository with changes in a remote repository. The git push
command is used to send local changes up to a remote repository (possibly on GitHub), and the git pull
command is used to fetch changes from a remote repository and merge them into the local repository.
It can be a bit daunting to understand all the moving parts of the Git / GitHub life cycle (i.e. how file changes are tracked locally within repositories, then stored for safe-keeping and collaboration on remote repositories, then brought back down to a local machine(s) for continued development). It gets easier with practice, but we’ll explain (first in words, then with an illustration) at a high-level how things work:
+Whether you’re a Mac or a PC user, you’ll likely have created a folder at some point in time for organizing files. Let’s pretend that we create a folder, called myFolder/
, and add two files: myData.csv
and myAnalysis.R
. The contents of this folder are not currently version controlled – meaning, for example, that if we make some changes to myAnalysis.R
that don’t quite work out, we have no way of accessing or reverting back to a previous version of myAnalysis.R
(without remembering/rewriting things, of course).
Git allows you to turn any “normal” folder, like myFolder/
, into a Git repository – you’ll often see/hear this referenced as “initializing a Git repository”. When you initialize a folder on your local computer as a Git repository, a hidden .git/
folder is created within that folder (e.g. myFolder/.git/
) – this .git/
folder is the Git repository. As you use Git commands to capture versions or “snapshots” of your work, those versions (and their associated metadata) get stored within the .git/
folder. This allows you to access and/or recover any previous versions of your work. If you delete .git/
, you delete your project’s history.
Here is our example folder / Git repository represented visually:
+Git was built as a command-line tool, meaning we can use Git commands in the command line (e.g. Terminal, Git Bash, etc.) to take “snapshots” of our local working files through time. Alternatively, RStudio provides buttons that help to easily execute these Git commands.
+Generally, that workflow looks something like this:
+myAnalysis.R
) in your working directory.git add myAnalysis.R
(or git add .
to stage multiple changed files at once). This lets Git know that you’d like to include the file(s) in your next commit.git commit -m "a message describing my changes"
. This records those changes (along with a descriptive message) as a “snapshot” or version in the local repository (i.e. the .git/
folder).The last step is synchronizing the changes made to our local repository with a remote repository (oftentimes, this remote repository is stored on GitHub). The git push
command is used to send local commits up to a remote repository. The git pull
command is used to fetch changes from a remote repository and merge them into the local repository – pulling will become a regular part of your workflow when collaborating with others, or even when working alone but on different machines (e.g. a laptop at home and a desktop at the office).
The processes described in the above sections (i.e. making changes to local working files, recording “snapshots” of them to create a versioned history of changes in a local Git repository, and sending those versions from our local Git repository to a remote repository (which is oftentimes on GitHub)) is illustrated using islands, buildings, bunnies, and packages in the artwork, below:
This screen shows the copy of a repository stored on GitHub, with its list of files, when the files and directories were last modified, and some information on who made the most recent changes.
@@ -455,14 +484,14 @@Tracking these changes, how they relate to released versions of software and files is exactly what Git
and GitHub are good for. And we will show how they can really be effective for tracking versions of scientific code, figures, and manuscripts to accomplish a reproducible workflow.
Tracking these changes, how they relate to released versions of software and files is exactly what Git and GitHub are good for. And we will show how they can really be effective for tracking versions of scientific code, figures, and manuscripts to accomplish a reproducible workflow.
Git
Vocabulary & CommandsWe know the world of Git
and GitHub can be daunting. Use these tables as references while you use Git
and GitHub, and we encourage you to build upon this list as you become more comfortable with these tools.
This table contains essential terms and commands that complement intro to Git
skills. They will get you far on personal and individual projects.
We know the world of Git and GitHub can be daunting. Use these tables as references while you use Git and GitHub, and we encourage you to build upon this list as you become more comfortable with these tools.
+This table contains essential terms and commands that complement intro to Git skills. They will get you far on personal and individual projects.
Term | -Git Command(s) |
+Git Command(s) | Definition | |
---|---|---|---|---|
Add | +Add/Stage | git add [file] |
-Stages or adds file changes to the next commit. git add . will stage or add all files. |
+Staging marks a modified file in its current version to go into your next commit snapshot. You can also stage all modified files at the same time using git add . |
Commit | git commit |
-Records changes to the repository with a descriptive message. | +Records changes to the repository. | |
Commit Message | -git commit -m |
-A descriptive message explaining the changes made in a commit. The message must be within quotes (e.g. “This is my commit message.”). | +git commit -m "my commit message" |
+Records changes to the repository and include a descriptive message (you should always include a commit message!). |
Fetch | git fetch |
-Retrieves changes from a remote repository but does not merge them. | +Retrieves changes from a remote repository but does not merge them into your local working file(s). | |
Pull | git pull |
-Retrieves and merges changes from a remote repository to the current branch. | +Retrieves changes from a remote repository and merges them into your local working file(s). | |
Push | @@ -507,20 +536,15 @@Stage | -- | -The process of preparing and selecting changes to be included in the next commit. | -|
Status | git status |
-Shows the current status of the repository, including changes and branch information. | +Shows the current status of the repository, including (un)staged files and branch information. |
This table includes more advanced Git
terms and commands that are commonly used in both individual and collaborative projects.
This table includes more advanced Git terms and commands that are commonly used in both individual and collaborative projects.
Term | -Git Command(s) |
+Git Command(s) | Definition | @@ -606,7 +630,7 @@
---|