Tyler Boyd-Meredith
Monday 02/06/2023
- What is Git? What is GitHub?
- The basics
- Branching workflow conventions
- Using GitHub
- Pull Requests
- Create Issues
- Project Boards
- Protect branches, set up status checks
- MY FAVORITE WORKFLOW - Skip straight here for the good stuff
- Resources
- Survey Results
The Git graphic comes from the Git Pro Book which is extremely helpful for the basics and a high level of detail. Plus, you can clone the repo that generates the book and see their Git usage in action!
-
A remote host for the repository (i.e., the Server Computer in the diagram above)
- The authoritative version of a project shared with collaborators
- backup the work that would be lost if your local machine crashed
-
A website with a bunch of nice interfaces for managing collaborative projects
- Issues, Pull Requests, Project Boards
It can be really helpful to create an example project where the stakes are low to practice learning git.
So, let's open a terminal, create a new project folder and initialize a git repository
$ cd <path-to-parent-directory>
$ mkdir example_project
$ cd example_project
$ git init
Now, if we run
$ ls -1a
we see a folder called .git/
, which is where all the version control magic happens.
Now, we have an empty repository. Let's create a README for it and add that document to the repository.
$ echo "I'm just trying to learn to use git" > README.md
$ git status
You should see that you have an untracked file called README.md
. Let's tell Git about this file, by running
git add README.md
The file is now being tracked AND it is staged to be part of the next snapshot (aka commit), which we'll make now
git commit -m "Add README"
Let's look at the resulting record
git log --stat
This will show us that we've created a new file with one line in it.
This is great, but we want to put our repository someplace where we can (1) access it if our computer breaks and (2) share it with users of the software or collaborators on the project.
First, we'll create a repository with the same name at github.com/new.
- Add a link to the remote version of the repository called origin
- Push the local version to the origin
$ git remote add origin https://github.com/<username>/example_project.git
$ git push -u origin master
- Using
-u
sets the origin to be upstream. You can omit it next time you push
- Let's have a look at the Protocols directory
cd <Protocols-parent-directory>
git clone https://github.com/Brody-Lab/Protocols.git
Now, we have folder called Protocols with the master
branch + access to other branches and the full history.
Here are some commands we can run to understand the project:
cd Protocols
git log --oneline -20
git log f53bfef7 -1 --name-status --relative-date
git log --author=boydmeredith --oneline
git log --pretty=short --shortstat -S StateMatrixSection @PBupsWT/
git log --all --oneline --graph
git log 05cce4d7 -p
-
We're ready to start writing code! But first we should ask...
What branching conventions do we want to adopt? (for our own projects)
What branching conventions are already in use on this project (for collaborative projects)
-
Some options:
- Single branch
- git-flow
- GitHub Flow
Everything happens on a single branch (usually called main
/master
/mainline
)
Creating, staging, and commiting functions is straightforward. Whenever you have made some changes to code that you want to snap shot, you do the following:
$ git add my_first_function.py
$ git commit -m "Add first example function"
Pros:
- no complicated branching rules to enforce
Cons:
- commit history can get hard to read
- can be hard to recover a working version
Use this when:
- working alone on a very small scale project
Don't use this when:
- collaborating or writing code with more than a couple moving parts
The Rules
Check out the 2010 Blogpost describing git-flow |
Pros:
- Carefully separates working code from in progress
- You can maintain separate, stable releases
Cons:
- It's complicated to follow the rules
- There's a long path from develop back to production/deployment
develop
andmain
are redundant
Use this when:
- You want to maintain stable releases
- You want to drive yourself insane
Don't use it when:
- You want to deploy1 your code on a regular basis (as we do in the rat room).
The Rules (simpler than git-flow)
-
never develop on
main
/master
if you want to add or change anything, checkout a descriptively named branch from
main
(as in git-flow,main
is always ready for use, but unlike git-flow, there is no distinction in the flow a feature branch and a hotfix branch) -
When you think the branch is ready for use, create a Pull Request to merge it into the remote
main
branch -
Use the Pull Request to finalize the branch for merging into
main
identify and fix any remaining issues
get comments from other users (you can even tag them to look at specific lines!)
-
Merge the Pull Request
-
Delete the feature branch
Pros | Cons |
|
|
Use this when: | Don't use this when: |
|
|
$ touch practice_function.py
$ git checkout -b practice-branch
$ git add practice_function.py
$ git commit -m "add function to practice branching"
$ git push origin practice-branch
If we do this at the command line, we will be prompted to open a Pull Request and given a link we can click to get straight there!
We can also create one by going to github.com/<username>/example_project/pulls
, clicking the green "New pull request" button and choosing the practice-branch
as the compare branch.
We could skip all that and go straight to github.com/<username>/example_project/compare/master...practice-branch
Here, we can ask for help from collaborators by tagging them with @<username>
.
Go try it! If there are no conflicts with the main branch and no required status checks are violated, you can now merge the pull request into the main branch! Or, maybe you'll notice there's still something that needs to be dealt with before the code gets merged...
When you merge the pull request, you'll be prompted for comments, if this PR addresses an Issue, you can write that this PR "Closes #<issue-number>" and that Issue will automatically be closed. Pretty cool!
After making the Pull Request, you may notice an issue that should be fixed before merging into the main
branch. No problem! Just make the fixes, commit them and push the branch again.
$ git add practice_function.py
$ git commit -m "fixed bug identified in PR"
$ git push origin practice-branch
Now, if you merge the pull request, it will contain the most recent code!
If the main branch has changed since you checked out your branch, there may be conflicts that need to be resolved. If so, pull down the main branch using
$ git pull origin master
You'll be told that you need to resolve conflicts before you can merge. After you've resolved the conflicts in your preferred editor, you can make a commit that merges the master branch into your feature branch. Now, if you push that branch (as we did above), you'll see that you can now merge your pull request.
Once you've done that, delete your feature branch on GitHub (it will prompt you to do this) and then checkout the master branch, pull the new work and delete the feature branch locally.
$ git checkout master
$ git pull origin master
$ git branch -d practice-branch
In scientific projects, we often have a laundry list of analyses that we'd like to run, edge cases we'd like to check, possible bugs we want to investigate. This can be hard to keep track of, especially in a way that is integrated with the code. Issues provides a fantastic solution to this problem by giving you a change to link a description of the problem (including markdown, latex, and images) to your code. This can help you document your plans and concerns as you work and gives you an easy to find record of which problems are open (have yet to be addressed) and which have been closed. You can link these to specific updates to the code by adding "Closes #<issue-numer>" to your Pull Request merge comments.
github.com/<username>/example_project/issues
(aka GitHub Flow+)
I started using a workflow at the end of my PhD and found it incredibly helpful not just for maintaining my code, but for identifying and addressing scientific questions in my project. The contours of this workflow were suggested to me by Brody Lab alum Alex Piet, so eternal thanks to him for that.
Steps:
-
Create a project board by going to github.com/<username>?tab=projects and choose the board layout. The automatic configuration will be 3 lists, "To Do", "In Progress", "Done."
-
Create a To Do List in your project board
To create a To Do Item:
-
Click the "+ Add item" button at the bottom of the "To do" column and write in a descriptive name of a To Do item.
-
Click the "..." in the upper right of your new issue and choose a repository to link to. This will automatically create an Issue assigned to the chosen repository.
-
Click the feature name to add a description by pressing "edit" in the upper right corner of the description box. You can type markdown, latex, add pictures and you can even make a checklist by making a list with "- [ ]". These will be recognized as metadata and as you check them off, will tell you how many of the total tasks you've completed".
-
-
Pick something to work on and drag it to "In Progress"
- Try to keep only 1 or 2 items in this column. If items stay there for many days, split them into 2 or more items and use a PR to move the completed work into the "Done" column.
-
Follow GitHub Flow to do the work associated with this To Do item (i.e., checkout a branch, edit things, make commits, push branch to remote, open PR, merge PR with "Closes #<issue-numer>")
- As you work, update the comments on this To Do item to document decisions you've made or new things you've learned.
- If you work on the problem for more than a few days, break the item into one that you've finished and one containing the work left to be done.
-
Watch the "Done" column fill up with a nicely documented list of issues you've addressed!
Here's an example of what the one I'm using for the wait time model project looks like. You can explore it here.
To be extra careful that the main branch never breaks, we can setup protections, here are some examples:
- Requiring Pull Requests to merge branches into
main
- Locking
main
so users can't push to it - Requiring Status Checks to complete PRs
- Setting up status checks
- Using GitHub Actions, you can set up workflows that run every time you open a pull request. For example, you could write a bunch of unit tests, make them automatically run whenever you open a PR and then you can see that none of your new code breaks any of the things you're testing for. You can then wait to merge PR until all the tests pass, or you can even make it impossible to merge the PR until the tests pass.
If you're using git at the command line, it's nice if the prompt tells you which branch you're on. Here are the command prompts that I use:
- If using bash, modify
~/.bash_profile
parse_git_branch() { git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/ (\1)/' } export PS1="\u@\h \[\033[31m\]\W\[\033[92m\]\$(parse_git_branch)\[\033[00m\] $ "
- If using zsh, modify
~/.zshrc
↑ this one is kind of ugly, I should replace with a better one# Load version control information autoload -Uz vcs_info precmd() { vcs_info } # Format the vcs_info_msg_0_ variable zstyle ':vcs_info:git:*' formats 'on %b' # Set up the prompt (with git branch name) setopt PROMPT_SUBST PROMPT='%n in ${PWD/#$HOME/~} ${vcs_info_msg_0_} %# '
Git Pro Book is a great in depth explanation of how to use GitHub
Merely useful tech book has 2 nice, short sections on Git and GitHub
git-flow was presented in this blog post
GitHub Flow is documented here
Good Research Code Handbook not super git focused, but very helpful
Footnotes
-
It's note totally clear what the analogy of "deployment" or "production" is for a scientist working on their own project, but I would argue it's whenever we present findings and possibly whenever we run an analysis and interpret the results) ↩