
Basic Git workflow

Kristoffer L. Nielbo edited this page Feb 20, 2020 · 12 revisions

Basic branching

When working in a decentralized workflow, the concepts can be simple. master represents the official history and should always be deployable. With each new scope of work, aka feature, a developer creates a new branch. For clarity, make sure to use descriptive names like train-test-split or nesterov-acceleration for your branches.

Although you may have a feature like Momentum-Optimizer, it may not be appropriate to create a feature branch at this level because there is too much work to be done. It may be better to break these large deliverables down into smaller bits of work that can be continuously integrated into the project. Remember, commit early and often.

Before you create a branch, always be sure you have all the upstream changes from the origin/master branch.

Make sure you are on master

Before you pull, make sure you are on the right branch. Use the following command to list the local branches and designate which branch you are currently on.

$ git branch

The checked-out branch will have a * before the name. If the output shows anything other than master, switch to master:

$ git checkout master

Once on master and ready to pull updates, use the following:

$ git pull origin master

Depending on your setup, you may even be able to run only the following:

$ git pull

The git pull command combines two other commands, git fetch and git merge. When doing a fetch, the resulting commits are stored on a remote-tracking branch (such as origin/master), allowing you to review the changes before merging. Merging, on the other hand, can involve additional steps and flags in the command, but more on that later. For now, stick with git pull.
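To make the fetch-then-merge split concrete, here is a minimal sketch that runs entirely in a throwaway directory. The repos named seed, remote.git, you and teammate, the demo identity, and the commit messages are all assumptions made up for the illustration; nothing here touches a real project.

```shell
set -eu
# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical setup: a bare "remote" plus two clones of it.
cd "$(mktemp -d)"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git -C seed commit -q --allow-empty -m "chore(repo): initial commit"
git clone -q --bare seed remote.git
git clone -q remote.git you
git clone -q remote.git teammate

# A teammate publishes a new commit on master.
git -C teammate commit -q --allow-empty -m "feat(data): add train-test split"
git -C teammate push -q origin master

cd you
git fetch -q origin                        # step 1: download, but do not merge yet
incoming="$(git rev-list --count master..origin/master)"
git log --oneline master..origin/master    # review what is about to be merged
git merge -q --ff-only origin/master       # step 2: integrate the fetched commits
remaining="$(git rev-list --count master..origin/master)"
echo "incoming=$incoming remaining=$remaining"
```

Running git pull origin master in the you clone would have done both steps in one go; splitting them just gives you a chance to look before you merge.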

Now that you are all up to date with the remote repo, create a branch. For efficiency, use the following:

$ git checkout -b some-feature-branch

This command will create a new branch from master as well as check out that new branch at the same time. Doing a git branch here again will list out the branches in your local repo and place a * before the branch that is checked out.

  master
* some-feature-branch
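If you want to try this without touching a real project, the following sketch repeats the step in a temporary sandbox repo. The sandbox, the demo identity and the initial commit message are assumptions for the illustration only.

```shell
set -eu
# Throwaway identity so the demo commit works without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git commit -q --allow-empty -m "chore(repo): initial commit"

# Create the branch and switch to it in one step.
git checkout -q -b some-feature-branch
git branch                                # lists both branches, * marks the current one
current="$(git rev-parse --abbrev-ref HEAD)"
echo "Now on: $current"
```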

Do you have to be on master to branch from master?

No. There is a command that allows you to create a new branch from any other branch while having checked out yet another branch.

$ git checkout -b gamma-scalar master

In that example, say you were on branch nesterov-acceleration and needed to create a new branch and then check it out. By adding master at the end of that command, Git will create the new branch from master and then move you to (check out) that new branch.

This is a nice command, but make sure you understand what you are doing before you do this. Creating bad branches can cause a real headache when trying to merge back into master.
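One way to convince yourself of what this command does is to check where the new branch starts. The sketch below does that in a temporary sandbox repo; the sandbox, the demo identity and the commit messages are assumptions for the illustration.

```shell
set -eu
# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git commit -q --allow-empty -m "chore(repo): initial commit"

# Do some work on another branch first.
git checkout -q -b nesterov-acceleration
git commit -q --allow-empty -m "feat(optimizer): start nesterov acceleration"

# Still on nesterov-acceleration: branch from master and switch in one step.
git checkout -q -b gamma-scalar master

current="$(git rev-parse --abbrev-ref HEAD)"
start="$(git rev-parse gamma-scalar)"
echo "Now on $current, starting at $start"
```

The new branch points at the same commit as master and does not contain the commit made on nesterov-acceleration.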

Some files should be ignored

In most Git projects, you will find a .gitignore file. This is a simple text file where you list the files you DO NOT want to commit to the repo, such as files that contain secret information like keys and tokens or any sensitive server configurations.

If a resource has already been added to Git's control, adding it later to the .gitignore file will not ignore that file. The thing to remember here is that the intention is to remove the file from Git control, but not from disk. To do this, use the following command:

$ git rm --cached <filename>

For example, say you had a directory with the file train_mdl.py and the file input.dat, but the data file had a lot of secret info and was accidentally added to Git version control.

First add input.dat to the .gitignore file, then in the command prompt, run the following command:

$ git rm --cached input.dat

After running the command and committing the change, input.dat is still in the directory, but the following command, which lists the files tracked on master, no longer shows it:

$ git ls-tree -r master --name-only
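The whole sequence can be rehearsed in a temporary sandbox repo, as in the sketch below. The file contents, the demo identity and the commit messages are assumptions for the illustration.

```shell
set -eu
# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master

# The data file slips into the first commit by accident.
echo "print('training')" > train_mdl.py
echo "secret info" > input.dat
git add .
git commit -q -m "feat(train): add training script"

# Ignore it from now on and remove it from Git's control, but not from disk.
echo "input.dat" >> .gitignore
git rm -q --cached input.dat
git add .gitignore
git commit -q -m "chore(repo): stop tracking input.dat"

tracked="$(git ls-tree -r master --name-only)"
echo "$tracked"
ls input.dat                              # the file is still in the directory
```

Keep in mind that the earlier commit still contains input.dat, so this only stops tracking the file going forward; it does not scrub it from the history.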

Branch management

When you are working on a new feature branch, it is a good idea to commit often. This allows you to move forward without fear: if something goes wrong, or you have to back out for some reason, you don't lose too much work. Think of committing like that save-button habit you have so well programmed into you.

Each commit also tells a little bit about what you just worked on. That's important when other devs on the team are reviewing your code. It's better to have more commits with messages that explain each step versus one large commit that glosses over important details.

Commit your code

As you make changes in a project, they start out as unstaged updates. With each commit there most likely will be additions, and there will also be deletions from time to time. To get a bearing on the updates you have made, let's get the status.

$ git status

This command will give you a list of all the updated, added and deleted files.

To add files, you can add them individually or you can add all at once. From the root of the project you can use:

$ git add .

To stage the removal of deleted files, you can again either remove them individually or, from the root, address them all like so (this also stages modifications to files Git already tracks):

$ git add -u

You can use the following command to address all additions and deletions.

$ git add --all

All the preceding commands stage the updates for the next commit. If you run git status at this point, you will see your updates presented differently, typically under the heading Changes to be committed:. At this point, the changes are only staged and not yet committed to the branch. The next step is to commit with a message. To commit, I suggest that you use the following message template:

$ git commit -m "<type>(<scope>): <subject>"

It is considered best practice to write the subject in the imperative present tense, describing what the commit does to the code. It didn't do something in the past and it won't do something in the future. The commit is doing something now.

A bad subject example would be:

$ git commit -m "fixed bug with login feature"

A good subject example would be:

$ git commit -m "update app config to address login bug" 
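Putting the staging commands and the message template together, a full pass could look like the sketch below, run in a temporary sandbox repo. The file names, the demo identity, and the fix type and login scope used to fill in the template are assumptions for the illustration.

```shell
set -eu
# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
echo "retries=1" > app.cfg
echo "scratch" > old.txt
git add .
git commit -q -m "chore(repo): initial commit"

# One modification, one deletion and one new file.
echo "retries=3" > app.cfg
rm old.txt
echo "notes" > new.txt

git status --short        # everything is still unstaged
git add --all             # stage additions, modifications and deletions
git status --short        # the same changes, now staged

# <type>(<scope>): <subject>, with the subject in the present tense
git commit -q -m "fix(login): update app config to address login bug"
subject="$(git log -1 --format=%s)"
echo "$subject"
```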

Push your branch

When working with feature branches on a team, it is typically not appropriate to merge your own code into master. Although how to manage this is up to your team, the norm is usually to make pull requests. Pull requests require that you push your branch to the remote repo.

To push the new feature branch to the remote repo, simply do the following:

$ git push origin some-feature-branch

As far as Git is concerned, there is no real difference between master and a feature branch. So, all the identical Git features apply.

My branch was rejected?

There is a special case when working on a team: the push is rejected because your local feature branch is out of sync with the remote copy, which contains commits you do not have yet. This is pretty simple to address with the following command:

$ git pull origin some-feature-branch

This fetches any changes on the remote feature branch and merges them into your local feature branch, reconciling the two histories so that you can push.
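The rejection and the fix can be reproduced in a throwaway setup, as sketched below. The repos named seed, remote.git, you and teammate, the demo identity and the commit messages are assumptions for the illustration. The --no-rebase and --no-edit flags are added only so the script behaves the same regardless of local pull settings and never opens an editor.

```shell
set -eu
# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical setup: a bare "remote" plus two clones of it.
cd "$(mktemp -d)"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git -C seed commit -q --allow-empty -m "chore(repo): initial commit"
git clone -q --bare seed remote.git
git clone -q remote.git you
git clone -q remote.git teammate

# The teammate publishes work on the shared feature branch first.
git -C teammate checkout -q -b some-feature-branch
git -C teammate commit -q --allow-empty -m "feat(model): add gamma scalar"
git -C teammate push -q origin some-feature-branch

# You commit on your own copy of the branch without those changes.
cd you
git checkout -q -b some-feature-branch
git commit -q --allow-empty -m "feat(model): add momentum term"

# The first push is refused because the remote has commits you lack.
if git push -q origin some-feature-branch 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected
fi

# Fetch and merge the remote branch, then push again.
git pull -q --no-rebase --no-edit origin some-feature-branch
git push -q origin some-feature-branch
echo "first push: $first_push, second push: accepted"
```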

Working on remote feature branches

When you are the one creating the feature branch, this is all pretty simple. But when you need to work on a co-worker's branch, there are a few additional steps to follow.

Tracking remote branches

Your local .git/ directory will, of course, manage all your local branches, but your local repo is not always aware of every remote branch. To see which remote branches your local repo knows about, add the -r flag to git branch:

$ git branch -r

To keep your local repo in sync with branches that have been deleted on the remote, you can use this command:

$ git fetch -p

The -p or --prune flag, after fetching, will remove any remote-tracking branches which no longer exist.
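Here is a small sketch of pruning in action, using a throwaway setup. The repos named seed, remote.git, you and teammate, the branch name old-feature and the demo identity are assumptions for the illustration.

```shell
set -eu
# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical setup: a bare "remote" plus two clones of it.
cd "$(mktemp -d)"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git -C seed commit -q --allow-empty -m "chore(repo): initial commit"
git clone -q --bare seed remote.git
git clone -q remote.git you
git clone -q remote.git teammate

# The teammate publishes a branch, you fetch it, then the teammate deletes it.
git -C teammate checkout -q -b old-feature
git -C teammate push -q origin old-feature
git -C you fetch -q origin
git -C teammate push -q origin --delete old-feature

cd you
# Your repo still lists the deleted branch until you prune.
before="$(git branch -r | grep -c 'origin/old-feature' || true)"
git fetch -q -p
after="$(git branch -r | grep -c 'origin/old-feature' || true)"
echo "stale entries before prune: $before, after prune: $after"
```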

Switching to a new remote feature branch

Doing a git pull or git fetch will update your local repo's index of remote branches. As long as co-workers have pushed their branch, your local repo will have knowledge of that feature branch.

By doing a git branch you will see a list of your local branches. By doing a git branch -r you will see a list of remote branches.

The process of making a remote branch a local branch to work on is easy: simply check out the branch.

$ git checkout new-remote-feature-branch

Provided exactly one remote has a branch with that name, Git finds the matching remote-tracking branch and creates a local branch from it that tracks the remote one.
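To see the tracking relationship for yourself, the sketch below checks out a co-worker's branch in a throwaway setup and asks Git which upstream the new local branch follows. The repos named seed, remote.git, you and teammate, the demo identity and the commit messages are assumptions for the illustration.

```shell
set -eu
# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical setup: a bare "remote" plus two clones of it.
cd "$(mktemp -d)"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git -C seed commit -q --allow-empty -m "chore(repo): initial commit"
git clone -q --bare seed remote.git
git clone -q remote.git you
git clone -q remote.git teammate

# The co-worker pushes a new feature branch.
git -C teammate checkout -q -b new-remote-feature-branch
git -C teammate commit -q --allow-empty -m "feat(model): start shared feature"
git -C teammate push -q origin new-remote-feature-branch

cd you
git fetch -q origin                          # learn about the new remote branch
git checkout -q new-remote-feature-branch    # creates a local branch that tracks it
upstream="$(git rev-parse --abbrev-ref '@{upstream}')"
echo "tracking: $upstream"
```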

Keeping current with the master branch

Depending on how long you have been working with your feature branch and how large your dev team is, the master branch of your project may be really out of sync from where you created your feature branch.

When you have completed your update and prior to creating a pull request, you not only have to be up to date in order to merge your new code but also be confident that your code will still work with the latest version of master.

It's here where there are two very different schools of thought. Some teams don't mind if you PULL the latest version from master, by simply doing the following.

$ git pull origin master

This will fetch the latest changes from the remote master and merge them into your local feature branch, so that your feature branch can then be merged cleanly. This works for the purpose of merging, but it's kind of gross on the branch's history graph.

Then there are teams who are not fans of this process, simply because pulling from origin can really screw up the feature branch's history and make it harder to perform more advanced Git features if needed. So, in these situations, it's best to REBASE O_O.

Rebasing a feature branch is not as scary as most make it seem. All a rebase really does is take the updates of the feature branch and move them to a new spot in the history so that they sit on top of the latest version of master. It's as if you just created that feature branch and made the updates. This creates a very consistent branch history that is easy to follow and easy to work with in all situations.

To rebase your local feature branch onto the latest version of master, follow these steps:

$ git checkout master                             # ensure you are on the master branch
$ git pull                                        # pull the latest from the remote
$ git checkout some-feature-branch                # check out the feature branch
$ git push origin some-feature-branch             # update your copy in the repo
$ git rebase master                               # rebase on the master branch
$ git push origin some-feature-branch --force     # force update the remote

And that's it. This process will ensure that you have the latest version of master then take the commits from your feature branch, temporarily unset them, move to the newest head of the master branch and then re-commit them. As long as there are no conflicts, there should be no issues.
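The effect of the rebase on the history can be checked in a purely local sandbox, as sketched below. There is no remote here: a commit made directly on master stands in for the updates you would normally pull. The file names, the demo identity and the commit messages are assumptions for the illustration.

```shell
set -eu
# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git commit -q --allow-empty -m "chore(repo): initial commit"

# Work on the feature branch.
git checkout -q -b some-feature-branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feat(model): add feature work"

# Meanwhile master moves ahead (stand-in for pulling the latest).
git checkout -q master
echo "hotfix" > hotfix.txt
git add hotfix.txt
git commit -q -m "fix(data): newer work on master"

# Replay the feature commit on top of the new master.
git checkout -q some-feature-branch
git rebase -q master

ahead="$(git rev-list --count master..some-feature-branch)"
behind="$(git rev-list --count some-feature-branch..master)"
echo "feature branch is $ahead ahead of and $behind behind master"
```

After the rebase, the feature branch contains everything on master plus its own commit, which is exactly the shape a freshly created branch would have.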

Force push? But ...

Yes, there are those who are not fans of force pushing, but in the case of a feature branch, this is ok. Now force pushing to master, well, that's a really bad idea.

When performing operations like rebasing you are in effect changing the branch's history. When you do this, you can't simply push to the remote as the histories are in conflict, so you will get rejected. To address this, adding the --force flag to force push tells the remote to ignore that error and replace the previous history with the new one you just created.
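The sketch below reproduces that rejection and the force push in a throwaway setup. The repos named seed, remote.git, you and teammate, the file names, the demo identity and the commit messages are assumptions for the illustration.

```shell
set -eu
# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical setup: a bare "remote" plus two clones of it.
cd "$(mktemp -d)"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git -C seed commit -q --allow-empty -m "chore(repo): initial commit"
git clone -q --bare seed remote.git
git clone -q remote.git you
git clone -q remote.git teammate

# You push a feature branch.
cd you
git checkout -q -b some-feature-branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feat(model): add feature work"
git push -q origin some-feature-branch

# Meanwhile master moves on the remote.
echo "hotfix" > ../teammate/hotfix.txt
git -C ../teammate add hotfix.txt
git -C ../teammate commit -q -m "fix(data): newer work on master"
git -C ../teammate push -q origin master

# Update master, then rebase the feature branch onto it.
git checkout -q master
git pull -q --ff-only origin master
git checkout -q some-feature-branch
git rebase -q master

# A plain push is refused because the rebased history no longer matches.
if git push -q origin some-feature-branch 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi
git push -q --force origin some-feature-branch
echo "plain push: $plain_push, forced push: accepted"
```

Git also offers --force-with-lease, which refuses to overwrite the remote branch if someone else has pushed to it since your last fetch. Whether to prefer it over --force is a decision for your team.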

Conflicts

In the worst-case scenario, there is a conflict. This would happen even if you did the pull directly from master into the feature branch. To resolve a conflict, read the error report in the command prompt. This will tell you what file has a conflict and where.

When opening the file you will see a deliberate break in the file's content and two parts that are essentially duplicated, but slightly different. This is the conflict. Pick one of the versions and be sure to delete all the conflict syntax injected into the document.

Once the conflict(s) is/are resolved, back in the command line you need to add this correction.

$ git add .

Once the new updates are staged, you do not create a new commit yourself, because Git is still in the middle of replaying an existing commit. Instead, continue with the rebase, like so:

$ git rebase --continue

If for any reason you get stuck inside a rebase that you simply can't make sense of, you can abort the rebase:

$ git rebase --abort

After git rebase --continue, as long as there are no other issues, the rebase will carry on and then complete with an output of the updates.
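A conflict is easier to face once you have resolved one on purpose. The sketch below provokes a conflict in a temporary sandbox repo and walks through the resolution. The file config.txt, its contents, the demo identity and the commit messages are assumptions for the illustration, and GIT_EDITOR=true is set only so the script never opens an editor.

```shell
set -eu
# Throwaway identity so the demo commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
echo "lr = 0.1" > config.txt
git add config.txt
git commit -q -m "chore(config): add learning rate"

# The feature branch and master both change the same line.
git checkout -q -b some-feature-branch
echo "lr = 0.01" > config.txt
git commit -q -am "feat(config): lower learning rate"
git checkout -q master
echo "lr = 0.5" > config.txt
git commit -q -am "fix(config): raise learning rate"

# The rebase stops with a conflict.
git checkout -q some-feature-branch
if git rebase master >/dev/null 2>&1; then
  rebase_result=clean
else
  rebase_result=conflict
fi
cat config.txt                            # shows both versions and the conflict markers
markers="$(grep -c '^<<<<<<<' config.txt || true)"

# Pick one version (here: the feature branch's), stage it and continue.
echo "lr = 0.01" > config.txt
git add config.txt
GIT_EDITOR=true git rebase --continue >/dev/null
echo "rebase result: $rebase_result, resolved value: $(cat config.txt)"
```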

The Pull Request

The pull request is where the rubber meets the road. As stated previously, one of the key points of the feature branch workflow is that the developer who wrote the code does not merge the code with master until it has been through a peer review. Leveraging GitHub's pull request features, once you have completed the feature branch and pushed it to the repo, there will be an option to review the diff and create a pull request.

In essence, a pull request is a notification of the new code, presented in a view that allows a peer developer to review the individual changes in context. For example, if the update was on line 18 of train.py, then you will only see train.py and a few lines before and after line 18.

This view also allows the peer reviewer to place a comment on any line within the update, which is communicated back to the original author. This review experience really allows everyone on the team to be actively involved in each update.

Once the reviewer has approved the author's updates, the next step is to merge the code. While it used to be preferred to merge locally and push master, GitHub has really grown into this feature and I would argue today it is most preferred to simply use the GUI tool in GitHub.

Now the question is, how to merge? GitHub has three options: Create a merge commit, Squash and merge, and Rebase and merge.

Creating a merge commit is ok; this will simply merge the new feature branch code into the master branch as long as there are no conflicts. GitHub will not allow you to merge if it already knows there will be conflicts.

The squash and merge process is interesting as this will compact all the commits on this feature branch into a single commit on the master branch. This may or may not be an issue depending on how your team wants to preserve history. If you follow the Angular commit message conventions, you may not want to use this feature.

Lastly, there is the rebase and merge, which matches the preference for rebasing shown earlier. This will take all the commits of the feature branch and replay them on top of the latest head of the master branch. If you did a rebase on the feature branch prior to creating a pull request, this process will be seamless and in the end, the most healthy for the project's history.

Shortcuts using aliases

There are some steps in there that we should just be doing all the time. What about making a single command alias that will cycle through all these commands just so we know things are always in good shape?

In Bash

Using Git and Bash is like using a zipper and pants. They just go together. Creating a Bash alias is extremely simple. From your Terminal, enter

$ open ~/.bash_profile

On macOS, this will open a hidden file in the default text editor. If you have a shortcut command for opening in an editor like Sublime Text or VS Code, use that to open the file. In the file add the following:

alias refresh="git checkout master && git pull && git fetch -p"

With this in your .bash_profile, open a new terminal session (or reload the file) and you simply need to enter refresh in the command line and POW!
