Making the perfect sausage

I am a big fan of hiding the sausage making with git. This is the practice of updating and editing commits to make them look as if they sprang into existence perfectly formed. It involves going back and rewriting git history to ensure that each entry is complete and coherent. Here is a quick run-through of how I work to make it easy to juggle and rework my commits.

Start on a new branch

The first thing you'll want to do is create a new branch for your change. I find it useful to have a single level of folders in my branch names to separate the different types of changes.

# For a bug fix
> git checkout -b bug/1234
    
# For a new story
> git checkout -b story/my-story
    
# For a system-wide enhancement
> git checkout -b feature/fancy-upgrade

I find that more than one level gets a bit confusing but it's really just whatever works for you.
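One practical payoff of the prefixes is that git can filter branches by pattern. Here is a minimal sketch in a throwaway repository; the branch names and the demo identity are made up for illustration.

```shell
set -eu
# Throwaway repository with a made-up identity, so nothing real is touched
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "Initial commit"

git branch bug/1234
git branch bug/5678
git branch story/my-story

# List only the bug fix branches
bugs=$(git branch --list 'bug/*')
echo "$bugs"
```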

Do some work

Start making the change. Commit as often as you can. At this stage you want to keep your commits really small. Most commits I add at this stage consist of a single small change to a file - for example adding or removing one method - and a change to an associated test file. They usually have single-line commit messages. Tackle each change as a single small isolated piece of work. Try to keep moving quickly and fluidly, maintaining as much momentum as possible. If you realise you need to amend lines you have already added, just add the changes as a new commit.

Take a breather and review

After a while you will naturally reach some sort of break point. This may be the completion of one step in a process or you may discover that you need to go and fix a bug before you can continue. Most probably it's just time for a cup of tea. Before you carry on, have a quick review of the commits you have made so far. See if the changes you have made can be grouped together. A handy command for this is

> git log --oneline master..

This will show you the first line of each commit message since you branched from master.
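In case the trailing dots look odd: master.. is shorthand for master..HEAD, meaning every commit reachable from where you are now but not from master. A small sketch in a throwaway repository (the commit messages and demo identity are just placeholders) shows the two forms giving the same list.

```shell
set -eu
# Throwaway repository with a made-up identity, so nothing real is touched
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "Initial commit on master"

git checkout -q -b story/my-story
git commit -q --allow-empty -m "Added action to the controller"
git commit -q --allow-empty -m "Added a view for the new action"

# Both forms list the commits on this branch that master does not have
short=$(git log --oneline master..)
long=$(git log --oneline master..HEAD)
echo "$short"
```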

We can now re-arrange these commits in order to make it look like we did every job neatly and perfectly first time using git's interactive rebase feature.

A quick detour to discuss rebasing

In case you are not familiar with it, rebasing means taking a group of commits and placing them on top of a different parent. If you think of your entire repository as a tree, it is the equivalent of breaking off a twig or branch and placing it somewhere else on the tree. It works by rewinding all of your commits and replaying them on top of the new parent. This is very useful when working as part of a team as it provides an alternative way of combining changes.

As an example, imagine that you and a colleague both created new branches from your master branch at the same time. You both do some work. They get their changes finished first and merge them into master. In the process they have added a juicy new feature that would be perfect for you to use in your changes but of course it is only available in master, not in your branch, because you created your branch before the feature was added. One way to fix this is to rebase your branch on top of the latest version of master. This will then rewind your branch to the point where you diverged from master, fast-forward to the latest version of master and replay your commits on top.
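The colleague scenario can be played out in a throwaway repository. This is only a sketch: the file names, commit messages and demo identity are invented for the example.

```shell
set -eu
# Throwaway repository with a made-up identity, so nothing real is touched
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# You branch off and do some work
git checkout -q -b story/my-story
echo "my work" > mine.txt
git add mine.txt
git commit -q -m "My change"

# Meanwhile your colleague's feature lands on master
git checkout -q master
echo "juicy feature" > feature.txt
git add feature.txt
git commit -q -m "Juicy new feature"

# Replay your branch on top of the latest master
git checkout -q story/my-story
git rebase -q master

# Your commit now sits directly on top of the colleague's commit
git log --oneline
```

After the rebase, feature.txt is available on your branch even though it was added to master after you branched.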

Git's interactive rebase feature makes this even more useful as it allows you to fiddle with your commits as you replay them. You can re-order the commits, squash multiple commits into a single commit or even rework a commit completely by adding or removing changes from the commit.

Dealing with conflicts

Before we look at an example, it's important to note that interactive rebasing also places your commits on top of the current version of master. This may have changed since you created your branch. As a result it is possible to have conflicts if you have changed files that have also been changed on master. The only way to deal with these is to go through them one by one but it is work that will have to be done at some point - if they cause conflicts here they would have caused conflicts when merging had you not rebased. The advantage of doing this often is that you usually end up with only one or two small changes to deal with at a time.

With that in mind, I usually do an initial rebase against the latest version of master before doing any rewriting. To do this, I fetch the latest master from my remote repository and rebase against that

# fetch everyone else's updates from remote
> git fetch
    
# you can do the rebase the long way
> git checkout master
> git pull
> git checkout my/branch
> git rebase -i master
    
# but there is a nice shortcut
> git rebase -i origin/master

# the key difference is that
# we have not updated our local
# version of master

We will see a list of all the commits added since we branched from master. It will look something like this:

pick 84133cb Added action to the controller
pick 5f644c7 Added a view for the new action
pick 83603d3 Fixed typo in the controller action

It is always worth taking a quick look through this list to check that it is what we expect. If we have made a mistake somewhere and the wrong commits are here, spotting it now can save us a world of trouble later. An example of where this could happen would be if we typed in the wrong branch name above, which can happen when using branches created from other branches. If there are unexpected commits in the history, simply delete the entire line. This won't delete the commit, just remove it from this branch.

If for some reason you feel you need to abort straight away, you need to be careful how you exit the editor. If you just quit your editor, the rebase will continue. Assuming you are using a terminal editor (vim, emacs, nano etc) you can pause the editor and then kill the job. To pause the editor, use CTRL+Z. You will see a message that starts something like this

[1]  + 1234 suspended

The number in brackets is the job number. You can then kill the job with

> kill -9 %1

Don't forget the % !

We will then need to abort the rebase with git using

> git rebase --abort
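A gentler way out, which I have not described above, is to delete every line in the list before saving. Git treats an empty list as "nothing to do" and abandons the rebase by itself. The sketch below simulates that in a throwaway repository by using the GIT_SEQUENCE_EDITOR environment variable to stand in for the editor; copying /dev/null over the list is just a scripted way of emptying it, and the file names and identity are made up.

```shell
set -eu
# Throwaway repository with a made-up identity, so nothing real is touched
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b story/my-story
echo "def show; end" > controller.txt
git add controller.txt
git commit -q -m "Added action to the controller"
before=$(git rev-parse HEAD)

# "cp /dev/null <todo file>" empties the list, as if we had deleted every line.
# git exits with an error in this case, which is what we want here.
GIT_SEQUENCE_EDITOR="cp /dev/null" git rebase -i master 2>/dev/null \
  || echo "nothing to do, rebase abandoned"

# The branch is exactly where it was
after=$(git rev-parse HEAD)
```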

If we encounter conflicts we will need to resolve them and then add the files. Once all the conflicts for a commit have been dealt with, continue the rebase with

> git rebase --continue
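Here is the whole resolve, add, continue cycle as a sketch in a throwaway repository. The file contents, branch name and identity are invented, and "resolving" is reduced to overwriting the file with the version we want to keep. GIT_EDITOR=true simply accepts the commit message without opening an editor.

```shell
set -eu
# Throwaway repository with a made-up identity, so nothing real is touched
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
echo "greeting = hello" > config.txt
git add config.txt
git commit -q -m "Initial commit"

# The same line is changed on the branch ...
git checkout -q -b bug/1234
echo "greeting = hi" > config.txt
git commit -q -am "Shortened the greeting"

# ... and on master
git checkout -q master
echo "greeting = good day" > config.txt
git commit -q -am "Made the greeting formal"

git checkout -q bug/1234
# The rebase stops on the conflict, so a non-zero exit is expected here
git rebase master >/dev/null 2>&1 || echo "rebase stopped on a conflict"

# Resolve by editing the file (here we simply keep our version) ...
echo "greeting = hi" > config.txt
# ... mark it as resolved, then carry on
git add config.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1
```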

Group changes to similar files

Once any conflicts are dealt with, we can run the same git rebase -i master command again. This time we focus on grouping together similar commits or commits to the same part of a file - for instance reworkings of the same method. To move a commit in the history, simply cut and paste the entire line to where you want it. So you might change the history to look like

pick 84133cb Added action to the controller
pick 83603d3 Fixed typo in the controller action
pick 5f644c7 Added a view for the new action

When you save and close your editor, git will attempt to reorder the commits.
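If you want to try the reordering without risking real work, the sketch below replays it in a throwaway repository. Normally you would move the line by hand in your editor; here a small helper script (move-last-up.sh, a name I made up) does the cut and paste, and git is pointed at it through the GIT_SEQUENCE_EDITOR environment variable. The file contents and identity are also invented.

```shell
set -eu
# Throwaway working area with a made-up identity, so nothing real is touched
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b story/my-story
echo "def shwo; end" > controller.txt
git add controller.txt
git commit -q -m "Added action to the controller"
echo "<h1>Show</h1>" > view.txt
git add view.txt
git commit -q -m "Added a view for the new action"
echo "def show; end" > controller.txt
git commit -q -am "Fixed typo in the controller action"

# Stand-in for the manual edit: git passes the list as $1 and we
# rewrite it as line 1, line 3, line 2, followed by everything else.
cat > "$work/move-last-up.sh" <<'EOF'
#!/bin/sh
{ sed -n 1p "$1"; sed -n 3p "$1"; sed -n 2p "$1"; sed -n '4,$p' "$1"; } > "$1.new"
mv "$1.new" "$1"
EOF

GIT_SEQUENCE_EDITOR="sh $work/move-last-up.sh" git rebase -q -i master

# Oldest first: the typo fix now sits right after the commit it fixes
order=$(git log --reverse --format=%s master..)
echo "$order"
```

The move works cleanly here because the view commit and the typo fix touch different files.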

This is where keeping your commits small to begin with will pay off. If you try to make a move that git cannot resolve automatically, you will get a conflict. The more changes you have in a commit, the greater the chance of conflicts with other commits when you try to move it. If you have a conflict, it is possible to resolve it. However, because I keep my commits small, a conflict usually means that I have made a mistake and moved a commit inappropriately. An example of where this might happen is if I try to move a change to a file before the commit that created it. Clearly this is a mistake and so I can just abort the rebase, again using

> git rebase --abort

Merge small changes

Once you have the changes grouped together, you can merge commits together using the squash or fixup commands. After reordering the commits, the next time we run the git rebase -i master command our history looks something like this

pick 84133cb Added action to the controller
pick 83603d3 Fixed typo in the controller action
pick 5f644c7 Added a view for the new action

We can replace the word pick with an alternative command. The list of commands is displayed at the bottom of the rebase screen. squash and fixup both merge commits; the difference is that squash pauses for you to update the commit message for the combined commits. For fixup the earliest commit message will be used and the rest discarded. The commands used here can each be abbreviated to their initial letter.

We can merge the typo fix into the original commit by changing the second line so that we have

pick 84133cb Added action to the controller
f 83603d3 Fixed typo in the controller action
pick 5f644c7 Added a view for the new action

When we save and close the editor, git will work its magic and we will end up with just 2 commits, 1 to add the action (including the typo fix), another to add the view. No one ever needs to know about the typo ;)
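The same fixup can be rehearsed in a throwaway repository. As a sketch, the manual edit of the second line is replaced by a tiny helper script (mark-fixup.sh, a made-up name) that git runs via the GIT_SEQUENCE_EDITOR environment variable; the file contents and identity are placeholders too.

```shell
set -eu
# Throwaway working area with a made-up identity, so nothing real is touched
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b story/my-story
echo "def shwo; end" > controller.txt
git add controller.txt
git commit -q -m "Added action to the controller"
echo "def show; end" > controller.txt
git commit -q -am "Fixed typo in the controller action"
echo "<h1>Show</h1>" > view.txt
git add view.txt
git commit -q -m "Added a view for the new action"

# Stand-in for the manual edit: change "pick" to "fixup" on the second line
cat > "$work/mark-fixup.sh" <<'EOF'
#!/bin/sh
sed '2s/^pick/fixup/' "$1" > "$1.new" && mv "$1.new" "$1"
EOF

GIT_SEQUENCE_EDITOR="sh $work/mark-fixup.sh" git rebase -q -i master

# Two commits remain and the first one already contains the corrected code
git log --oneline master..
count=$(git rev-list --count master..)
fixed=$(git show HEAD~1:controller.txt)
```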

Sometimes you will have an idea of what your final set of commits will look like. In this case you can start setting up these commits and merge your small changes together to start forming these commits. Other times, you may have made repeated changes to the same lines of code. For example you have started with a simple test case for a method and then have subsequent commits for building up more complex behaviour. These could all be merged together to make it look as if you created the method perfectly first time.

It is also worth keeping an eye on the commit messages. I tend to build these up as I go in the same way. To begin with they are single-line messages. Then as I merge commits together, the individual lines form bullet points and I add a new heading that better summarises the new commit. As the commit develops, I add in the additional information, usually embellishing the messages as I go. It is worth noting that you don't want your commit messages to become bullet points of the changes that were made. Including the fact that you changed a particular method is not that useful since that is clear from looking at the contents of the commit. What is more useful is why this change was necessary and what we expect the result to be.

I also find it useful to only do a few simple operations at a time. Once you see the power of rebasing it is tempting to try to do one massive rebase that is moving, squashing and deleting commits at the same time. I find that this usually ends in tears and that it is worth the few extra minutes to get each step right first time.

Carry on coding

You can now carry on with the next part of the job. At your next break point (aka cup of tea) repeat the above process but this time, you will already have some larger commits from your previous changes. You can merge your new set of small changes into these large commits if it's appropriate to do so. Eventually you will reach the end of the job with a set of commits that look like they were written perfectly from scratch.

Play nice with others

One important point to note is that all this rewriting of history means you need to take some care when working with remote repositories. When you edit a commit, you are really creating a new commit. The old commit still exists but is no longer included in the branch. When git attempts to merge your changes with changes by other developers, if the other set of changes contains commits that you have since rebased out of existence, git may not recognise them as the same set of changes and may attempt to apply one on top of the other. This can have unexpected, and sometimes unpleasant, results. There are a few guidelines that can help to avoid these kinds of problems:
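To see that an edited commit really is a new commit, here is a small sketch in a throwaway repository. It uses git commit --amend, which I have not covered above, simply because it is the shortest way to edit a commit; the file, messages and identity are made up.

```shell
set -eu
# Throwaway repository with a made-up identity, so nothing real is touched
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b story/my-story
echo "def show; end" > controller.txt
git add controller.txt
git commit -q -m "Added acton to the controller"
old=$(git rev-parse HEAD)

# Editing the commit (here only its message) produces a different commit
git commit -q --amend -m "Added action to the controller"
new=$(git rev-parse HEAD)
echo "old: $old"
echo "new: $new"

# The old commit still exists in the repository ...
git cat-file -t "$old"
# ... but no branch contains it any more, so this prints nothing
git branch --contains "$old"
```

Anyone who pulled the old commit before you rewrote it still has it on their branch, which is where the trouble starts.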

Most important is communication with the rest of your team. Since you will need to force push to update remote branches, make sure that everyone is ok with that and you won't be destroying anyone's work.
Work on your own branch whenever possible. Even when collaborating with other developers, creating branches from other branches means that you can subdivide tasks and merge the results together.
Never rebase master. It's just not worth it. Once you have merged your branch, if you find a problem it's better to take the shame and add an additional commit to master.
Always use git pull --ff-only when updating from a remote branch. This means that if someone has rebased and forgotten to tell you, the merge part of the pull will abort and you can sort out the problem. Incidentally, if you need to get the latest version of a branch that has been rebased, often the easiest way is to delete your local copy of the branch and check out a fresh copy from remote.
Never use git pull --rebase. If someone has rebased the branch, git will try and replay your changes on top of the complete set of remote changes which can have some unexpected results.
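The --ff-only guideline can be tried out safely with a local stand-in for the remote. In this sketch everything is invented for the demonstration: "alice" and "bob" are two clones in a temporary directory, remote.git is a bare repository playing the part of the shared remote, and story/shared is the branch they both use. Alice rewrites a commit and force pushes; Bob's pull refuses to merge, so he takes a fresh copy of the branch.

```shell
set -eu
# Throwaway working area with a made-up identity, so nothing real is touched
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Alice starts the shared branch
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial commit"
git checkout -q -b story/shared
echo "draft" > notes.txt
git add notes.txt
git commit -q -m "First draft"
cd "$work"

# A bare copy plays the part of the remote, and Bob clones the branch
git clone -q --bare alice remote.git
git -C alice remote add origin "$work/remote.git"
git clone -q -b story/shared remote.git bob

# Alice rewrites her commit and force pushes without telling Bob
git -C alice commit -q --amend -m "Polished draft"
git -C alice push -q --force origin story/shared

# Bob's pull fetches the rewritten branch but refuses to merge it
cd bob
if git pull -q --ff-only 2>/dev/null; then
  result=merged
else
  result=refused
fi
echo "pull --ff-only: $result"

# Easiest fix: delete the local branch and check out a fresh copy
git checkout -q master
git branch -q -D story/shared
git checkout -q story/shared
git log --oneline -1
```

Deleting the local branch is only safe here because Bob had no work of his own on it.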

Well, that's about it for this little run-through. Hopefully this has given you some ideas about how to use some of the more powerful features git has to offer and a few tips on staying safe in the process. Happy sausage making!

Making the perfect sausage, 01 Mar 14