Maintaining a clean Git history

11 October 2015

A clean git history for your project (or any version controlled project) can be an incredibly useful asset, in particular for new developers trying to understand why code has been written in a certain way. With clear and ordered commit messages, it can become obvious why a change was made.

There are two practices I apply to make sure that I leave my code changes as clear as I can. They are how I write the commit messages, and the ordering of the actual commits.

Commit messages

The message in a commit is where an explanation for the change should reside. This should include at the very least what was changed, and why.

Some examples of bad commit messages (and yes, I have seen these kinds of commits from multiple developers on different projects):

fixed bug

and

refactored stuff

A more descriptive commit message would look something like

Fix ImageProcessingFactory not handling images less than 50px wide

Commit messages should ideally tell you what a commit will do if it is applied. I try to write my messages so that they complete the sentence "This commit will <commit message>".

An even better commit message would follow the initial summary line with a slightly longer description of the problem/change in more detail. However, this isn't always practical or even necessary, depending on the project. If you have a chance, I would recommend taking a look at the Git and Linux kernel commit messages as examples of great ones!
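As a rough sketch of a summary line followed by a longer description, git treats each -m option as its own paragraph, so the first becomes the subject and the second the body. The throwaway repository, the file name and the body text below are all made up for illustration; only the subject line comes from the example above.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Developer"     # hypothetical identity
git config user.email "dev@example.com"

# Placeholder file standing in for the real change.
echo "placeholder" > image_processing.txt
git add image_processing.txt

# First -m is the summary line, second -m becomes the body paragraph.
git commit -q \
  -m "Fix ImageProcessingFactory not handling images less than 50px wide" \
  -m "Illustrative body text: describe what was going wrong and why this change fixes it."

subject="$(git log -1 --format=%s)"   # summary line only
body="$(git log -1 --format=%b)"      # everything after the blank line

echo "$subject"
echo "$body"
```

Running git commit without -m opens your editor instead, which is usually more comfortable for anything longer than a sentence or two.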

I've also worked in places that would prepend the ticket number to the commit message, allowing developers to look up the reasons for a change in more detail.

Autosquashing commits

Rebasing commits allows you to keep your history linear. While not everyone agrees on whether you should maintain a linear history, it is almost definitely (in my opinion) a good idea to keep your local branches ordered appropriately, presenting your work as a series of changes to be applied. Ideally each change is isolated from the others (within reason), and can be applied separately.

Development rarely works like this. Too often I have made some changes, progressed a bit further, and realised there were more changes to make in that original commit. Maybe it was a leftover debug statement, or some missing config, or even a chunk of functionality I didn't originally think about. Typically most people would just make a new commit with these changes, and leave it at that.

Git has the ability to squash commits into one another, allowing you to make several small changes and then bundle them up into one, more appropriate, commit. This gets a bit complicated when you start to make changes that aren't in order.

Luckily, git has that solved with --fixup and --squash. These let you commit changes and, at the same time, specify that they should become part of a previous commit. The fixup option lets you amend a previous commit (once you rebase) without changing the original commit message. The squash option also amends a previous commit, but gives you the chance to alter the commit message.

To use these, just prepare your staging area as normal (git add or git add -p), then commit with:

`git commit --fixup=<previous commit hash>`

This will create a new commit, with a message in the form

fixup! <original commit subject>
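To see what these commits look like in practice, here is a small sketch run against a throwaway repository. The file name and contents are invented for the example, and -m is passed to --squash only so that no editor opens while the script runs.

```shell
#!/bin/sh
set -eu

# Throwaway repository; names and contents are hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Developer"
git config user.email "dev@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Original change"
target="$(git rev-parse HEAD)"   # the commit we want to amend later

# A forgotten change: stage it and mark it as a fixup of the target.
echo "forgotten line" >> notes.txt
git add notes.txt
git commit -q --fixup="$target"
fixup_subject="$(git log -1 --format=%s)"
echo "$fixup_subject"    # prints: fixup! Original change

# Same idea with --squash; the -m text is kept as extra message content.
echo "another line" >> notes.txt
git add notes.txt
git commit -q --squash="$target" -m "Extra notes to merge into the message"
squash_subject="$(git log -1 --format=%s)"
echo "$squash_subject"   # prints: squash! Original change
```

Nothing has been rewritten at this point. The prefixed commits simply sit on top of the branch until the next autosquash rebase.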

Then, when you are ready to squash and clean up your commits, just run an interactive rebase with the --autosquash flag (or set rebase.autosquash to true in your git config).

Your history would then go from:

commit 1234 fixup! Original change

commit 5678 Some other change

commit 91011 Original change

to:

commit 5678 Some other change

commit 91011 Original change

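where commit 91011 contains all the changes from the original 91011 and 1234. (In a real repository the rewritten commits get new hashes, because a rebase creates new commits rather than editing the old ones.)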
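The whole flow can be tried end to end in a scratch repository. This is only a sketch: the file names are made up, an extra "Initial commit" is added so there is something to rebase onto, and GIT_SEQUENCE_EDITOR=true is my way of accepting the generated rebase plan without opening an editor (normally you would review it by hand).

```shell
#!/bin/sh
set -eu

# Scratch repository mirroring the history above; file names are hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Developer"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

echo "one" > original.txt
git add original.txt
git commit -q -m "Original change"

echo "other" > other.txt
git add other.txt
git commit -q -m "Some other change"

# Late addition that really belongs in "Original change" (HEAD~1 here).
echo "two" >> original.txt
git add original.txt
git commit -q --fixup=HEAD~1

before="$(git log --format=%s)"
echo "$before"
# fixup! Original change
# Some other change
# Original change
# Initial commit

# Reorder and fold the fixup automatically; "true" leaves the plan untouched.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash HEAD~3

after="$(git log --format=%s)"
echo "$after"
# Some other change
# Original change
# Initial commit
```

After the rebase, the "Original change" commit contains both lines of original.txt, and the fixup commit no longer appears in the log.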

Rebasing commits can be a little scary if you don't know what you are doing or how it really works. So squashing commits may not be for everyone, but I would still recommend that everyone puts more effort into their commit messages, as it helps everyone involved in modifying a codebase.