Changesets, Not Snapshots

revisioncontrol git

Mon Feb 18 01:37:00 -0800 2008

It seems that Git fever is going around.

An important concept to keep in mind with decentralized revision control is that the entire paradigm is one of changesets, not snapshots. Subversion and friends take a snapshot of your source tree and save it. Branching and merging are thus quite painful, because it’s hard to tell what the code’s authors intended when all you have is a before snapshot and an after snapshot.

A changeset is a single commit: mostly the patch (diff) of the code, but also some metadata about the patch, such as its parent node in the commit tree. This is what allows git and friends to seem smarter (in fact, nearly omniscient) when it comes to merging. It isn’t about a better merge algorithm; it’s about well-formed data.
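To make the idea concrete, here is a toy Python model of a changeset as described above: a patch plus a pointer to its parent. It is a sketch under stated assumptions, not how git lays out its object store; `Changeset`, `make_changeset`, and the file name `hello.txt` are hypothetical, and the id is simply a truncated SHA-1 of the changeset’s fields.

```python
import difflib
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Changeset:
    """Toy changeset: a patch plus metadata about it (hypothetical, not git's format)."""
    parent: Optional[str]  # id of the parent changeset, None for the first one
    message: str
    patch: str             # unified diff text

    @property
    def id(self) -> str:
        # Hash the parent link along with the content, so a changeset's
        # identity depends on where it sits in the history.
        data = f"{self.parent}\n{self.message}\n{self.patch}".encode()
        return hashlib.sha1(data).hexdigest()[:7]


def make_changeset(parent, message, before, after, path="hello.txt"):
    """Build a changeset whose patch turns `before` into `after`."""
    patch = "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
    return Changeset(parent=parent, message=message, patch=patch)


root = make_changeset(None, "add greeting", "", "hello\n")
child = make_changeset(root.id, "greet the world", "hello\n", "hello world\n")
print(child.parent == root.id)  # True
print(child.patch)
```

Because each changeset records its parent, a merge tool working on data shaped like this knows which common ancestor two lines of work started from, rather than having to guess from two bare snapshots.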

Developing in decentralized revision control should lead you to think about your commits in a different way. Each commit should be a single, atomic bundle. If you find yourself thinking “Huh, I should commit, it’s been a while since my last one,” that’s wrong. That commit will most likely contain several changes, and thus won’t be a discrete changeset.

Imagine that each commit were a patch that was going to be emailed to the maintainer of the project, or posted in Trac. If you sent a patch that had two unrelated changes in it, the maintainer would send it back and ask you to separate them into two patches. Think of your commit tree the same way. Each commit should stand on its own merit, isolated from other changes.

git provides tools to keep your commit history well-formed. git stash is one of my favorites. But the crown jewel is git rebase -i HEAD~10. If you haven’t used this yet, try it at your next opportunity. It loads your last 10 commits into a text editor and lets you reorder, edit, or combine them in a freeform fashion.

The “squash” option is particularly nice. If you’ve ever had a commit log that looks like this:

r50 | adam | amazing feature that works perfectly
r51 | adam | oops, it works now
r52 | adam | oops, one other thing
r53 | adam | ok really this time

…you can use squash to combine these all into a single commit.
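Here is a rough Python simulation of what squash does to a history like the one above. The names (`Commit`, `squash`, `feature.py`) and the file contents standing in for r50 through r53 are invented for illustration, and the sketch models only the effect on the patch, not how git rebase -i is implemented: the combined commit spans from the state before r50 to the state after r53, so the intermediate fix-ups drop out of the diff.

```python
import difflib
from dataclasses import dataclass
from typing import List


@dataclass
class Commit:
    """Toy commit: a label, a message, and the file contents around it."""
    rev: str
    message: str
    before: str  # contents of the (single, hypothetical) file before the commit
    after: str   # contents after the commit


def diff(before: str, after: str) -> str:
    """Unified diff between two versions of the hypothetical feature.py."""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="a/feature.py",
        tofile="b/feature.py",
    ))


def squash(commits: List[Commit]) -> Commit:
    """Fold a run of consecutive commits into a single commit."""
    if not commits:
        raise ValueError("nothing to squash")
    # Each commit must pick up exactly where the previous one left off.
    for prev, nxt in zip(commits, commits[1:]):
        if prev.after != nxt.before:
            raise ValueError(f"{nxt.rev} does not follow {prev.rev}")
    return Commit(
        rev=commits[0].rev,  # a real rebase would create a brand-new commit id
        message="\n".join(c.message for c in commits),  # starting point for a rewrite
        before=commits[0].before,
        after=commits[-1].after,
    )


# Invented file contents for the four commits in the log above.
v0 = ""
v1 = "def feature():\n    return 1 / 0\n"
v2 = "def feature():\n    return 1\n"
v3 = "def feature():\n    return 42\n"
v4 = 'def feature():\n    """Amazing."""\n    return 42\n'

history = [
    Commit("r50", "amazing feature that works perfectly", v0, v1),
    Commit("r51", "oops, it works now", v1, v2),
    Commit("r52", "oops, one other thing", v2, v3),
    Commit("r53", "ok really this time", v3, v4),
]

combined = squash(history)
print(diff(combined.before, combined.after))
```

The printed patch only adds the final version of the function; the broken `1 / 0` draft from r50 never appears. Joining the messages mirrors what git does when squashing: it offers the folded commits’ messages, concatenated, for you to edit down to one.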

You’ll better understand how this is possible once you internalize the fact that commits and pushes are not the same thing. A commit indicates that you have completed a single, atomic change. A push publishes your changes to others on your team, but is not itself a commit event.
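A toy Python model of the commit/push distinction, using a hypothetical `Repo` class: committing only appends to local history, pushing copies that history to a simulated remote, and squashing is allowed only for commits that haven’t been pushed yet. That last rule is a simplification of the usual advice against rewriting published history, not something git itself enforces in this form.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Repo:
    """Toy model: a local history plus the history a remote has seen (hypothetical)."""
    local: List[str] = field(default_factory=list)
    remote: List[str] = field(default_factory=list)

    def commit(self, message: str) -> None:
        # Record a finished, atomic change; nothing leaves this machine.
        self.local.append(message)

    def unpublished(self) -> List[str]:
        # The remote's history is always a prefix of the local one here.
        return self.local[len(self.remote):]

    def squash_unpublished(self, message: str) -> None:
        # Only rewrite commits nobody else has seen yet.
        if not self.unpublished():
            raise ValueError("nothing unpublished to squash")
        self.local = self.local[:len(self.remote)] + [message]

    def push(self) -> None:
        # Publish: the remote catches up with local history.
        self.remote = list(self.local)


repo = Repo()
repo.commit("add login form")
repo.push()
for msg in ["amazing feature that works perfectly", "oops, it works now",
            "oops, one other thing", "ok really this time"]:
    repo.commit(msg)        # four local commits, none published yet
repo.squash_unpublished("amazing feature")
repo.push()
print(repo.remote)  # ['add login form', 'amazing feature']
```

Your teammates only ever see the tidy, squashed history, because the messy intermediate commits never left your machine.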