Only Git Commits are Real

The terminology in this post represents how I've come to think about certain Git data structures, after using Git for close to 6 years. It does not reflect the official terminology used in the Git manual, and it does not reflect the Git implementation internals (of which I remain blissfully unaware).

Git branches (and repositories, remotes, pull requests, etc.) are not real. At least not in the way that a commit is.

What makes a commit real?

A commit is more real than these other objects in the Git world for a couple of reasons:

To understand the practical implications of these properties, it's useful to understand that a commit has a "commit hash." You've surely seen a commit hash if you've worked with Git before. You can get the hash of the current commit (the commit at HEAD) with:

git rev-parse HEAD

A commit hash uniquely identifies a commit. (In fact, the commit hash is a hash of all of the metadata in the commit, and all the files at the moment of the commit. You can't change the files or the metadata without breaking this hash.) If you have the commit hash, you can find the commit and switch to it with git switch --detach <hash> or you can read the commit information with git show <hash>.
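To make the "hash of the metadata and the files" idea concrete, here is a minimal sketch you can run in a scratch directory. It assumes a POSIX shell with Git installed; the temporary repository and the "Demo" identity are placeholders I made up for the example. It prints the raw commit object and then re-hashes those same bytes, which should reproduce the commit hash.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Placeholder identity so the commit succeeds without any global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "hello" > README
git add README
git commit -q -m "Example commit"

head_hash=$(git rev-parse HEAD)

# The raw commit object: a tree hash (the files), author, committer, message.
git cat-file commit HEAD

# Hash those same bytes as a commit object, without writing anything new.
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)

echo "HEAD:       $head_hash"
echo "recomputed: $recomputed"
```

The files themselves are not stored inside the commit object. The commit records the hash of a tree, and the tree is in turn a hash of the file contents, so changing any file changes the tree hash and therefore the commit hash.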

There are obviously operations in Git that appear to violate the above "immutable" property. At a basic level, git commit --amend will let you edit a commit, and at a more advanced level, git rebase is a tool for moving commits.

However, these commands are really generating new commits, based on the old ones. These new commits will have new hashes (since their data will be different) and the old commits will still exist and still be usable if you have their hashes. Here's a basic example demonstrating this:

$ git commit -m "Exampel commit" # commit with misspelling
[master 8fd3e68] Exampel commit
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 README
$ git rev-parse HEAD
8fd3e68b316f89357cb2bec8fb783d78c6ba6f9b
$ git commit --amend -m "Example commit" # fix spelling
[master fae17a7] Example commit
 Date: Wed Jan 3 08:47:41 2024 -0500
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 README
$ git rev-parse HEAD
fae17a7e41683cf4789f9ac5a7a889147a5ec2a0
$ git show 8fd3e68 --format=oneline -s # show the original unchanged commit
8fd3e68b316f89357cb2bec8fb783d78c6ba6f9b Exampel commit
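The same experiment can be scripted so it does not depend on the particular hashes above. This is a sketch under the same assumptions as before (POSIX shell, Git installed, a throwaway repository, a made-up "Demo" identity). It also checks one extra detail: because only the message changed, the old and new commits point at the same tree of files.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "hello" > README
git add README
git commit -q -m "Exampel commit"          # commit with misspelling
old=$(git rev-parse HEAD)

git commit -q --amend -m "Example commit"  # "fix" it, which creates a new commit
new=$(git rev-parse HEAD)

# Both commits record the same files, so their tree hashes match.
old_tree=$(git rev-parse "$old^{tree}")
new_tree=$(git rev-parse "$new^{tree}")

# The original commit is still readable by hash, misspelling and all.
old_subject=$(git show -s --format=%s "$old")
new_subject=$(git show -s --format=%s "$new")

echo "old: $old ($old_subject)"
echo "new: $new ($new_subject)"
echo "tree shared by both: $old_tree"
```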

What makes a Git branch not-real?

Obviously, Git branches do exist. To understand what I mean when I say that they're not as real as commits, I'd like you to consider a ski slope.

[Image: a map of Bald Mountain, from the 2023 Deer Valley Press Kit]

This is a map of Bald Mountain. The runs, or suggested routes down the mountain, are denoted with solid lines. If you squint, it looks kind of like a Git commit tree (or some sort of directed acyclic graph).

These routes, however, are not etched into the mountain in a definitive fashion. In fact, here's an image of the mountain.

[Image: a photo of Bald Mountain, from the 2023 Deer Valley Press Kit]

You can see the general shape of the runs, where the resort has cleared trees, but without the map you can't tell the exact route of the runs. When skiing, you're allowed to cut through thin sections of trees or zig-zag between different sections of snow. This is why I'm using a ski slope, and not a road or train track, as a metaphor. A train track exists in the real world and the train has to stay on it. There are no such tracks when skiing.

Git branches are the same way. If commits are the actual mountain, then branches are just the lines on the map.

Normally, when using Git, you stick to the branches, and your branches and your commits evolve at the same time. For example, when you make a commit, your current branch updates to point to the new commit.

$ git branch -v
* master fae17a7 Example commit
$ git commit -m "Test commit"
[master 8ea77b2] Test commit
$ git branch -v
* master 8ea77b2 Test commit

But this is not necessarily the case. We can move our branch back to the previous commit.

$ git reset --hard HEAD~ # Reset the current branch to the parent of the HEAD commit
HEAD is now at fae17a7 Example commit
$ git branch -v
* master fae17a7 Example commit
$ git show 8ea77b2 --format=oneline -s
8ea77b29c0bc56c612a0668a00b929f9d11f45fc Test commit 

As can be seen in the example, this hasn't deleted the commit. In fact, this operation hasn't changed "the mountain" at all. It's only moved the "line on the map." The commit is still there.

If we first switch to a different branch, we can even delete the branch entirely, and this doesn't affect the commits.

$ git switch -c feature-branch
Switched to a new branch 'feature-branch'
$ git branch -D master
Deleted branch master (was fae17a7).

All of our commits are still there. However, since they're not all on a branch, some of them are unreachable!

What does it mean to be "on a branch"?

The "on a branch" terminology is very frequently used in the Git world. This terminology confused me when I was learning Git, and it doesn't reflect how I think about Git commits or branches today.

Saying "this commit is on the branch master" reflects a lot more about the location of the branch than it does the location of the commit. The branch is movable; the commit is not. In the ski-slope metaphor, you could say "this spot is on the run Evergreen," but that doesn't mean the spot could move to a different run.

Additionally, this phrase is confusing because a branch does not have a list of commits which are "on it." Rather, a branch stores only the single commit which is at its tip. In this sense, branches are less like runs on a ski slope, and more like points placed on the map.
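You can see the "a branch is just one commit hash" idea directly by asking Git what each branch resolves to. The following is a small sketch, assuming a POSIX shell and Git; the branch name old-tip and the "Demo" identity are hypothetical names chosen for the example.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git commit -q --allow-empty -m "first"
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m "second"
second=$(git rev-parse HEAD)

# Create a second branch (hypothetical name) that points at the first commit.
git branch old-tip "$first"

# Each branch resolves to exactly one commit hash: its tip.
git for-each-ref --format='%(refname:short) -> %(objectname)' refs/heads

old_tip_hash=$(git rev-parse refs/heads/old-tip)
current_branch=$(git symbolic-ref --short HEAD)
current_tip_hash=$(git rev-parse "refs/heads/$current_branch")
```

Nothing in that listing records which commits "belong" to a branch. Everything else is worked out by following parent links from the tip.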

Here's the best definition for this concept that I can give: a commit is "on a branch" if and only if that commit is an ancestor of the commit at the tip of the branch.
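That definition maps fairly directly onto git merge-base --is-ancestor, which exits successfully when its first argument is an ancestor of (or the same commit as) its second. Here is a sketch, again assuming a POSIX shell and a throwaway repository; on_branch is a hypothetical helper name, not a Git command.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git commit -q --allow-empty -m "first"
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m "second"
second=$(git rev-parse HEAD)
branch=$(git symbolic-ref --short HEAD)

# Hypothetical helper: succeeds when commit $1 is an ancestor of
# (or identical to) the commit at the tip of branch $2.
on_branch() {
    git merge-base --is-ancestor "$1" "refs/heads/$2"
}

if on_branch "$first" "$branch"; then first_before=yes; else first_before=no; fi
if on_branch "$second" "$branch"; then second_before=yes; else second_before=no; fi

# Move the branch back one commit. The second commit itself is untouched,
# but it is no longer an ancestor of the branch tip.
git reset -q --hard "$first"
if on_branch "$second" "$branch"; then second_after=yes; else second_after=no; fi

echo "first on $branch before reset:  $first_before"
echo "second on $branch before reset: $second_before"
echo "second on $branch after reset:  $second_after"
```

The commit never moved. Only the answer to "is it an ancestor of the tip?" changed, because the tip did.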

One interesting consequence of this definition is that once one branch has been merged into another, the resulting branch contains all of the commits from both, and it's hard (without looking at commit messages or PR messages) to tell which commits were originally on which branch.

What happens to commits that aren't on a branch?

Commits that aren't on a branch are called "unreachable." This doesn't mean that they're completely unreachable. As we showed, if you have the commit hash, you can still inspect or switch to these commits. However, they're not reachable from a branch (or other reference like a tag).

Unreachable commits include commits that have been edited with git commit --amend, commits that have been copied and superseded by a git rebase, or commits that were left behind by branch-moving or deletion shenanigans. They exist in a sort of purgatory, normally completely out of sight. Then, after a grace period set by Git's reflog-expiry and garbage-collection settings (on the order of 30 to 90 days by default), they are deleted forever by Git's garbage collection. However, they can be saved if a kind Git user (like you) makes them reachable again.
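If you have lost track of a hash, Git can list these commits for you. The sketch below assumes a POSIX shell, Git, and a throwaway repository with a made-up "Demo" identity. It creates an unreachable commit with --amend and then asks git fsck to report it. The --no-reflogs flag matters here: without it, fsck treats anything still mentioned in the reflog as reachable.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

echo "hello" > README
git add README
git commit -q -m "Exampel commit"
old=$(git rev-parse HEAD)

# Amending leaves the original commit behind with no branch pointing at it.
git commit -q --amend -m "Example commit"

# List objects that no branch or tag can reach, ignoring the reflog.
unreachable=$(git fsck --unreachable --no-reflogs 2>/dev/null)
echo "$unreachable"

# The reflog still remembers where HEAD used to be, including the old hash.
git reflog
```

The reflog is the reason these commits tend to survive for a while: as long as an entry there mentions a commit, garbage collection leaves it alone.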

We can view the state of our Git tree with git log. However, if we want to include unreachable commits, we have to pass their commit hashes so that git log knows to look for them.

$ git log --graph 8ea77b2 8fd3e68
* commit 8ea77b29c0bc56c612a0668a00b929f9d11f45fc
| Author: Matthias Portzel <MatthiasPortzel@gmail.com>
| Date:   Sat Jan 6 14:52:08 2024 -0500
|
|     Test commit
|
* commit fae17a7e41683cf4789f9ac5a7a889147a5ec2a0 (HEAD -> feature-branch)
| Author: Matthias Portzel <MatthiasPortzel@gmail.com>
| Date:   Wed Jan 3 08:47:41 2024 -0500
|
|     Example commit
|
| * commit 8fd3e68b316f89357cb2bec8fb783d78c6ba6f9b
|/  Author: Matthias Portzel <MatthiasPortzel@gmail.com>
|   Date:   Wed Jan 3 08:47:41 2024 -0500
|
|       Exampel commit
|
* commit 530fa49f63df2382c26e5c3a7f13530568d38d12
| Author: Matthias Portzel <MatthiasPortzel@gmail.com>
| Date:   Wed Jan 3 08:44:44 2024 -0500
|
|     Example commit 1
|
* commit 1160ba1470ab0496293a1db0e7abee4c74863337
  Author: Matthias Portzel <MatthiasPortzel@gmail.com>
  Date:   Wed Jan 3 08:44:33 2024 -0500

      Test commit 1

In this commit tree, you can see two unreachable commits, 8ea77b2 and 8fd3e68. Commit fae17a7 and its two ancestors (530fa49 and 1160ba1) are reachable because they are on the feature-branch branch.

Let's finish this post by "saving" commit 8ea77b2 from its unreachable state. We just have to create a new branch at that commit.

$ git branch master 8ea77b2

Simple.
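A branch is not the only kind of reference that can do the rescuing; a tag works too. Here is a sketch of that variation, assuming a POSIX shell, Git, and a throwaway repository. The tag name rescued-tag and the "Demo" identity are hypothetical. It lists the references that contain the commit before and after tagging it.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
lost=$(git rev-parse HEAD)

# Move the branch back, leaving the second commit unreachable.
git reset -q --hard HEAD~

# No branch or tag contains the commit right now, so this prints nothing.
before=$(git for-each-ref --contains "$lost" --format='%(refname)')

# A lightweight tag (hypothetical name) makes the commit reachable again.
git tag rescued-tag "$lost"
after=$(git for-each-ref --contains "$lost" --format='%(refname)')

echo "references containing $lost before: ${before:-none}"
echo "references containing $lost after:  $after"
```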

Hopefully this post gave you a better understanding of how to think about commits and branches, and hopefully this framework makes it easier to understand potentially confusing Git operations like rebase, merge, and more.
