How Git Actually Works

git add Blog

git commit -m "How Git Works under the Hood"

Most of you are probably familiar with the commands above, but do you know what actually happens when you run them? In this blog, we’ll find out how Git, as a Version Control System, actually works under the hood.

According to Linus Torvalds, the creator of Git, “In many ways you can just see git as a filesystem — it’s content addressable, and it has a notion of versioning, but I really designed it coming at the problem from the viewpoint of a filesystem person (hey, kernels is what I do), and I actually have absolutely zero interest in creating a traditional SCM system.”

Let’s dive in and look at what exactly this filesystem is. Initializing a Git repository (git init) creates a directory named .git. A freshly initialized .git directory contains, among other entries, HEAD, config, description, hooks, info, objects and refs.

Most of the functionality of Git is built on the objects stored in the objects subfolder, also known as the Git object database. There are four types of object in Git: the blob, the tree, the commit and the tag. The blob, tree and commit objects store the actual data of the project. These objects reference each other to compose the filesystem.

Just below the branch node is our first object, the commit. The commit records the author, the committer, and the commit message, and it also holds a reference to a tree. The tree is our second object and can be understood as a directory. The tree, in turn, references blobs, our third object. A blob stores the content of a file, not the file itself; the file name is recorded in the tree. The diagram below depicts these relationships more clearly:

Let’s now take a look at how the objects reference each other: through the SHA-1. SHA stands for “Secure Hash Algorithm”. Git computes a SHA-1 hash over each object’s content (prefixed with a short header), writes it as 40 hexadecimal characters, and uses it as the object’s name. Two objects with identical content therefore get the same name. The commit object holds the SHA of its tree, and the tree object holds the SHAs of its blobs. This reference chain is used to traverse the Git filesystem.
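To make the naming scheme concrete, here is a minimal Python sketch of how a blob’s SHA-1 can be reproduced outside Git. It assumes a repository using the default SHA-1 object format; the function names git_object_id and blob_id are made up for this illustration and are not part of Git.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Return the 40-character SHA-1 name of an object of the given type."""
    # Git hashes a short header, "<type> <size in bytes>" plus a NUL byte,
    # followed by the raw content of the object.
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def blob_id(content: bytes) -> str:
    """Return the SHA-1 name of a blob holding the given file content."""
    return git_object_id("blob", content)


if __name__ == "__main__":
    # Only the content is hashed: the file name plays no part in the result.
    print(blob_id(b"test content\n"))
```

Running git hash-object on a file with the same bytes should print the same value, which is a quick way to check the sketch against a real Git installation.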

Let’s see an example of how this actually works. To start, let’s take a look at the contents of the .git/objects folder of a Git repository after creating one file and making a single commit.

After the commit, the objects folder contains three new sub-directories. Each sub-directory is named with the first two characters of the SHA of the object it holds. Inside each one is a file named with the remaining 38 characters of that SHA. The first object is the blob, the second is the commit, and the third is the tree.
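The two-plus-38 character layout can be sketched in a few lines of Python. This is only an approximation of what Git does for "loose" objects (zlib-compressed files, one per object), written against a temporary directory rather than a real repository; write_loose_object and read_loose_object are hypothetical helper names.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path
from typing import Tuple


def write_loose_object(objects_dir: Path, obj_type: str, payload: bytes) -> str:
    """Store an object the way a loose Git object is laid out on disk."""
    data = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00" + payload
    sha = hashlib.sha1(data).hexdigest()
    # First two hex characters name the sub-directory, the other 38 the file.
    path = objects_dir / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(data))
    return sha


def read_loose_object(objects_dir: Path, sha: str) -> Tuple[str, bytes]:
    """Read an object back and return its type and content."""
    data = zlib.decompress((objects_dir / sha[:2] / sha[2:]).read_bytes())
    header, _, payload = data.partition(b"\x00")
    obj_type, _size = header.decode("ascii").split(" ")
    return obj_type, payload


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        objects = Path(tmp)  # stands in for .git/objects
        sha = write_loose_object(objects, "blob", b"test content\n")
        print(sha[:2], sha[2:])
        print(read_loose_object(objects, sha))
```

Because the files are compressed, opening one in a text editor shows unreadable bytes, which is why a dedicated command is needed to inspect them.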

To view the content of a Git object on the command line, run the command: git cat-file -p [SHA]. Running this command with the SHA of our commit, shown above, produces the following:
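As a rough guide to the shape of that output, a commit is plain text: a few header lines, a blank line, then the message. The Python sketch below parses a made-up sample. The author name, e-mail address and timestamp are hypothetical, the tree SHA is just a placeholder (it is Git’s well-known empty-tree ID), and parse_commit is an illustrative helper, not a Git API.

```python
# Hypothetical sample in the format printed for a commit object.
SAMPLE_COMMIT = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author Jane Doe <jane@example.com> 1700000000 +0000\n"
    "committer Jane Doe <jane@example.com> 1700000000 +0000\n"
    "\n"
    "How Git Works under the Hood\n"
)


def parse_commit(text: str) -> dict:
    """Split commit text into its header fields and its message."""
    headers, _, message = text.partition("\n\n")
    commit = {"parents": [], "message": message.rstrip("\n")}
    for line in headers.splitlines():
        if line.startswith(" "):
            # Continuation of a multi-line header (e.g. a signature); skipped here.
            continue
        key, _, value = line.partition(" ")
        if key == "parent":
            # A first commit has no parent; a merge commit has several.
            commit["parents"].append(value)
        else:
            commit[key] = value
    return commit


if __name__ == "__main__":
    commit = parse_commit(SAMPLE_COMMIT)
    print(commit["tree"])
    print(commit["message"])
```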

The commit object contains information about the commit and references the tree object by its SHA. Now let’s take a look at the tree object’s content using the tree object’s SHA:

This shows that the tree object contains information on the blob(s) it references: for each entry, a file mode, the object type, the SHA and the file name. Because our project only has one file, we only see one blob listed in the tree. The blob, in turn, contains the actual content of the file at the time of the commit.
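Internally, a tree is stored in a compact binary form rather than the readable listing that git cat-file -p prints. The sketch below builds that form in Python under a few assumptions: every entry is a regular, non-executable file (mode 100644), there are no sub-directories, and the file name "Blog" and its content are stand-ins taken from the git add command at the top of this post. The helper names are made up for the example.

```python
import hashlib
from typing import List, Tuple


def object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 over the "<type> <size>" header, a NUL byte, and the payload."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def tree_payload(entries: List[Tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex SHA) entries into a tree object's body."""
    body = b""
    # Entries are stored sorted by name (files only in this sketch).
    for mode, name, sha in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        # Each entry is "<mode> <name>", a NUL byte, then the SHA as 20 raw bytes.
        body += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(sha)
    return body


def tree_id(entries: List[Tuple[str, str, str]]) -> str:
    """Return the SHA-1 name of the tree holding the given entries."""
    return object_id("tree", tree_payload(entries))


if __name__ == "__main__":
    blob = object_id("blob", b"some file content\n")  # hypothetical content
    print(tree_id([("100644", "Blog", blob)]))
```

Note that the file name appears only in the tree entry, never in the blob, which is why renaming a file does not require storing its content again.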

Git uses the SHA to reference each of the three objects, and these references can then be followed to traverse the Git structure. To dive deeper, let’s create a new file called “text.txt” and make a second commit. As with the first commit, three more objects (a blob, a tree and a commit) will appear in our objects folder. The new tree object, however, will now contain references to two blobs rather than one:

The first entry references the same blob as in our first commit. This is because the initially committed file was never changed, and Git does not create new blobs for files that have not changed. The second blob contains the content of the new file that was created before our second commit.
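This reuse falls out of content addressing: the unchanged file hashes to the same SHA, so there is nothing new to store. The Python sketch below simulates the two commits with a dictionary standing in for the objects folder. It is a simplification: the commit bodies omit the author and committer lines that real commits carry, the file contents are invented, and put and commit_snapshot are hypothetical helpers.

```python
import hashlib
from typing import Dict, Optional


def put(store: Dict[str, bytes], obj_type: str, payload: bytes) -> str:
    """Store an object under its SHA-1; identical content is kept only once."""
    data = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00" + payload
    sha = hashlib.sha1(data).hexdigest()
    store.setdefault(sha, data)
    return sha


def commit_snapshot(store: Dict[str, bytes], files: Dict[str, bytes],
                    message: str, parent: Optional[str] = None) -> str:
    """Write blobs, one flat tree, and a simplified commit for the files."""
    tree = b""
    for name in sorted(files):
        blob_sha = put(store, "blob", files[name])
        tree += f"100644 {name}".encode("utf-8") + b"\x00" + bytes.fromhex(blob_sha)
    tree_sha = put(store, "tree", tree)
    lines = [f"tree {tree_sha}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    # Simplified: real commits also include author and committer lines.
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return put(store, "commit", body.encode("utf-8"))


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    blog = b"How Git works\n"  # invented content for the first file
    first = commit_snapshot(store, {"Blog": blog}, "first commit")
    print(len(store))  # objects after the first commit
    commit_snapshot(store, {"Blog": blog, "text.txt": b"hello\n"},
                    "second commit", parent=first)
    print(len(store))  # objects after the second commit
```

In this simulation the second commit adds exactly three objects (the text.txt blob, a new tree and a new commit), while the blob for the unchanged file is shared by both trees.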

This blog has covered the core of how Git stores data in objects and how it keeps the history of commits. There is much more to explore in Git and Version Control Systems. You can always refer to resources like https://git-scm.com/ to learn more about Git.

Happy Gitting

– Harsh Agarwal
