← Skill tree CS Skill Tree 0 CSCD211

Git Is a Linked Structure You Are Already Building

Textbook: Pro Git (Chacon and Straub)

Where you are: Custom linked lists > A linked structure you already use every day

You are about to spend weeks building linked lists by hand: a node that holds some data and a reference to the next node, a chain you walk one link at a time, ending at null. Here is something nobody tells you early enough: Git, the tool you use every day in this course, is built from exactly that idea. Once you see it, Git stops being a pile of magic commands and becomes a data structure you already know how to reason about.

This page does not teach you Git commands. It shows you what Git is, in the vocabulary of CSCD 211, so that when you run the workflow you understand what is actually happening to the pointers.

A commit is a node

When you make a commit, Git stores a node. That node holds a snapshot of your files and a reference to its parent, the commit that came before it. Sound familiar? It is a Node with a next, except the reference points backward in time, to the past, instead of forward.

A chain of commits is therefore a singly linked list:

C1 <--parent-- C2 <--parent-- C3

C1 is the very first commit. Its parent reference is null, the same way the last node in your linked list has a next of null. The only difference from the list in your lab is the direction the arrows point.

When you run git log, Git starts at the most recent commit and walks parent by parent until it hits the null parent, printing each node. That is a linked-list traversal. You will write that exact loop later in this lane: start at a node, follow the reference, stop at null.

A branch is a named pointer

Here is the part that surprises people. A branch is not a copy of your code. A branch is a single pointer to one commit, with a name on it. That is all main is: a reference that holds the address of one node.

main ---> C3 feature ---> C2

Creating a branch does not copy anything. It makes one more reference to a node that already exists. That is why branching is instant no matter how big your project is: it is the same cost as writing Node other = existing; in Java. One pointer assignment. O(1). Aliasing, not copying. This is the exact idea behind the defensive-copy discussion in your composition work: two references can name the same object, and Git uses that on purpose.

When you commit on a branch, Git creates a new node whose parent is the commit the branch pointed at, then moves the branch pointer forward to the new node. Create node, relink reference, advance pointer. You are doing the same three steps when you addLast to a linked list.

HEAD is a pointer to a pointer

Git needs to know which branch you are currently on. It keeps a reference called HEAD that points at your current branch, which in turn points at a commit. So HEAD is a pointer to a pointer, the same shape as a Node reference that refers to another reference. Switching branches (git switch) does not touch your commits at all. It just reassigns HEAD to point at a different branch pointer. Moving a reference, not moving data.

A merge makes a node with two parents

A linked list has one parent per node. Git allows one more thing: a merge commit is a node with two parent references, one for each branch you merged.

C3 <--- C5 ---> C4 | (merge: two parents)

The moment a node can have more than one parent, your history is no longer a simple linked list. It is a directed acyclic graph, a DAG: nodes with references, no cycles, more than one link allowed per node. That is the next structure up from the linked list, and you will meet graphs and trees later in your degree. Git is a concrete, daily example of one. Every pull request you merge adds a two-parent node to that graph.

Why this changes how you work

Seeing the structure explains the rules that otherwise feel arbitrary:

What You Need To Walk In With

Check Yourself

Close the notes and answer each one from memory, then reveal it. Pulling an idea back from memory is one of the strongest ways to make it stick.

Check your understanding

When you make a commit, what reference does that commit node store about history?

Tier 1 · Pro Git, 10.2 Git Internals: Git Objects

What does a Git branch actually store?

Tier 2 · Pro Git, 10.3 Git Internals: Git References

Creating a new branch in a large repository is fast because it is essentially:

Tier 2 · Pro Git, 3.1 Git Branching: Branches in a Nutshell

A merge commit has two parents. What does that make the overall commit history?

Tier 3 · Pro Git, 3.2 Git Branching: Basic Branching and Merging

Go Deeper (optional)