Git Is a Linked Structure You Are Already Building

Textbook: Pro Git (Chacon and Straub)

Where you are: Custom linked lists > A linked structure you already use every day

You are about to spend weeks building linked lists by hand: a node that holds some data and a reference to the next node, a chain you walk one link at a time, ending at null. Here is something nobody tells you early enough: Git, the tool you use every day in this course, is built from exactly that idea. Once you see it, Git stops being a pile of magic commands and becomes a data structure you already know how to reason about.

This page does not teach you Git commands. It shows you what Git is, in the vocabulary of CSCD 211, so that when you run the workflow you understand what is actually happening to the pointers.

A commit is a node

When you make a commit, Git stores a node. That node holds a snapshot of your files and a reference to its parent, the commit that came before it. Sound familiar? It is a Node with a next, except the reference points backward in time, to the past, instead of forward.

A chain of commits is therefore a singly linked list:

C1 <--parent-- C2 <--parent-- C3

C1 is the very first commit. Its parent reference is null, the same way the last node in your linked list has a next of null. The only difference from the list in your lab is the direction the arrows point.

When you run git log, Git starts at the most recent commit and walks parent by parent until it hits the null parent, printing each node. That is a linked-list traversal. You will write that exact loop later in this lane: start at a node, follow the reference, stop at null.

A branch is a named pointer

Here is the part that surprises people. A branch is not a copy of your code. A branch is a single pointer to one commit, with a name on it. That is all main is: a reference that holds the address of one node.

main ---> C3 feature ---> C2

Creating a branch does not copy anything. It makes one more reference to a node that already exists. That is why branching is instant no matter how big your project is: it is the same cost as writing Node other = existing; in Java. One pointer assignment. O(1). Aliasing, not copying. This is the exact idea behind the defensive-copy discussion in your composition work: two references can name the same object, and Git uses that on purpose.

When you commit on a branch, Git creates a new node whose parent is the commit the branch pointed at, then moves the branch pointer forward to the new node. Create node, relink reference, advance pointer. You are doing the same three steps when you addLast to a linked list.

HEAD is a pointer to a pointer

Git needs to know which branch you are currently on. It keeps a reference called HEAD that points at your current branch, which in turn points at a commit. So HEAD is a pointer to a pointer, the same shape as a Node reference that refers to another reference. Switching branches (git switch) does not touch your commits at all. It just reassigns HEAD to point at a different branch pointer. Moving a reference, not moving data.

A merge makes a node with two parents

A linked list has one parent per node. Git allows one more thing: a merge commit is a node with two parent references, one for each branch you merged.

C3 <--- C5 ---> C4 | (merge: two parents)

The moment a node can have more than one parent, your history is no longer a simple linked list. It is a directed acyclic graph, a DAG: nodes with references, no cycles, more than one link allowed per node. That is the next structure up from the linked list, and you will meet graphs and trees later in your degree. Git is a concrete, daily example of one. Every pull request you merge adds a two-parent node to that graph.

Why this changes how you work

Seeing the structure explains the rules that otherwise feel arbitrary:

Branching is cheap because it is one pointer, so make a branch for every task without a second thought. The course asks you to, and now you know why it costs nothing.
You cannot lose committed work by switching branches, because switching only moves HEAD. Your commits are still nodes in the graph; nothing was deleted, a reference just moved. This is the same reason an object in Java survives as long as something still references it.
Merging does not blend your files line by line into one timeline. It creates a node that points back to both histories. The graph remembers where each line of work came from.

What You Need To Walk In With

A commit is a node holding a snapshot and a reference to its parent; a chain of commits is a singly linked list, walked the way you walk a list in lab.
A branch is a named pointer to one commit. Creating one is a single pointer assignment, O(1), not a copy.
HEAD is a pointer to your current branch, so switching branches just reassigns a reference.
A merge commit has two parents, which makes the full history a directed acyclic graph, the structure one step beyond a linked list.

Check Yourself

Close the notes and answer each one from memory, then reveal it. Pulling an idea back from memory is one of the strongest ways to make it stick.

Check your understanding

When you make a commit, what reference does that commit node store about history?

Tier 1 · Pro Git, 10.2 Git Internals: Git Objects

What does a Git branch actually store?

Tier 2 · Pro Git, 10.3 Git Internals: Git References

Creating a new branch in a large repository is fast because it is essentially:

Tier 2 · Pro Git, 3.1 Git Branching: Branches in a Nutshell

A merge commit has two parents. What does that make the overall commit history?

Tier 3 · Pro Git, 3.2 Git Branching: Basic Branching and Merging

Go Deeper (optional)

Run git log --oneline --graph on a repository with a merge in it. You are looking at the DAG drawn for you: the nodes, the parent links, the two-parent merge. Trace it by hand the way you would trace a linked list in lab.
The linked-list work in this lane and this page are the same idea wearing different clothes. If pointers and references ever feel abstract in lab, come back here; if Git ever feels like magic, go build the list. Each one explains the other.
A real question to sit with, not to look up: your linked list can be traversed forward from the head, but Git walks commits backward from the newest to the oldest. Why does Git point at the parent (the past) instead of at the child (the future)? What would break if a commit had to know its children? The answer is the same reason your Node points one direction and not both.