
Owaish Codes


Build Your Own Git

A TypeScript implementation of git's core - because the best way to understand a tool is to build it yourself.

owaish
18 min read


I use git every single day. git add, git commit, git push - it’s muscle memory at this point. But if you asked me six months ago what actually happens when I type git commit, I would have given you some handwavy answer about “snapshots” and “history.” That felt wrong. I hate not knowing how the tools I rely on actually work.

So I decided to build git myself. In TypeScript.

Not to replace git - obviously. Just to understand it. And honestly? Building it taught me more in a week than years of using it ever did.

This is the story of how I built mygit.


First, the Thinking

Before writing a single line of code I spent time just… thinking. What problem does git actually solve?

Here it is, as plainly as I can put it: you want to be able to go back to how your files looked at any point in the past.

That’s it. Branches, remotes, merges - all of that is built on top of this one core idea. If you can solve that problem, you understand git.

So how do you solve it?

The Commit Problem

A “version” is really just a snapshot - a record of what every tracked file looked like at a specific moment. Each snapshot needs to know what files looked like, when it was taken, why it was taken, and what snapshot came before it.

That last part is the interesting one. Snapshots form a chain. My first instinct was a linked list - each commit node has a prev pointer, classic data structures stuff. But I immediately hit a wall.

Every mygit command is a separate Node.js process. mygit add and mygit commit share zero memory. Everything has to live on disk. And here’s where the linked list idea dies: keep both next and prev pointers and JSON.stringify throws on the circular references; keep only prev and you’re still rebuilding class instances from one ever-growing serialized blob. It just explodes.
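The failure mode is easy to reproduce in a few lines (an illustrative node shape, not the mygit source):

```typescript
// A doubly linked commit list: each node points at its neighbor.
type CommitNode = { message: string; prev: CommitNode | null; next: CommitNode | null };

const first: CommitNode = { message: "first", prev: null, next: null };
const second: CommitNode = { message: "second", prev: first, next: null };
first.next = second; // the structure is now circular

try {
  JSON.stringify(first);
} catch (err) {
  console.log((err as Error).name); // "TypeError": converting circular structure to JSON
}
```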

So what does real git actually do? Each commit is its own file, stored in .git/objects/, named by its SHA-1 hash. Each commit file holds a parentHash field pointing to the previous commit’s filename. To walk history, you load the latest commit, read its parentHash, load that file, repeat until parentHash is null.

mygit.json            objects/
{ HEAD: "7c8b1a" }    ├── 7c8b1a...  { message: "second", parentHash: "9d4e2f" }
                      └── 9d4e2f...  { message: "first",  parentHash: null     }

No circular references. No reconstruction headaches. Clean. I loved this once I understood it.

Storing File Contents

The naive approach - store raw file contents inside each commit - falls apart immediately. Change one file, and now every commit duplicates everything else. Wasteful.

The fix: separate file contents from commit metadata. Hash each file with SHA-1, compress it with gzip, store it once in .mygit/objects/. The commit just holds the hash. If README.md didn’t change between commit 1 and commit 2, both commits reference the exact same compressed blob. You store it once, reference it many times.

This is content-addressable storage, and it’s genuinely elegant.
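A minimal sketch of the idea, using an in-memory Map in place of .mygit/objects/ (storeBlob is a hypothetical helper, not from the mygit source):

```typescript
import { createHash } from "crypto";
import { gzipSync } from "zlib";

const objects = new Map<string, Buffer>(); // stands in for .mygit/objects/

function storeBlob(content: Buffer): string {
  // The address IS the content's hash.
  const hash = createHash("sha1").update(content).digest("hex");
  if (!objects.has(hash)) {
    objects.set(hash, gzipSync(content)); // written once, referenced many times
  }
  return hash;
}

const readme = Buffer.from("# My Project\n");
const h1 = storeBlob(readme); // commit 1 stages README.md
const h2 = storeBlob(readme); // commit 2 stages the same, unchanged content

console.log(h1 === h2);      // true: same content, same address
console.log(objects.size);   // 1: only one blob stored
```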

Folder Structure

Files don’t float in a void - they live in folders. The commit needs to know that utils.ts lives inside src/, not at the root. A flat list of files doesn’t carry that information.

The natural structure here is a tree. File leaves, folder nodes, root at the top:

root/
├── src/
│   ├── index.ts   ← hash: "a1b2c3..."
│   └── utils.ts   ← hash: "f6g7h8..."
└── README.md      ← hash: "9d4e2f..."

Each file leaf holds a hash pointing to its compressed contents in objects/. Each commit holds one of these trees.
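In TypeScript, one possible shape for these nodes (hypothetical names; the real mygit types may differ):

```typescript
// A tree node is either a file leaf (holding a blob hash) or a folder.
type FileNode = { kind: "file"; name: string; hash: string };
type FolderNode = { kind: "folder"; name: string; children: TreeNode[] };
type TreeNode = FileNode | FolderNode;

// The tree from the diagram above, spelled out:
const root: FolderNode = {
  kind: "folder",
  name: "root",
  children: [
    {
      kind: "folder",
      name: "src",
      children: [
        { kind: "file", name: "index.ts", hash: "a1b2c3" },
        { kind: "file", name: "utils.ts", hash: "f6g7h8" },
      ],
    },
    { kind: "file", name: "README.md", hash: "9d4e2f" },
  ],
};
```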


Building It

Phase 1: The CLI

Before anything else, I needed a CLI that could receive commands. This part is surprisingly simple once you understand what’s happening.

When you run node index.js commit, Node exposes the full command as process.argv. The first two entries are always node and the script path, so process.argv[2] is your actual command.

#!/usr/bin/env node

const command = process.argv[2];

switch (command) {
  case "init":   init();   break;
  case "add":    add();    break;
  case "commit": commit(); break;
  default:
    console.log("Unknown command.");
}

Add a bin entry in package.json, run npm run build && npm link, and mygit becomes a globally available command on your machine. The first time a mygit command printed something to the terminal felt weirdly satisfying.
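For reference, a minimal package.json bin entry might look like this (the ./dist/index.js path is an assumption about the build output; adjust to wherever your compiled entry point lands):

```json
{
  "name": "mygit",
  "bin": {
    "mygit": "./dist/index.js"
  }
}
```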

Phase 2: mygit init

init does three things:

  1. Check the current directory isn’t already a repo
  2. Create a .mygit/ folder
  3. Write an initial mygit.json with { "HEAD": null }

function init() {
  const gitPath = getMygitPath(); // joins process.cwd() with ".mygit"

  try {
    fs.mkdirSync(gitPath);
    fs.writeFileSync(
      path.join(gitPath, "mygit.json"),
      JSON.stringify({ HEAD: null }, null, 2)
    );
    console.log(`Initialized empty MyGit repository in ${gitPath}`);
  } catch (err: any) {
    if (err.code === "EEXIST") {
      console.error("Repository already exists.");
    } else {
      throw err; // don't swallow unexpected errors
    }
  }
}

One thing worth noting: instead of manually checking if .mygit exists before creating it, I just try to create it and let the OS tell me via EEXIST if it already exists. Cleaner than a redundant pre-check.

After mygit init:

.mygit/
└── mygit.json   ← { "HEAD": null }

Phase 3: mygit add

This is where things get interesting. When you run mygit add src/index.ts, the pipeline is:

  1. Validate the repo exists
  2. Validate the file exists
  3. Read the file contents
  4. SHA-1 hash the contents
  5. Gzip compress the contents
  6. Write the compressed blob to .mygit/objects/<hash>
  7. Append an entry to stage.json
// Assuming these imports at the top of the file:
import { promises as fs } from "fs";
import { join } from "path";
import { createHash } from "crypto";
import { gzip } from "zlib";
import { promisify } from "util";

const content = await fs.readFile(file);
const hash = createHash("sha1").update(content).digest("hex");

const gzipAsync = promisify(gzip);
const compressed = await gzipAsync(content);

await fs.writeFile(join(objectsPath, hash), compressed);

staging.push({
  fileName: file,
  path: join(objectsPath, hash),
  hash,
  dateTime: Date.now(),
});

await fs.writeFile(stagePath, JSON.stringify(staging, null, 2));

The staging area is just a stage.json file - an array of entries, loaded and saved on every mygit add. Simple, readable, debuggable.

After staging two files:

.mygit/
├── objects/
│   ├── a1b2c3...   ← compressed blob of index.ts
│   └── f6g7h8...   ← compressed blob of utils.ts
├── mygit.json
└── stage.json

stage.json looks like:

[
  {
    "fileName": "src/index.ts",
    "path": "/your/project/.mygit/objects/a1b2c3...",
    "hash": "a1b2c3...",
    "dateTime": 1704067200000
  }
]

Phase 4: mygit commit

This is the one that brings everything together. The sequence:

  1. Load stage.json - bail early if it’s empty or missing
  2. Load mygit.json to get current HEAD
  3. If HEAD exists, load that commit file to get the previous file tree
  4. Build a new tree from previous tree + staged files
  5. Create the commit object
  6. Hash the commit object - that hash becomes its filename
  7. Save the commit to objects/
  8. Update mygit.json to point HEAD at the new hash
  9. Delete stage.json

The tree-building function is the most interesting part. It splits each staged file’s path into segments (src/repository/commit.ts → ["src", "repository", "commit.ts"]), creating folder nodes as needed, and places each file as a leaf with its hash. Files from the previous commit that weren’t staged this time carry forward unchanged.
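A hedged sketch of that walk, using my own illustrative node shape and an insertFile helper (not the exact mygit code, which also merges in the previous commit’s tree):

```typescript
type FileNode = { kind: "file"; name: string; hash: string };
type FolderNode = { kind: "folder"; name: string; children: (FileNode | FolderNode)[] };

function insertFile(root: FolderNode, filePath: string, hash: string): void {
  const parts = filePath.split("/"); // "src/utils.ts" -> ["src", "utils.ts"]
  let current = root;

  // Walk (or create) a folder node for every segment except the last.
  for (const part of parts.slice(0, -1)) {
    let next = current.children.find(
      (c): c is FolderNode => c.kind === "folder" && c.name === part
    );
    if (!next) {
      next = { kind: "folder", name: part, children: [] };
      current.children.push(next);
    }
    current = next;
  }

  // The last segment is the file leaf holding the blob hash.
  const name = parts[parts.length - 1];
  const existing = current.children.find((c) => c.name === name);
  if (existing && existing.kind === "file") {
    existing.hash = hash; // restaged file: update the hash in place
  } else {
    current.children.push({ kind: "file", name, hash });
  }
}

const root: FolderNode = { kind: "folder", name: "root", children: [] };
insertFile(root, "src/index.ts", "a1b2c3");
insertFile(root, "src/utils.ts", "f6g7h8");
```

Both calls share the same src/ folder node, so the tree ends up with one folder containing two leaves.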

const newCommit: Commit = {
  message: process.argv[3],     // mygit commit "message"
  dateTime: new Date().toISOString(),
  parentHash: state.HEAD,       // points to previous commit
  fileTreeRoot: buildTree(...), // full folder snapshot
};

const commitHash = createHash("sha1")
  .update(JSON.stringify(newCommit))
  .digest("hex");

await fs.writeFile(join(objectsPath, commitHash), JSON.stringify(newCommit, null, 2));
await fs.writeFile(mygitJsonPath, JSON.stringify({ HEAD: commitHash }, null, 2));
await fs.unlink(stagePath); // clear staging area

After two commits, the folder looks like:

.mygit/
├── objects/
│   ├── a1b2c3...   ← blob
│   ├── f6g7h8...   ← blob
│   ├── 9d4e2f...   ← first commit
│   └── 7c8b1a...   ← second commit
└── mygit.json      ← { "HEAD": "7c8b1a..." }

Phase 5: mygit log

The simplest command. Pure reads, no writes. Load HEAD, load that commit, push to array, follow parentHash, repeat. Print everything in reverse chronological order.

const commits: { hash: string; commit: Commit }[] = [];
let currentHash: string | null = state.HEAD;

while (currentHash) {
  const data = await fs.readFile(join(objectsPath, currentHash), "utf-8");
  const commit = JSON.parse(data) as Commit;
  commits.push({ hash: currentHash, commit });
  currentHash = commit.parentHash;
}

Output:

  commit 7c8b1a3f9e2d4b6a1c8f5e3d7b9a2c4e6f8d1a3b
  Date:   Fri, Mar 07, 2026 at 02:30 PM

      second commit

  ────────────────────────────────────────────────

  commit 9d4e2f1a8b3c7d5e2f4a6c8d1b3e5f7a9c2d4e6f
  Date:   Fri, Mar 07, 2026 at 01:15 PM

      first commit

  ────────────────────────────────────────────────

What I Didn’t Build

This was always a scoped project. There’s a lot of real git I deliberately left out:

  • Branches - a branch is just a named pointer to a commit hash. Almost identical to HEAD. Not much work given what’s already there.
  • Checkout - walk a commit’s file tree, decompress each blob from objects/, write files back to their original paths.
  • Diff - compare two trees node by node. Same hash → no change. Different hash → changed. Present in one but not the other → added or deleted.
  • .mygitignore - a pattern list to skip during mygit add.
  • Remotes - genuinely complex. This is where “transfer objects between two objects/ folders” meets “figure out what each side already has.” A real project in itself.
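For instance, the tree diff described above could be sketched over flattened trees, comparing path-to-hash maps (entirely hypothetical; none of this exists in mygit):

```typescript
type FlatTree = Map<string, string>; // path -> blob hash

function diffTrees(before: FlatTree, after: FlatTree): string[] {
  const changes: string[] = [];
  for (const [path, hash] of before) {
    if (!after.has(path)) changes.push(`deleted: ${path}`);
    else if (after.get(path) !== hash) changes.push(`changed: ${path}`);
    // same hash -> unchanged, nothing to report
  }
  for (const path of after.keys()) {
    if (!before.has(path)) changes.push(`added: ${path}`);
  }
  return changes;
}

const before: FlatTree = new Map([
  ["README.md", "9d4e2f"],
  ["src/index.ts", "a1b2c3"],
]);
const after: FlatTree = new Map([
  ["README.md", "9d4e2f"],
  ["src/index.ts", "e5d4c3"],
  ["src/utils.ts", "f6a7b8"],
]);

console.log(diffTrees(before, after));
// -> ["changed: src/index.ts", "added: src/utils.ts"]
```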

All of these build directly on what’s already here. The foundation holds.


What I Actually Learned

The biggest thing wasn’t git-specific. It was this: the right data structure depends on your storage medium, not on what feels natural in code.

A linked list feels obvious for commit history when you’re thinking in memory. On disk it’s a nightmare - circular reference issues, class reconstruction on deserialization, one giant growing blob. The hash pointer approach feels less obvious at first, but it maps perfectly to how files work. Each commit is independent. Loading is just reading one file. No reconstruction.

Real git arrived at this design for the exact same reasons. The design isn’t arbitrary - it’s the natural solution to the constraints of the problem. Building it yourself is how you see why.

The other thing: reading about how git stores objects is fine. Building an object store yourself is something else. You don’t just know how git works now - you know why it works the way it does. That’s the part that sticks.


Try It Yourself

If you want to build this yourself, I’d genuinely recommend doing it from scratch rather than copying. Start with the thinking - what problem is a commit actually solving? What constraints does disk serialization impose? How do you represent a folder hierarchy?

Once you’ve sat with those questions, the implementation almost writes itself.

The full source for mygit is on my GitHub: github.com/owaish3301/BuildYourOwnGit.