Effective Git

Published: 2021-11-26
Tagged: git learning software work

Git isn't easy. The first, nerve-wracking months of my first job were made even more intense by having to "commit" and "push" and fix "merge conflicts", and... ugh! Now, almost nine years later, I still occasionally find myself in a sticky situation, but it's more of a fun exercise to break the rhythm of the day than anything else.

I wrote this guide because I see many developers, both baby-faced new grads and (fewer) grizzled grey-beards, struggling like I once did. That seems like such a horrible waste, because each of us has a limited amount of quality focus time each day. But nobody has the time to learn all of git: all those subcommands and flags would take weeks to study and practice. However, power laws apply (almost) everywhere, so there has to be a small slice of git, say 20%, that produces 80% of the results. What is it?

Based on my experience, that 20% is:

  1. Understanding what git really is. Once you understand what all those commits and branches are doing, it becomes clear what every git command will do, which helps you get what you want faster and avoid costly mistakes.
  2. Learning the dozen-plus-two commands that are the most useful. Some of them you will run hundreds of times a month, so small efficiencies in their use add up quickly. Others you will use a few times a year, but each one can save you hours of work.

Let's dive in!

What is git, really?

Git records snapshots of a directory of files as it changes over time. These snapshots are called commits. Each commit (except the very first) references at least one previous commit, its parent, creating a chain from oldest to newest. Multiple commits can reference the same parent, which allows us to branch off small sub-chains and merge them back later. This whole structure ends up looking like a tree (strictly speaking, a directed acyclic graph, because a merge commit has more than one parent).
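
If you want to see this chain for yourself, here's a minimal shell sketch, assuming git and a POSIX shell are installed. It works in a throwaway repository created with mktemp and uses a placeholder identity (Demo, demo@example.com) so commits work without any configuration. It makes two commits, then prints each commit's parent links and the raw contents of the latest commit.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)    # throwaway repository
cd "$repo"
git init -q
# Placeholder identity so commits work without any global config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "one" > notes.txt
git add notes.txt
git commit -q -m "first"

echo "two" >> notes.txt
git commit -q -am "second"

# One line per commit, newest first: the commit's hash, then its parent's.
git rev-list --parents HEAD

# The raw commit object: "tree" is the snapshot of the directory,
# "parent" is the link back to the previous commit.
git cat-file -p HEAD
```

In the rev-list output, the second commit's line ends with the hash of the first commit, while the first commit's line holds a single hash because it has no parent.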

To help navigate this tree, git supports creating labels that point at specific commits. The first type of label is the branch. A branch always points at the tip of a chain of commits. When you initialize a new repository and create the first commit, git creates a "master" branch that points to this commit (newer git installations may be configured to call it "main" instead). And when you create another commit, git moves the current branch forward to point at it.
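
A quick way to watch a branch move is to ask git which commit it points at before and after committing. This sketch assumes a POSIX shell, a throwaway repository, and a placeholder identity; it pins the initial branch name to master with git symbolic-ref, since your git may be configured to default to something else.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)    # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the initial branch name
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# No commits yet, so the master branch does not exist yet either.
git rev-parse -q --verify master || echo "no master branch yet"

echo "a" > a.txt
git add a.txt
git commit -q -m "first"
first=$(git rev-parse master)     # master now exists and points at "first"

echo "b" > b.txt
git add b.txt
git commit -q -m "second"
second=$(git rev-parse master)    # committing moved master to the new tip

echo "master moved from $first to $second"
```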

How does git know what the current branch is? After all, multiple branches can point at the same commit. The answer is that there's a special label called HEAD that points at the current branch. So if you create and switch to a new branch (for example, with git checkout -b), git updates HEAD to point to it. And when you now create a new commit, the new branch moves to it, and HEAD, which points at that branch, follows along. (HEAD can also point directly at a commit instead of a branch; git calls this a "detached HEAD".)
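
Here's a minimal sketch of that behavior (POSIX shell, throwaway repository, placeholder identity, branch name pinned to master). git symbolic-ref HEAD prints the branch HEAD points at; note that git branch creates a label without moving HEAD, while git checkout -b creates it and moves HEAD to it.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)    # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the initial branch name
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "a" > a.txt
git add a.txt
git commit -q -m "first"

git symbolic-ref HEAD        # refs/heads/master: HEAD names a branch
git branch other             # creates a branch but leaves HEAD alone
git symbolic-ref HEAD        # still refs/heads/master
git checkout -q -b feature   # creates a branch AND points HEAD at it
git symbolic-ref HEAD        # refs/heads/feature

echo "b" > b.txt
git add b.txt
git commit -q -m "second"    # only feature, the branch HEAD names, moves
```

Afterwards, master and other still point at "first", and only feature has advanced to "second".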

The second type of label is the tag. Tags are like branches except that they are not updated when new commits are created. In other words, they are static, always pointing at the same commit, which makes them great for labeling versions of software.
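
To see the difference, this sketch tags a commit, commits once more, and shows that the tag stayed behind while the branch moved on. It assumes a POSIX shell and a throwaway repository with a placeholder identity; the file name VERSION and the tag name v1.0 are made up for the example.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)    # throwaway repository
cd "$repo"
git init -q
export GIT_PAGER=cat         # print log output directly instead of paging it
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "1.0" > VERSION
git add VERSION
git commit -q -m "Release 1.0"
git tag v1.0                 # lightweight tag on the current commit

echo "1.1-dev" > VERSION
git commit -q -am "Start 1.1"

# The branch label moved to "Start 1.1"; v1.0 still marks "Release 1.0".
git log --oneline --decorate
```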

Git-the-CLI is used to manipulate the tree and its labels. When you check out a branch, git makes the filesystem directory look like the latest commit on that branch. And when you commit changes, git writes information about the state of the filesystem directory to the tree (updating the branch, too).
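
To watch checkout rewrite the directory, this minimal sketch (POSIX shell, throwaway repository, placeholder identity, branch name pinned to master) adds a file on a feature branch and then checks master back out. The file and branch names are made up for the example.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)    # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the initial branch name
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "shared" > shared.txt
git add shared.txt
git commit -q -m "Add shared file"

git checkout -q -b feature
echo "feature" > feature-only.txt
git add feature-only.txt
git commit -q -m "Add feature file"
ls                          # lists feature-only.txt and shared.txt

git checkout -q master      # make the directory look like master again
ls                          # lists only shared.txt
```

The file isn't lost: it lives in the feature branch's commit, and checking feature out again brings it back.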

That's git in a nutshell.

It also does a few other helpful things, like downloading commits from remote repositories or merging changes made on different branches, but at the end of the day, you're just manipulating a tree.

Now, let's go one level of abstraction deeper. You're actually manipulating three trees: the repository (the tree of commits), the filesystem directory (the working directory), and the staging area. Understanding how these relate to each other will help you understand what git commands do.

The working directory is the collection of all tracked and untracked files in a directory managed by git (i.e., a directory with a .git subdirectory). Tracked files are those that git "knows about": ones that have been staged or committed earlier. Untracked files are new files that have never been staged.
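
Here's a small sketch of the tracked/untracked distinction, assuming a POSIX shell and a throwaway repository with a placeholder identity. git status --porcelain prints a stable two-character status code per file: " M" for a tracked file with unstaged changes and "??" for an untracked one.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)    # throwaway repository
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "hello" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

echo "edit" >> tracked.txt     # tracked: git notices the change
echo "new" > untracked.txt     # untracked: never staged or committed

# " M" = tracked file with unstaged changes, "??" = untracked file.
git status --porcelain
```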

The staging area is a temporary tree that sits between the working directory and the repository. It's useful to think of it as a sort of scratchpad, a place for incrementally adding changes until you have exactly the ones you want to create a new commit. I suspect that without this feature, we would all be accidentally committing files we didn't mean to, or leaving out important ones.
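
The three trees can hold three different versions of the same file at once. In this sketch (POSIX shell, throwaway repository, placeholder identity; the file contents are arbitrary), git show HEAD:file.txt reads the repository, git show :file.txt reads the staging area, and cat reads the working directory.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)    # throwaway repository
cd "$repo"
git init -q
export GIT_PAGER=cat           # print output directly instead of paging it
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "version 1" > file.txt
git add file.txt
git commit -q -m "v1"          # repository: version 1

echo "version 2" > file.txt
git add file.txt               # staging area: version 2

echo "version 3" > file.txt    # working directory: version 3

git show HEAD:file.txt         # prints "version 1" (last commit)
git show :file.txt             # prints "version 2" (staging area)
cat file.txt                   # prints "version 3" (working directory)

git commit -q -m "v2"          # records the staging area, i.e. "version 2"
```

The last line is the scratchpad idea in action: the commit contains what was staged, and "version 3" stays behind as an unstaged change.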

Let's stop here.

At this point, you should be able to answer these two questions some of the time:

  1. Will the command I am about to run read from or write to the tree?
  2. Which tree will it read from or write to?

With practice, you will get the right answer more and more often, and you will find yourself getting into fewer complicated situations. And then, when you do get stuck, these questions will help guide you out without much stress.

Alright, let's move on to the most useful commands, beginning with those that read from a tree.

Reading from the Tree

  1. git show
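    • git show (.|commit|branch-name|tag)
      • Print the contents (changes) of the commit indicated by the argument. With no argument, git show prints the latest commit on the current branch, equivalent to git show HEAD. The . works too: it isn't a commit, so git treats it as a path, shows HEAD, and limits the output to the current directory, which from the repository root is the whole commit. Useful for inspecting changes and tying them to a person/time/branch.
    • git show (commit|branch-name|tag):file-name
      • Print the full contents of file-name at a specific commit. The path is relative to the repository root; prefix it with ./ to make it relative to the current directory.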
  2. git log
    • git log --oneline -n5
      • Print up to 5 commits, starting with the current one, in the compact format short-hash (optional branch/tag) commit-message. Useful for quickly orienting yourself after a pull or after checking out a new branch.
    • git log (file-name)
      • Print the commit history of a specific file. Great for finding out who touched it last or seeing how a piece of code evolved over time.
    • git log --grep (search-string)
      • Print only the commits that contain search-string in the commit message. Another useful investigation tool for understanding changes.
      • A related flag is -S (search-string), often called the "pickaxe", which prints the commits whose changes add or remove search-string anywhere in the file contents.
    • git log (branch1)..(branch2)
      • Print the commits reachable from branch2 but not from branch1, i.e., the set of commits in branch2 minus the set of commits in branch1. Useful for understanding the difference between branches, especially master: e.g., git log master..HEAD prints all commits on the current branch that are not yet merged into master, and git log origin/master..master prints all commits on your local master that are not on the remote master branch (as of your last fetch).
    • git log (branch1)...(branch2) --left-right
      • Print the commits that are reachable from either branch but not from both (the symmetric difference). This is useful for comparing two feature branches, and the --left-right flag marks each commit with < or > to show which branch it belongs to. Rarely used, but handy when you need it.
  3. git diff
    • git diff
      • Print the differences in tracked files between the working directory and staging area.
    • git diff --cached
      • Print the differences between the files in the staging area and the last commit. Supremely useful for checking what changes are going to be included when you run git commit. I use it to review my work before I make any commit and it has saved me from countless little errors like typos, missing files, revealing secrets, etc.
    • git diff (branch1) (branch2) OR git diff (branch1)..(branch2)
      • Print the differences between the tips of branch1 and branch2. Useful for seeing the exact differences between a feature branch and master.
    • git diff (branch1)...(branch2)
      • Print the differences between the tip of branch2 and the closest common ancestor of branch1 and branch2, i.e., only the changes made on branch2 since it forked from branch1. Useful for reviewing what a feature branch changed, ignoring anything that landed on master in the meantime.
  4. git blame
    • git blame (file-name)
      • Print the contents of file-name, annotating each line with short-hash author timestamp line-contents. Combined with git show and git log, it serves as a powerful tool for investigating git history.
    • git blame -L10,20 (file-name)
      • Print the contents of file-name, annotated as described above, but only lines 10 through 20, which is handy for large files.
  5. git tag
    • Creating tags is covered in the "Writing to the Tree" section below.
    • git tag -l
      • List all tags in the repository.
    • git tag -l (pattern)
      • Print all tags that match pattern, where pattern can be any valid shell wildcard pattern, e.g., git tag -l 'v1.*' will print v1.0, v1.1, v1.1.0, etc.
  6. git reflog
    • Like git log, but prints the history of where references such as HEAD and branches have pointed. In other words, it records every time HEAD moves (for example, on git checkout, git reset, or git rebase) and every time a branch moves (for example, when you make a new commit).
    • Useful for debugging git rebase or git reset problems, because if HEAD ever pointed at a commit, you can find its hash in the reflog. Entries are eventually expired by git gc (after 30 to 90 days by default), so don't wait too long.
  7. git checkout
    • git checkout -b (new-branch-name)
      • Create a new branch at the current commit and move HEAD to point to it.
    • git checkout (branch-name)
      • Point HEAD at branch-name and make the working directory look like the latest commit on that branch. If you have uncommitted changes that the checkout would overwrite, git prints an error and aborts instead, which makes it a safe command.
    • git checkout (branch-name) (file-name)
        • Without moving HEAD, restore file-name from the tip of branch-name into the working directory (and the staging area). Careful: any uncommitted changes to file-name are overwritten without warning.
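
Here's a sketch that exercises several of these reading commands against a tiny two-branch history. It assumes a POSIX shell and builds a throwaway repository with a placeholder identity; the branch and file names are made up, and GIT_PAGER=cat just keeps git from opening a pager.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)    # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the initial branch name
export GIT_PAGER=cat                       # print output instead of paging it
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'line 1\nline 2\n' > app.txt
git add app.txt
git commit -q -m "Add app"

git checkout -q -b feature
printf 'line 1\nline 2 (fixed)\n' > app.txt
git commit -q -am "Fix line 2"

# Commits on feature that master doesn't have yet: just "Fix line 2".
git log --oneline master..feature

# Commits whose changes added or removed the string "(fixed)".
git log --oneline -S "(fixed)"

# app.txt as it is on master, without checking master out.
git show master:app.txt

# Who last changed line 2, and in which commit?
git blame -L2,2 app.txt

# Only what feature changed since it forked from master.
git diff master...feature

# Stage one more change, then review exactly what the next commit would contain.
echo "line 3" >> app.txt
git add app.txt
git diff --cached
```

That last pair is the review habit from the git diff --cached entry: what it prints is exactly what git commit would record.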

Writing to the Tree

  1. git add
    • git add (file-name)
      • Copies the current contents of file-name into the staging area, preparing it to be written to the tree. Accepts globs, allowing it to add multiple files at once.
    • git add -i
      • -i enables interactive mode. This can be a very powerful tool: once the shortcuts are in your muscle memory, it lets you very quickly add/update/remove/patch/etc. any number of files in the staging area.
  2. git commit
    • Adding this for completeness' sake: the command takes the contents of the staging area, creates a new commit from them on the current branch, and then moves the current branch to point to it (HEAD keeps pointing at the branch, so it follows along).
  3. git reset
    • This command can read from and write to the working directory and the staging area as well as the repository, so while it doesn't strictly fit in this section, this is the best place to describe it.
    • I've found git reset to be useful for undoing additive commands like git commit and git add. Rarely, it has also come in handy for resetting the repository to a clean, known state after a messed-up rebase or merge.
    • On top of that, this command really helped me understand the tree-like nature of git. Just playing around with it a few times showed me how pointers/branches move and how that affects the output of git status.
    • git reset --soft (commit|branch-name|tag-name)
      • Move the branch that HEAD points to so that it points at the specified commit|branch-name|tag-name. Neither the staging area nor the working directory is modified, so the staging area still looks like the commit you just moved away from, and git status shows those changes as staged. In other words, git reset --soft HEAD~1 undoes a git commit.
    • git reset --mixed (commit|branch-name|tag-name)
      • This is the default git reset operation.
      • Does everything --soft does, plus populates the staging area with the contents of commit|branch-name|tag-name. If you ran git status, you would see the changes only as unstaged changes in the working directory, which still looks like the commit you moved from. Continuing with the example from above, if you had run git reset HEAD~1 instead of the --soft reset, you would have undone both the git commit and the git add operations.
    • git reset --hard (commit|branch-name|tag-name)
      • This is unsafe to run: it overwrites files in your working directory, so you could potentially lose work. Tread carefully.
      • Does everything --mixed does, plus populates the working directory with the contents of commit|branch-name|tag-name. If you ran git status now, it would print nothing to commit, working tree clean (untracked files are left alone, though, and would still be listed).
  4. git rebase
    • Take a string of commits and apply them on top of a branch. Keep in mind, though, that if the branch exists on a remote repository, you will have to force-push it after rebasing, creating confusion for everyone else who has it. So, for best results, use this only on branches nobody else is using.
    • I've found it useful in two cases, one pretty much daily, and the other occasionally. Let's start with the frequent one.
    • git rebase -i HEAD~2
      • HEAD~2 refers to the commit two steps back from the current one (its grandparent, following first parents), so the last two commits are up for editing. Increase that number to go farther back. The -i flag opens an interactive prompt that allows you to specify what to do with each commit in the selection: edit the message, drop or squash the commit, stop to make changes, etc. This is super useful when you're working on your own branch and preparing a pull request, because it gives you an opportunity to combine/split commits, write nice messages, etc.
    • git rebase (master|other-branch)
      • This will take the commits on the current branch that are not on master and apply them on top of master. The current branch will remain a separate branch from master, though, so you can continue working on it. This is great for catching up with changes on master or for combining feature branches.
      • Special note: I rarely, if ever, use git merge. There's nothing wrong with it, but I think it's only really useful for merging branches into master, and since that's handled by the software hosting the remote repository (GitHub, GitLab, Bitbucket, etc.), I don't do it on my local machine.
  5. git cherry-pick
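    • Take one or more commits and apply them to the current branch. Both cherry-pick and rebase create new copies of existing commits; the difference is that cherry-pick copies the commits you name onto the current branch, while rebase copies the current branch's own commits onto a new base and then moves the branch to point at those copies.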
    • git cherry-pick (commit-hash)
      • Take commit-hash and apply the changes to the current branch. Very handy for moving small changes, like bug-fixes, from one branch to another.
  6. git tag
    • Tags come in two flavors: lightweight and annotated. Lightweight tags are simple labels pointing at a specific commit. Annotated tags do that plus store a creation timestamp, the tagger's name and email, a message, and optionally a GPG signature. The former are useful for local, temporary tags, and the latter for permanent tags meant for others, like those marking release versions.
      • git tag (tag-name) (commit-hash)
        • Create a lightweight tag tag-name attached to commit-hash.
      • git tag -a (tag-name) (commit-hash)
        • Create an annotated tag tag-name attached to commit-hash. It will open an editor for you to enter your message (or pass the message inline with -m). Use -s to sign the tag with your default GPG key, or -u (key-id) to pick a specific key. Don't forget to add --tags to git push (or push the tag by name) to upload the newly created tag to the remote!
  7. git revert
    • git revert (commit-hash)
      • Create a commit that undoes the changes of commit-hash. Useful for rolling back changes that have already been merged to remote branches because it won't break the normal git pull workflow for others.
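
The three reset modes are easiest to understand by running them back to back. This sketch assumes a POSIX shell and a throwaway repository with a placeholder identity. It resets one commit back with each mode, checks git status in between, recovers from the --hard reset using the reflog entry HEAD@{1} (where HEAD pointed before its last move), and finally undoes the commit the history-friendly way with git revert.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)    # throwaway repository
cd "$repo"
git init -q
export GIT_PAGER=cat                 # print output instead of paging it
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "v1"
echo "v2" > notes.txt
git commit -q -am "v2"
tip=$(git rev-parse HEAD)            # remember the "v2" commit

# --soft: only the branch moves; the v2 change stays staged.
git reset -q --soft HEAD~1
soft_status=$(git status --porcelain)
echo "after --soft:  $soft_status"
git reset -q --soft "$tip"           # move the branch back

# --mixed (the default): the staging area is reset too; the change is unstaged.
git reset -q HEAD~1
mixed_status=$(git status --porcelain)
echo "after --mixed: $mixed_status"
git reset -q "$tip"

# --hard: the working directory is reset too; "v2" is gone from disk.
git reset -q --hard HEAD~1
cat notes.txt                        # v1

# The reflog still knows where HEAD was before that last move.
git reflog show -n 3
git reset -q --hard 'HEAD@{1}'
cat notes.txt                        # v2

# The history-friendly undo: a new commit that reverses "v2".
git revert --no-edit HEAD
cat notes.txt                        # v1
```

After the soft reset, git status --porcelain reports "M  notes.txt" (staged); after the mixed one, " M notes.txt" (unstaged). The revert leaves the "v2" commit in history and adds a new commit on top, which is why it's safe on branches others have already pulled.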

The One that Doesn't Fit Above

  1. git stash
    • Git stash is useful for saving changes when switching branches. It's actually rare that I get to start working on a branch and continue until I finish without having to switch to other work. It's also handy for trying out experimental changes and saving them for the future.
    • git stash
      • Create a snapshot of the working directory and staging area and save it, then restore both to match the commit that HEAD points to. Add -u to include untracked files as well. Use it when you need to quickly clear away any work in progress without losing it.
    • git stash pop
      • Apply the most recently saved snapshot from the stash to the branch you're currently on, and remove it from the stash. The reverse of git stash. (If applying it causes conflicts, the snapshot is kept in the stash.)
    • git stash list
      • View all the saved changes in the stash. Notice how they are addressed, e.g., stash@{2}. This lets you retrieve changes from further back in time (for example, git stash pop stash@{2}), so you can stash multiple different sets of changes.
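
A minimal stash round trip, assuming a POSIX shell and a throwaway repository with a placeholder identity: stash a modified tracked file plus an untracked one with -u, confirm the working directory is clean, then bring everything back with pop. git stash push is the explicit spelling of plain git stash.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)    # throwaway repository
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "v1"

echo "half-done" >> notes.txt    # work in progress on a tracked file
echo "scratch" > scratch.txt     # a brand-new, untracked file

git stash push -q -u             # -u stashes the untracked file as well
stashed_status=$(git status --porcelain)   # empty: everything matches HEAD
git stash list                   # one entry, stash@{0}

# ...switch branches, fix something urgent, come back...

git stash pop -q                 # re-apply the newest entry and drop it
git status --porcelain           # the modification and scratch.txt are back
```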
