Howto: Reading Git Diffs and Staging Hunks

Published: 2017-01-14
Tagged: git software learning linux

Reading Diffs

I use git diff and git diff --cached to make sure that I'm only committing things that I want to commit, to help resolve git conflicts, or to compare how a file looked at different points in time. I'm a bit ashamed to say that I never sat down to understand all the information that git diff gives you, so I thought I'd write about it :).

Let's take a look at the example diff that I'll be using to explain diffs in general:

diff --git a/src/util/UpnpPunch.py b/src/util/UpnpPunch.py
index 1e47d6a..8d82ae4 100644
--- a/src/util/UpnpPunch.py
+++ b/src/util/UpnpPunch.py
@@ -82,7 +82,7 @@ def _retrieve_igd_profile(url):
     Retrieve the device's UPnP profile.
     """
     try:
-        return urllib2.urlopen(url.geturl(), timeout=5).read()
+        return urllib2.urlopen(url.geturl(), timeout=5).read().decode('utf-8')
     except socket.error:
         raise IGDError('IGD profile query timed out')

@@ -100,7 +100,14 @@ def _parse_igd_profile(profile_xml):
     WANIPConnection or WANPPPConnection and return
     the 'controlURL' and the service xml schema.
     """
-    dom = parseString(profile_xml)
+    try:
+        dom = parseString(profile_xml)
+    except Exception as e:
+        with open('upnp_error.txt', 'wb') as f:
+            f.write(e)
+            f.write('====')
+            f.write(profile_xml)
+

     service_types = dom.getElementsByTagName('serviceType')
     for service in service_types:
@@ -336,11 +343,3 @@ if __name__ == "__main__":
     logging.getLogger().setLevel(logging.DEBUG)
     import time

-    s = time.time()
-    print "Opening port..."
-    print ask_to_open_port(15443, "ZeroNet", retries=3, protos=["TCP"])
-    print "Done in", time.time()-s
-
-    print "Closing port..."
-    print ask_to_close_port(15443, "ZeroNet", retries=3, protos=["TCP"])
-    print "Done in", time.time()-s

First we'll look at the diff's headers and then at each of the hunks. A hunk is a contiguous group of changed lines in a file, shown together with a few unchanged lines of context around it. The diff for a file is made up of one or more hunks.

diff --git a/src/util/UpnpPunch.py b/src/util/UpnpPunch.py
index 1e47d6a..8d82ae4 100644

These two lines make up the diff header. The first line tells us the name of the file being diffed and the format of the diff (--git). The second line gives the git hashes (1e47d6a..8d82ae4) of the file's contents before and after the changes, followed by the file's mode (100644, a regular non-executable file). You can use the git hashes along with git show to look at the file before and after the changes like this: git show 1e47d6a and git show 8d82ae4 (the second one only resolves once that version of the file has been staged or committed). If we were moving a file or changing its name, the header would reflect this change. Onto the next part:

--- a/src/util/UpnpPunch.py
+++ b/src/util/UpnpPunch.py

These two lines tell us the file's name and path again and tie each side to the - and + symbols used in the hunks: --- marks the source (the file before the change) and +++ marks the destination (the file after the change). If the file was just created, this would look like:

--- /dev/null
+++ b/src/util/UpnpPunch.py

And if the file was just deleted, then the destination would be /dev/null like so:

--- a/src/util/UpnpPunch.py
+++ /dev/null
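
To make the header layout a bit more concrete, here's a minimal Python sketch that pulls the pieces described above out of a header. The function name parse_file_header and the keys of the dictionary it returns are my own inventions for illustration, not anything git provides, and the parsing is deliberately simplified (it ignores renames, quoted paths and the other extended header lines).

```python
import re

# The four header lines from the example diff.
HEADER = """\
diff --git a/src/util/UpnpPunch.py b/src/util/UpnpPunch.py
index 1e47d6a..8d82ae4 100644
--- a/src/util/UpnpPunch.py
+++ b/src/util/UpnpPunch.py
"""


def parse_file_header(text):
    """Hypothetical helper: extract the basic fields of a git diff file header."""
    info = {}
    for line in text.splitlines():
        if line.startswith("index "):
            # "index <old hash>..<new hash> <mode>"; the mode may be absent.
            match = re.match(r"index ([0-9a-f]+)\.\.([0-9a-f]+)(?: (\d+))?$", line)
            if match:
                info["old_blob"], info["new_blob"], info["mode"] = match.groups()
        elif line.startswith("--- "):
            info["source"] = line[4:]
        elif line.startswith("+++ "):
            info["destination"] = line[4:]

    # /dev/null on one side marks a created or a deleted file.
    if info.get("source") == "/dev/null":
        info["status"] = "created"
    elif info.get("destination") == "/dev/null":
        info["status"] = "deleted"
    else:
        info["status"] = "modified"
    return info


header = parse_file_header(HEADER)
print(header["old_blob"], header["new_blob"], header["mode"], header["status"])
```

Feeding it only the two lines "--- /dev/null" and "+++ b/src/util/UpnpPunch.py" gives a status of "created", and swapping the sides gives "deleted".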

There are a few other things that can come up in the header, like three-way merges, but I'll skip them as they come up less often. Let's move on to the hunks!

The first hunk is a simple one-for-one line change and looks like this:

@@ -82,7 +82,7 @@ def _retrieve_igd_profile(url):
     Retrieve the device's UPnP profile.
     """
     try:
-        return urllib2.urlopen(url.geturl(), timeout=5).read()
+        return urllib2.urlopen(url.geturl(), timeout=5).read().decode('utf-8')
     except socket.error:
         raise IGDError('IGD profile query timed out')

The -82,7 refers to the range of the hunk in the source file (hint: dashes refer to the source file, pluses to the destination file). Basically, 7 lines of the source file are shown, starting at line 82. The +82,7 works in the same way, except this time it pertains to the destination file. Since this hunk replaces one line with exactly one line, both ranges have the same length. This may be confusing now, but the next two hunks illustrate how this works much better.

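The def _retrieve_igd_profile(url): part is git telling us which function these changes are in. This isn't limited to C-family languages: by default git simply shows the nearest preceding line that starts in the first column with a letter, an underscore or a dollar sign, and language-specific patterns can be turned on through .gitattributes (for example *.py diff=python). Finally, the remainder of the hunk shows the file's contents, with - prefixes marking lines that were removed from the source file and + prefixes marking lines that were added in the destination file.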
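
The hunk header is regular enough to pick apart with a regular expression. Here's a rough sketch of that, assuming the standard unified diff layout; parse_hunk_header and the dictionary keys are names I made up for this example.

```python
import re

# "@@ -<start>[,<count>] +<start>[,<count>] @@ <optional function context>"
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")


def parse_hunk_header(line):
    """Hypothetical helper: split a unified-diff hunk header into its parts."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError("not a hunk header: %r" % line)
    src_start, src_count, dst_start, dst_count, context = match.groups()
    return {
        "src_start": int(src_start),
        # The count is left out of the header when it is exactly 1.
        "src_count": int(src_count) if src_count is not None else 1,
        "dst_start": int(dst_start),
        "dst_count": int(dst_count) if dst_count is not None else 1,
        "context": context,
    }


hunk = parse_hunk_header("@@ -82,7 +82,7 @@ def _retrieve_igd_profile(url):")
print(hunk)
```

For the first hunk this gives a start of 82 and a count of 7 on both sides, with the function line as the context.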

The next hunk changes one line in the source file into many lines in the destination file:

@@ -100,7 +100,14 @@ def _parse_igd_profile(profile_xml):
     WANIPConnection or WANPPPConnection and return
     the 'controlURL' and the service xml schema.
     """
-    dom = parseString(profile_xml)
+    try:
+        dom = parseString(profile_xml)
+    except Exception as e:
+        with open('upnp_error.txt', 'wb') as f:
+            f.write(e)
+            f.write('====')
+            f.write(profile_xml)
+

     service_types = dom.getElementsByTagName('serviceType')
     for service in service_types:

The source range is -100,7 and the destination range is +100,14. If you count the lines without a prefix together with the lines prefixed with -, you'll get 7. If you do the same, but this time count them together with the lines prefixed with +, you'll get 14. This diff is saying "I'm showing you 7 lines from the source file and 14 lines from the destination file. Oh, and I highlighted the changes for you!" Pretty handy, right?
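
If you don't feel like counting by hand, the same arithmetic can be sketched in a few lines of Python. The hunk body below is copied from the example, and count_hunk_lines is just a name I picked; a blank line inside a hunk is treated as an unchanged context line.

```python
# Body of the second hunk, one string per line (without the @@ header).
HUNK_BODY = [
    "     WANIPConnection or WANPPPConnection and return",
    "     the 'controlURL' and the service xml schema.",
    '     """',
    "-    dom = parseString(profile_xml)",
    "+    try:",
    "+        dom = parseString(profile_xml)",
    "+    except Exception as e:",
    "+        with open('upnp_error.txt', 'wb') as f:",
    "+            f.write(e)",
    "+            f.write('====')",
    "+            f.write(profile_xml)",
    "+",
    "",
    "     service_types = dom.getElementsByTagName('serviceType')",
    "     for service in service_types:",
]


def count_hunk_lines(lines):
    """Return (source line count, destination line count) for a hunk body."""
    removed = sum(1 for line in lines if line.startswith("-"))
    added = sum(1 for line in lines if line.startswith("+"))
    # Everything else is context and appears in both files.
    context = len(lines) - removed - added
    return context + removed, context + added


src_count, dst_count = count_hunk_lines(HUNK_BODY)
print(src_count, dst_count)  # 7 14
```

Six context lines plus one removed line gives 7, and the same six plus eight added lines gives 14, which matches -100,7 +100,14.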

One last example, this time the hunk only removes lines:

@@ -336,11 +343,3 @@ if __name__ == "__main__":
     logging.getLogger().setLevel(logging.DEBUG)
     import time

-    s = time.time()
-    print "Opening port..."
-    print ask_to_open_port(15443, "ZeroNet", retries=3, protos=["TCP"])
-    print "Done in", time.time()-s
-
-    print "Closing port..."
-    print ask_to_close_port(15443, "ZeroNet", retries=3, protos=["TCP"])
-    print "Done in", time.time()-s

-336,11 means that we see 11 lines from the source file: 3 unchanged lines plus the 8 lines that were removed. +343,3 means that we see just 3 lines from the destination file. Since there are no lines prefixed with +, it's easy to tell that the only changes made here involved removing source code. Notice that the start of the range changed from 336 in the source file to 343 in the destination file. This happened because the previous hunk, higher up in the file, made the file 7 lines longer (14 lines instead of 7).
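
The shifting start line is just a running total of how much each earlier hunk grew or shrank the file. Here's a small sketch of that bookkeeping using the three hunks from the example; destination_starts is a made-up name, and the sketch assumes every hunk shows at least one line on each side.

```python
# (source start, source count, destination count) for the three example hunks.
HUNKS = [
    (82, 7, 7),
    (100, 7, 14),
    (336, 11, 3),
]


def destination_starts(hunks):
    """Work out where each hunk starts in the destination file."""
    offset = 0  # net number of lines added by the hunks seen so far
    starts = []
    for src_start, src_count, dst_count in hunks:
        starts.append(src_start + offset)
        offset += dst_count - src_count
    return starts


print(destination_starts(HUNKS))  # [82, 100, 343]
```

The first hunk doesn't change the file's length, the second adds 7 lines, so the third hunk lands at 336 + 7 = 343.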

Committing Files Hunk by Hunk

Now that we know how to read git's diffs and what hunks are, we can fully make use of the awesome patching functionality of git. This allows us to interactively stage only parts of files. It can be invoked either by git add path/file.src -p or through interactive adding: git add -i, option 5. Running the first command puts us in the interactive mode:

matt@solarsystem:~/projects/ZeroNet$ git add src/util/UpnpPunch.py -p
diff --git a/src/util/UpnpPunch.py b/src/util/UpnpPunch.py
index 1e47d6a..8d82ae4 100644
--- a/src/util/UpnpPunch.py
+++ b/src/util/UpnpPunch.py
@@ -82,7 +82,7 @@ def _retrieve_igd_profile(url):
     Retrieve the device's UPnP profile.
     """
     try:
-        return urllib2.urlopen(url.geturl(), timeout=5).read()
+        return urllib2.urlopen(url.geturl(), timeout=5).read().decode('utf-8')
     except socket.error:
         raise IGDError('IGD profile query timed out')

Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]?

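That's a lot of options, but pressing ? shows you what they all mean. The ones I use most often are y (stage this hunk), n (don't stage it), j (leave it undecided for now and move on to the next undecided hunk), s (split the hunk into smaller hunks) and e (edit the hunk manually). The s option only shows up when git is able to split the hunk, which is why it's missing from the prompt above. I usually use git's interactive staging prompt (git add -i), which lets you stage hunks for every updated file in your working directory.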

That's pretty much it. Go out there and read some diffs! :)

Hi, I'm Matt.
