Good Python Code to Read

Published: 2017-03-14
Tagged: python architecture learning software guide

Note: This is a living document - I update it anytime I discover a good piece of Python code. Newest entries are first.

Updated: March 14th

The Why

I consider reading code one of the ways to improve your programming skills, especially in the areas of architecture and code quality. You get to see the result of hundreds or thousands of hours of someone's work - the decisions, the rewrites, the merges, and the bugfixes. If the code is under source control, you get to explore it in the 4th dimension, too, although I only rarely do that.

This gives you insight into how different solutions worked, what costs they incurred, and what other things had been tried. You also get a peek at the meta-level: which test strategies were chosen and how they worked out, how contributors are organized, how knowledge is shared, and how issues are triaged. You know, stuff that you can re-use at work.

The List

asyncssh

Link: github

Description: AsyncSSH is a Python package which provides an asynchronous client and server implementation of the SSHv2 protocol on top of the Python asyncio framework.

Notes:

I started looking for an asynchronous alternative to Paramiko and happened upon this library by pure luck. I was pleasantly surprised to see just how clean and pythonic the code is - PEP8 compliant, short functions, lots of docstrings, clear variable names, plenty of tests, and great overall structure. If you delve a little deeper you'll find that there's tons of documentation as well. What's more astounding is that it's the labor of just one man - Ron Frederick. Ron - if you ever happen to read this - thanks a lot for the amazing work!

Python: logging module

Link: github

Description: It's the standard library logging module

Notes:

I've always found the logging module a bit confusing to use. Should I use .basicConfig() or should I .getLogger() and then configure it step by step? What exactly does a logger's name do? What else, apart from outputting to a file or stdout, can logging work with? Reading through the module answered all of those questions and quite a few more. The code is well structured and easy to read through with a good amount of documentation thrown in.

I also think this is a great read because it gets you used to reading standard library code - you know, the code that's used in like 90% of the other projects you'll come into contact with.
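To make the question about logger names a bit more concrete, here is a small sketch of what the dotted name buys you: loggers form a hierarchy, so a child logger inherits its parent's level and passes its records up to the parent's handlers. The names demo_app, demo_app.db, and demo_logger_hierarchy are made up for the example, and the output goes to an in-memory stream so nothing touches the root logger that .basicConfig() would configure.

```python
import io
import logging


def demo_logger_hierarchy():
    """Show that "demo_app.db" inherits level and handlers from "demo_app"."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s:%(levelname)s:%(message)s"))

    # Configure only the parent logger, step by step.
    parent = logging.getLogger("demo_app")
    parent.setLevel(logging.INFO)
    parent.addHandler(handler)

    # The child has no level or handler of its own; the dot in its name
    # makes it a descendant of "demo_app".
    child = logging.getLogger("demo_app.db")
    child.info("connected")          # passes the inherited INFO threshold
    child.debug("query details")     # filtered out: DEBUG is below INFO

    parent.removeHandler(handler)    # leave the global logging state clean
    return stream.getvalue()


if __name__ == "__main__":
    print(demo_logger_hierarchy(), end="")
```

Only the INFO record should end up in the stream, tagged with the child's name even though the handler sits on the parent.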

sqlalchemy-migrate

Link: github

Description: Database schema migration for SQLAlchemy

Notes:

Most people I know who work with Flask use Alembic to manage DB migrations, but I bumped into this library on an older project that I worked on recently. Due to my task, I got the chance to get really deep into this codebase, which isn't that big, and learn how the whole thing works. It's not every day that you get to take apart a piece of code responsible for managing schema migrations, which made it an incredible experience. The code is fairly clean, but it bears the marks of a mature project that's not finished yet - there are TODOs, commented out blocks of code, and other small warts related to growing pains.

SQLAlchemy

Link: github

Description: The Python SQL Toolkit and Object Relational Mapper

Notes:

This is a big codebase and I only explored the parts related to the layer that talks directly to the database (dialects). The short time it took me to dive in deep and examine how exactly SQLAlchemy talks with databases is a testament to how coherent and clear things are. What caught my attention here is the clear project structure and the well-devised names for everything.

Django

Link: github

Description: The Web framework for perfectionists with deadlines.

Notes:

Django is a fairly large piece of software at around 200k LOC and while you can get started with it quickly using the official tutorial, you'll constantly be bumping into things the tutorial never mentioned. The documentation is superb, but more and more often I've just looked at the source to figure something out. The code is swimming in docstrings, the functions are usually short, and the classes are on point (SRP). I'm not a die-hard fan of OO, but this is one of the better examples of clean OO design that I've seen. I encourage anyone working with Django, whether they're an expert or beginner, to dig in and explore - the best way to start is to look for some django import that you use often (e.g. from django.core.urlresolvers import reverse, which lives in django.urls as of Django 1.10) and start exploring from there.
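The "pick an import and follow it" approach can be sketched with the standard inspect module, which tells you which file and line an object comes from. This example uses logging.getLogger as a stand-in so that it runs without Django installed; locate_source is a hypothetical helper name, and the assumption is that the same call would work on something like reverse inside a Django project.

```python
import inspect
import logging


def locate_source(obj):
    """Return (path, first_line) for a function, class, or module.

    Assumes the object is written in Python and its source file is on disk;
    built-ins implemented in C will raise TypeError instead.
    """
    path = inspect.getsourcefile(obj)
    _lines, first_line = inspect.getsourcelines(obj)
    return path, first_line


if __name__ == "__main__":
    path, first_line = locate_source(logging.getLogger)
    print(f"{path}:{first_line}")
```

From there you can open the file at that line in your editor and keep following the calls that look interesting.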

Tornado

Link: github

Description: Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.

Notes:

This is the codebase that introduced me to reading code - I was curious how the hell you do asynchronous programming in Python from scratch. There's about 25k LOC here, a good amount of documentation and docstrings, and the general structure of the code makes it easy to navigate. It's a mature project so you'll also find comments explaining the reasoning behind certain details or comments pointing at some bugfix from many years ago. One thing that I noticed is that some functions have a tendency to drag on, but this is compensated for by the use of very clear variable names. I can recommend reading through this project to anyone who wants to learn how single-threaded asynchronous programming works - understanding the concepts behind IO loops, queues, epoll/select, deferreds, and such makes it much easier to jump into other similar pieces of code such as Twisted or Python's asyncio module.
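The core idea behind an IO loop fits in a few lines. What follows is a rough sketch built on the standard selectors module and a local socket pair, not Tornado's actual implementation; run_io_loop_demo and on_readable are made-up names. The loop asks the OS which sockets are ready (via epoll, kqueue, or select, depending on the platform) and then calls whatever callback was registered for each of them, all in a single thread.

```python
import selectors
import socket


def run_io_loop_demo():
    """Run a tiny single-threaded IO loop until no sockets are registered."""
    selector = selectors.DefaultSelector()  # best readiness API the OS offers
    writer, reader = socket.socketpair()    # two connected local sockets
    reader.setblocking(False)
    received = []

    def on_readable(sock):
        # Called by the loop only once the socket has data waiting.
        received.append(sock.recv(1024))
        selector.unregister(sock)

    # Store the callback next to the socket; the loop gets it back as key.data.
    selector.register(reader, selectors.EVENT_READ, data=on_readable)
    writer.sendall(b"ping")

    # The loop itself: wait for readiness, then dispatch callbacks.
    while selector.get_map():
        for key, _mask in selector.select(timeout=1.0):
            key.data(key.fileobj)

    selector.close()
    writer.close()
    reader.close()
    return received


if __name__ == "__main__":
    print(run_io_loop_demo())
```

Real frameworks layer timeouts, write events, and futures or coroutines on top, but the wait-then-dispatch cycle is the part worth recognizing when you read their source.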
