Software's Cornucopia

Published: 2023-08-06
Tagged: essay programming productivity

In 2011, Marc Andreessen wrote in the Wall Street Journal about how software is eating the world. But the software he was describing was clunky and frustrating to use. That limited its reach to those of us equipped with an abundance of patience and curiosity. Since then, however, those limits have been pushed way, way out, and it's hard to overstate how completely software permeates and influences our lives today.

It's even harder to figure out where software is going, though. Most of us, when we consider this question, tend to think about developments like crypto, virtual reality, or artificial intelligence. But by narrowing our focus this way, we are sure to miss broader, more fundamental changes--like how software keeps getting cheaper and faster to make. Failing to understand this puts us at risk of misapprehending not just the exciting developments listed above, but also the effects that omnipresent software has on us.

What then is driving changes to software production?

The Hard Side of Software

For one, developers spend less time reinventing the wheel because there is more open source code available for solving common problems. Instead of writing their own basic web serving or number crunching functions, they can choose from a number of ready, publicly available packages. That leaves them more time to work on problems directly related to their job. Moreover, these packages are the result of engineer-decades of effort and are used in tens of thousands of products, ensuring a level of quality that's difficult to match with homegrown code.
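
To make that concrete, here's a minimal sketch--assuming a Python codebase and the widely used open source Flask package--of how little application code a basic web endpoint takes when the heavy lifting is borrowed rather than rebuilt:

    # A minimal sketch: serving a web page with the open source Flask package
    # instead of hand-writing HTTP handling on top of raw sockets.
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def home():
        # Flask takes care of request parsing, routing, and response formatting.
        return "Hello, world!"

    if __name__ == "__main__":
        app.run(port=8000)  # a working web server in about a dozen lines

The package embodies those engineer-decades of effort; the application code only has to express what's unique to the business.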

Tools for working with code have gotten better too. Integrated Development Environments (IDEs)--which allow developers to write, test, build, and debug their code all in one place--used to require significant ramp-up time. But their modern versions feature streamlined graphical interfaces that let engineers become productive almost immediately. At the same time, linters and code formatters have become commonplace. The former analyze developers' output for well-known mistakes, both stylistic and technical, and prompt them to fix whatever is wrong. The latter make code look uniform. That may sound trivial, but engineers used to waste precious time moving brackets and semicolons around. Worse, they used to fight over different formatting styles. Now these tools can be configured once and a whole class of friction disappears.
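
As a rough illustration (in Python; the exact warnings depend on which linter a team configures--pylint, flake8 with plugins, and so on), here's the kind of mistake a linter flags and what the cleaned-up version looks like:

    # Before: code a typical linter would complain about.
    import os  # flagged: imported but never used

    def add_item(item, items=[]):  # flagged: mutable default argument,
        items.append(item)         # a classic source of subtle bugs
        return items

    # After: the same function, rewritten the way the linter suggests.
    def add_item_fixed(item, items=None):
        if items is None:
            items = []
        items.append(item)
        return items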

In a similar fashion, automated testing (code that tests other code--a powerful defense against bugs) became easier to set up, allowing it to become the norm. Before, each individual developer would run such tests whenever they thought it was a good idea. Some did it often, most only occasionally, and a handful never did. That made quality, even within the same company, vary a lot. Once the effort needed to automate testing dropped, a good part of that variance disappeared.
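
Here's what such a test can look like in practice--a minimal sketch assuming Python and the pytest convention of files and functions named test_*, with slugify standing in for any small piece of production code:

    # Code that tests other code: with pytest installed, a plain `pytest`
    # command collects tests like these and runs them on every change.

    def slugify(title):
        """Turn an article title into a URL-friendly slug."""
        return "-".join(title.lower().split())

    def test_slugify_joins_words_with_dashes():
        assert slugify("Hello World") == "hello-world"

    def test_slugify_lowercases():
        assert slugify("Abundance Ahead") == "abundance-ahead"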

Actually running code used to be a surprisingly big problem. Companies, even single teams, used to run different versions of basic software on their servers, which meant deploying an application would be hit or miss. And since each server would differ in subtle ways, it would often cost engineers significant effort to get things running smoothly. But today this problem is largely solved by "containers", a way of bundling an application together with its dependencies so that it just works almost anywhere.

Managing those servers has changed dramatically as well. What used to be a time-intensive and manual process that produced bespoke machines has been abstracted away by configuration management tools. These tools automate almost all manual work and produce uniformly configured servers, greatly enhancing their reliability and security. Moreover, the same tools constantly check the servers' state and undo changes accidentally made by engineers.
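
The real tools here are systems like Ansible, Puppet, or Chef, but the core idea--declare the desired state, then converge every server to it, over and over--fits in a few lines of Python; the file path and contents below are purely illustrative:

    # A toy sketch of configuration management: describe the desired state
    # once, then repeatedly converge the machine to it instead of hand-editing
    # each server and hoping nobody changes anything.
    from pathlib import Path

    DESIRED_SSHD_CONFIG = "PermitRootLogin no\nPasswordAuthentication no\n"

    def converge(path=Path("/etc/ssh/sshd_config")):
        """Ensure the file matches the desired state; undo any manual drift."""
        current = path.read_text() if path.exists() else ""
        if current != DESIRED_SSHD_CONFIG:
            path.write_text(DESIRED_SSHD_CONFIG)
            return "changed"
        return "ok"

    # Run on a schedule across every server, this is what keeps them uniformly
    # configured and quietly reverts accidental one-off tweaks.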

Despite these improvements, many software companies choose to forgo the hassle of physical servers altogether in favor of cloud computing. Cloud services, which essentially outsource running and maintaining servers to a provider like Amazon or Google, have matured to the point where the developer time they save is often worth the premium they cost. The economies of scale involved also mean that cloud servers are of higher quality than what many a small company would be capable of building and running.

The Soft Side of Software

Behind all these changes runs a more subtle trend: engineers have transformed from simple specialist-technicians into professionals whose broad focus is quality.

It used to be that the dominant metaphor in the business of software was construction. It's evidenced by nomenclature like "building", "architecture", "tooling"; but even more so by the process of making software. The way it worked was that executives would first find an unmet need in the market. They would then talk with software architects, usually grizzled industry veterans, about how software could fill that need and create customers. Once a solution was arrived at, the architects would create a specification for the program and hand it off to programmers to implement in code. Afterwards, when the program was ready, QA would test it and report bugs to the programmers to fix before finally OK'ing it for release.

It's a top-down, one-way approach to building things, popularly called "waterfall". It seems to work well with physical things, like buildings and bridges. But software is different. It's fluid and malleable and involves a lot of low-level individual judgement. In that, it resembles art to some extent. And looked at from a team perspective, there's also something of gardening to it: not knowing for sure how some part of the work will turn out, having to leave your options open, being ready to improvise.

Because of that, waterfall software projects rarely went smoothly, or even succeeded at all. The specification almost always left out a lot of detail, which each individual programmer had to supply themselves. Multiply that by the number of programmers on a team and the number of teams assigned to the project, and what you got was a pile of incoherent code. When the deadline came around to combine all that code into the final product, it almost never worked. Programmers had to spend nights and weekends straightening things out. And later, when a program that technically "ran" was handed off to QA, it turned out there were even more subtle problems that made the software unusable for end-users. That meant more nights and weekends of rework--sometimes even months or years of it--and mounting costs.

This state of things changed when programmers--not executives or researchers--began to look for ways to improve how software is produced.

Take, for example, Continuous Integration/Continuous Delivery (CI/CD). CI/CD can be boiled down to combining and deploying code in small chunks--not even features, but rather the tiny pieces produced during a single workday. On the technical side, doing things this way ensures that each piece works with the larger codebase. And on the fuzzier, human side, it ensures that engineers are more aware of their colleagues' code, helping to keep the mental model of how it all fits together more accurate. Another benefit of this approach is that if a broken piece of code gets deployed to users, it's likely that it was introduced just recently. Finding and undoing the bad change is easy then, especially compared to waterfall projects where the change could have been made weeks or months earlier.
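
Real pipelines live in services like Jenkins or GitHub Actions, but the core loop is simple enough to sketch in a few lines of Python; the deploy.sh script below is a hypothetical stand-in for whatever actually ships the code:

    # A toy sketch of the CI/CD loop: integrate a small change, verify it
    # against the whole codebase, and deliver it only if the checks pass.
    import subprocess
    import sys

    def integrate_and_deliver():
        # 1. Integrate: pull the latest small change from the shared repository.
        subprocess.run(["git", "pull", "--ff-only"], check=True)
        # 2. Verify: run the automated test suite against the combined code.
        tests = subprocess.run(["pytest", "--quiet"])
        if tests.returncode != 0:
            sys.exit("Tests failed; nothing is deployed, and the culprit is "
                     "almost certainly one of the last few small commits.")
        # 3. Deliver: ship the verified change to users.
        subprocess.run(["./deploy.sh"], check=True)  # hypothetical deploy step

    if __name__ == "__main__":
        integrate_and_deliver()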

Another example is the emergence of Site Reliability Engineering (SRE) as a specialization. SRE aims to address problems of software reliability by automating risky manual tasks. Automation, after all, is less likely to make mistakes, especially when dealing with complicated systems made up of numerous applications. But for it to do the right thing, it needs to "understand" the state of the system. So a large part of SRE work is improving how software is monitored, with an eye toward making it understandable to the humans building said automation. Finally, because it's inevitable that humans will make mistakes even in that, SREs are often involved in improving the processes of the organization itself. Probably the best-known example of this is the blameless post-mortem.

The way a blameless post-mortem works is this: when something inevitably breaks, engineers investigate the chain of events that led up to it in order to prevent it, and perhaps even the whole category of failure that "it" belongs to, from happening again. Almost always there's a person involved in that chain, but instead of punishing them for doing the wrong thing, SRE asks: assuming they had the best intentions, why did they do it? And almost always it turns out that a tool was confusing or a guide was outdated--simple problems that can be objectively improved. The ultimate goal, after all, is to prevent the problem, not to dole out blame.

Both CI/CD and SRE were inspired by advances in manufacturing and introduced to the software industry in a bottom-up fashion. The people behind them simply decided to take on a larger responsibility than merely implementing somebody else's designs. These mavericks triggered the unraveling of the waterfall way of business by offering something more effective.

Along the way, programmers stopped being called programmers and became software developers or engineers. Their career ladder forked too: management wasn't the only way up anymore; now they could also grow into staff and principal engineers. They also won a seat at the table, finally bringing badly needed expertise to executives. And "architect", as a role, has almost completely disappeared today.

I would be remiss if I didn't remark on another huge shift in software engineering culture: there are a lot more engineers and they're more diverse.

Only a small percentage of all software developers work on tooling or process improvements. That's enough, because these improvements diffuse rather quickly through the culture. And as the overall population of engineers has grown, the group of tool- and process-makers has grown with it. But the pool hasn't just increased in size; it has also become more diverse as more of the world has come online--diverse not in a way that looks good on paper according to some bureaucratic requirements, but in a healthy, diversity-of-experiences-and-viewpoints kind of way that fuels experimentation and keeps pushing the boundaries of quality.

It may seem like a paradox that increasing software quality would result in it becoming cheaper and faster to develop. Usually, better things cost more and take more time to make. But in this case, the opposite is true: better software, software that you can trust not to break (at least easily), means less time spent on fixing bugs and more time writing code to address real business needs. More software produced per unit of time means the effective cost of developing it goes down.

This isn't a wholly new phenomenon. Toyota did something similar with cars. Early on, it created a system for continually improving quality. Better cars and better manufacturing processes meant less time spent on fixing defects. That increased production volume, and more volume meant lower prices. It was called the Toyota Production System--something other carmakers have been trying to copy since the '70s.

The Price of Quality Culture

Despite these advances, there is still considerable friction--and room for improvement--in software production.

The dark side of culture as the driving force behind quality is that it entails an enormous amount of variance. Company to company, team to team, even engineer to engineer--efficacy differs a lot. Just how much?

In 2020, the US government hired Deloitte, a British consultancy employing almost half a million people, to produce a vaccine data system. Chief among its functions was to allow people to make vaccination appointments quickly and easily with clinics that had the capacity. The $44 million project ended up a complete mess. Many potential patients had trouble making appointments. Doctors often couldn't log in or even load the system's webpage. As a result, 41 states chose not to use the system, even though the federal government was offering it to them for free.

Now consider WhatsApp. By the time it was bought by Facebook, it had taken on about $58 million in funding and attracted 450 million users who exchanged 50 billion messages each day. The impressive phone app and the infrastructure to process all those messages were built and maintained by just 35 engineers.

We don't have any insight into how Deloitte went about building its vaccine data system. But the example of WhatsApp makes it clear that that sort of money should have bought, if not something that scales to hundreds of millions of users, then at least something that functions.

The same variance is to blame for the sorry general state of computer security. While things here have improved immensely over the last decade or so, software has also become more deeply enmeshed in critical parts of our lives, making it a vastly more attractive target for criminals. The criminals themselves have evolved from loosely grouped individuals with sociopathic tendencies into a global, well-organized crime machine--with its own institutions, hierarchy, even infrastructure.

This is plain to see in the almost weekly reports of another data leak or ransomware attack. To these we must add a new vector: supply chain attacks. Because so much software is now made up of reusable open source components, it's possible to insert malicious code, like a backdoor, into any of them. And since these components can add up to tens of millions of lines of code, it's infeasible for an individual or even a team of developers to notice when something is wrong. We're still figuring this one out.

Another source of friction comes from the most popular method of organizing software work today: Agile. Agile set itself against the waterfall way by emphasizing individuals over processes, customers over contracts, adaptability over planning, and working software over detailed specification. But ask a random sample of engineers what they hate most about their work, and they will probably say it's Agile.

Some will say that emphasizing individuals means engineers have to spend more time meeting and syncing up with their peers. Others will say that it involves too much adaptability, with plans changing week to week. Still others will complain about their role being too broad--that it's impossible for one person to write code, test it, carry a pager for it, and on top of that to talk with customers, plan projects, interview potential teammates, etc. Worse, we now have organizations offering training and certification in Agile, which turns the whole thing from a set of loose principles into a suffocating framework.

I'm reminded of the 1999 hit movie Office Space, which depicts the transformation of one Peter Gibbons. In the beginning, he is a resigned, nerdy programmer working from an isolating cubicle for a terrible boss. But by the end, he's found fulfillment in a new job, one that involves other people and working outside--demolition. Today, if you spend any time around engineers, you'll sooner or later hear yearnings for switching careers to something like construction or woodworking.

Abundance Ahead

Despite the issues outlined above, we have no reason to doubt that software will keep getting cheaper and faster to produce. There's enough momentum--and enough appealing (and lucrative!) problems--to ensure our techniques and tools will continue improving.

The big unknown that's appeared in just the last few months is AI, specifically Large Language Models, or LLMs. Right now, there's so much hype around the topic that it's difficult to extrapolate with any accuracy what effects it will have on the industry. In the worst-case scenario, LLMs should be able to take care of churning out boring boilerplate code. That would save engineers, as a whole, at least a single-digit percentage of their work time.

However, there are reasons to be cautiously optimistic. LLMs have the potential to drive a shift similar in scale to the switch from programming in assembly languages--the arcane native instructions understood by CPUs--to high-level programming languages. The latter smooth over much of the complicated detail of computer hardware, making it possible for today's engineers to do in minutes what yesteryear's programmers needed hours to achieve.

We'll have to wait and see.

What we can be sure of, however, is that we'll see more and more software. A torrent, really. And judging by the changes that increasingly abundant energy has brought on us, like going from wood to coal or from coal to oil, we should expect equally mind-blowing changes from abundant software. Problems that are too trivial or too expensive to apply software to today will be ripe for tomorrow's engineers.

Marc Andreessen wasn't wrong in 2011. But perhaps even he couldn't foresee just how ravenous software would become.
