Pierce the Abstraction Wall

This post is part of my sporadic Notes to a Young Software Engineer series.

When I started web development, I used ActiveRecord, Ruby on Rails’s object relational mapper (ORM).

Like any ORM, ActiveRecord hides the complexity of the relational database (the SQL language, the custom CLI), making it easy to query the database in a developer’s native vernacular.

I used the opportunity to indulge my nonchalant ignorance of, and at times willful disdain for, SQL. Hilarity ensued. Schemas were poorly designed. Trivial queries became laughably inefficient. With just a thousand concurrent users, the tech slowed to a crawl.

A senior engineer at Twitter took pity on me and walked me through the basic flaws in my calls (it was a few pegs down from his day job helping scale Twitter to billions of users). I eventually took all of two days to immerse myself in SQL and the Postgres manual, marveling at how quick the concepts were to learn and how much pain they would have saved me.

On the teams I have run in the years since, I continually see issues caused by this overly broad respect for external systems. To be an effective developer, regularly aim to pierce the abstraction wall, especially when you identify a soft interface.

The power and perils of abstraction

The astounding modern productivity in software engineering is due to abstractions that hide immense complexity.

Machine code becomes assembly, becomes a static language, becomes a dynamic language. In machine learning, brilliant researchers package algorithms into easy calls that can be made by any early-career engineer. In Node.js, a web app depends on hundreds of libraries. Stripe, the payments unicorn, built an astonishingly successful business on the observation that adding payments should be trivial for developers.

As such, my criticism can seem misplaced. Who actually reads assembly today? Shouldn’t we simply ignore most abstractions we depend on?

You can see this same debate play out with Left Pad, where delisting a single, trivial module broke the most popular packages in Node.js. Millions depended on this trivial library, without reviewing the code that underlay their work. In his searing critique about how developers blindly added this dependency, David Haney plaintively asks, “Have we forgotten how to program?”
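For a sense of scale, the behavior in question fits in a few lines. What follows is a rough Python sketch of left-padding, not the original JavaScript source; the function name, parameters, and single-character fill are my assumptions for illustration.

```python
def left_pad(text, length, fill=" "):
    """Pad `text` on the left with `fill` until it is `length` characters long.

    Hypothetical stand-in for the npm module's behavior; assumes `fill`
    is a single character.
    """
    text = str(text)
    missing = length - len(text)
    if missing <= 0:
        # Already long enough: return the input unchanged.
        return text
    return fill * missing + text


padded_number = left_pad(5, 3, "0")
padded_word = left_pad("abc", 2)
```

Python already ships the same behavior as `str.rjust`, which is itself a small case of knowing what the layer below you provides.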

The issue comes down to the difference between soft and hard interfaces.

Soft and hard interfaces

Not all software interfaces are created equal. Issues surface when we mistake soft interfaces for hard interfaces.

With hard interfaces, very few software engineers ever need to understand the underlying system — and few modern engineers venture too deep. A classic example might be assembly or even a static language like C ¹. Hard interfaces are explicit borders that few journey to the other side of.

By comparison, with soft interfaces, the abstracted layer may be immature, inefficient, or need to be customized for each use case. Depending on the latest NPM libraries isn’t the same as depending on a battle tested database or compiler, even though both provide abstraction.

Over time, we can “close” some abstraction walls, where soft interfaces are replaced by hard interfaces. At that point, a sizable majority of software developers — say 80% — never venture beyond the interface they’re used to. Assembly engineers disappear, and we’re left in a world where most of us code in a higher level abstraction, with much gained (productivity) at a cost that we’re willing to bear (performance). ²

There are a few obvious places where we create abstraction walls:

A different language/domain specific language: SQL vs Ruby ORM, Javascript vs C
Someone else’s code: A coworker’s code
Complex algorithms: A complex image classification algorithm

The Benefits

Piercing the abstraction wall provides two key benefits.

First, great learning comes from understanding the systems you depend on. Each system you depend on is an opportunity to learn a new world that provides benefits for your work.

For early engineers, I consistently recommend reading the libraries they depend on, starting with the simpler ones. It’s an easy way to see relevant, often high quality code, while increasing your productivity for future projects.
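Getting started can be cheap. As one possible approach, sketched here in Python rather than Ruby, the standard library’s `inspect` module will locate and print the source of a function you already call. The choice of `textwrap.dedent` is arbitrary, and this assumes an installation that ships its standard library as readable `.py` files.

```python
import inspect
import textwrap

# Find where the function lives on disk and pull its source lines.
source_file = inspect.getsourcefile(textwrap.dedent)
source_lines, first_line = inspect.getsourcelines(textwrap.dedent)

print(f"{source_file}:{first_line} ({len(source_lines)} lines)")
# Show only the opening lines; read the rest in your editor.
print("".join(source_lines[:15]))
```

The same trick works on third-party packages written in pure Python, which is often the quickest path from "it’s a black box" to "it’s just code."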

Second, when you understand the abstraction you depend on, you build more optimized systems. You get to optimize globally within the broader system, rather than optimizing locally within your own code alone.

In the case of ActiveRecord, if we stay in Ruby, we can create all sorts of contortions without getting closer to an optimized query. With SQL, a whole host of optimizations are suddenly available.
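To make that concrete, here is a minimal sketch of one common shape of the problem, often called the N+1 query pattern. It uses Python’s built-in `sqlite3` as a stand-in for ActiveRecord and Postgres, and the `authors`/`posts` schema and function names are invented for illustration. The first function loops in application code, issuing one query per author, the way a naive ORM loop tends to. The second pushes the same work into a single SQL statement.

```python
import sqlite3


def build_db():
    """Create a small in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title TEXT NOT NULL
        );
        CREATE INDEX idx_posts_author ON posts(author_id);
    """)
    conn.executemany(
        "INSERT INTO authors (id, name) VALUES (?, ?)",
        [(1, "ada"), (2, "grace"), (3, "linus")],
    )
    conn.executemany(
        "INSERT INTO posts (author_id, title) VALUES (?, ?)",
        [(1, "a1"), (1, "a2"), (2, "g1")],
    )
    return conn


def post_counts_naive(conn):
    """One query for the authors, then one more per author (N+1)."""
    queries = 0
    counts = {}
    authors = conn.execute("SELECT id, name FROM authors").fetchall()
    queries += 1
    for author_id, name in authors:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM posts WHERE author_id = ?", (author_id,)
        ).fetchone()
        queries += 1
        counts[name] = count
    return counts, queries


def post_counts_joined(conn):
    """The same result from a single JOIN with GROUP BY."""
    rows = conn.execute("""
        SELECT a.name, COUNT(p.id)
        FROM authors a
        LEFT JOIN posts p ON p.author_id = a.id
        GROUP BY a.id, a.name
    """).fetchall()
    return dict(rows), 1


conn = build_db()
naive_counts, naive_queries = post_counts_naive(conn)
joined_counts, joined_queries = post_counts_joined(conn)
print(naive_counts == joined_counts, naive_queries, joined_queries)
```

With three authors the difference is four round trips versus one. The gap grows linearly with the number of rows, and it is invisible if you only ever read the application code.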

As such, when approaching a new interface, it helps to ask a few questions that determine whether it is a hard or soft interface:

Eyeballs: How battle tested is this code? (developer eyeballs, length of existence)
Value: Will learning this code be valuable to me in my current task or career?
Effort: How much work will it take to understand the basics of this code? (days, weeks, years)

This quick analysis can help indicate when it’s worth digging deeper.

Eventually, you may reach the point where it’s impossible to feel comfortable adding a new dependency without understanding it totally, which has its own issues. But until then, piercing the abstraction wall has huge advantages.


¹ For some developer friends, I know it will be horrifying to think that there are developers who never learn a static language.
² This might also be a parable of the death of American manufacturing in the 20th century.