Software Development: Abstraction is Overrated

Abstraction is considered a virtue in software development. However, practice shows that wrong abstractions cause more harm than none at all.

(Image: a desk with documents and colorful sticky notes — sabthai/Shutterstock.com)

By Golo Roden

In software development, hardly any principle enjoys as much prestige as abstraction. Those who abstract are considered forward-thinking. Those who recognize and combine commonalities are considered experienced. Those who duplicate code are considered negligent. Whether in code reviews, architectural discussions, or job interviews: abstraction is the benchmark by which good code must be measured. At least, that's how it's often conveyed.

the next big thing – Golo Roden

Golo Roden is the founder and CTO of the native web GmbH. He works on the design and development of web and cloud applications and APIs, with a focus on event-driven and service-based distributed architectures. His guiding principle is that software development is not an end in itself, but must always follow an underlying technical expertise.

I believed this myself for a long time. I abstracted wherever I could, sought commonalities where there were none, and felt good about it. Only over the years did I learn that the opposite is often closer to the truth: that many abstractions don't make code better, but worse. That they create coupling where independence should exist. And that a wrong abstraction costs more in the long run than no abstraction at all.

When developers talk about abstraction, the acronym DRY sooner or later comes up: “Don't Repeat Yourself”. It is one of the most cited principles in software development, and it is understood almost everywhere as a technical instruction: “Do not duplicate code”. There are even tools that scan codebases for copy-paste patterns and raise alarms when two blocks look too similar.

The problem: This interpretation has little to do with what the authors of the principle meant. DRY comes from the book “The Pragmatic Programmer” by David Thomas and Andrew Hunt. It explicitly states that it is *not* about technical duplication, but about semantic duplication: a business concept should not be implemented in multiple places in the system, as this leads to inconsistencies in business rules. Hunt and Thomas even provide an explicit example of technical duplication in their book and clarify that this is not a violation of DRY!

However, this does not stop the industry from continuing to sell DRY as “don't copy code”. And this is precisely what leads to one of the most common forms of incorrect abstraction: technically similar code is merged, even though it represents completely different things semantically.

To give an example that many will know in a similar form: Imagine an application has a User class. This class is used for persistence, for the HTTP API, and for business logic. Three different contexts, but only one class, because the fields are the same. Who needs three User classes?

The answer becomes apparent after a few months at the latest: For persistence and the API, you need JSON annotations. Suddenly, the business logic class carries annotations that it should be completely indifferent to. Then the persistence format changes, but the annotations cannot be easily adjusted because that would break the API. So, you add more annotations. Then an internal field is added that should not be visible via the API. So, you need logic that hides certain fields. And so, the class gradually grows into an ever-larger junkyard, accumulating more and more special cases and exceptions.

The solution would have been so simple: three separate classes, one per context. Yes, that means having to map between them. Yes, that's a bit more typing. But each class exists for a reason of its own, can be evolved independently of the others, and the code is explicit and understandable. In the language of software architecture: low coupling and high cohesion. Not by chance are these the two fundamental principles of good architecture. And not by chance does the merged User class violate both.
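To make this concrete, here is a minimal TypeScript sketch of the three-classes approach. The names (UserRecord, UserDto, User) and fields are illustrative, not from the article — the point is only that each context gets its own type and an explicit mapping:

```typescript
// Persistence context: mirrors the database row, including fields
// that must never leave the backend.
class UserRecord {
  constructor(
    public readonly id: string,
    public readonly email: string,
    public readonly passwordHash: string,
  ) {}
}

// API context: only what the HTTP interface may expose.
class UserDto {
  constructor(
    public readonly id: string,
    public readonly email: string,
  ) {}
}

// Domain context: business logic, free of persistence or
// serialization concerns.
class User {
  constructor(
    public readonly id: string,
    public readonly email: string,
  ) {}

  hasCompanyAddress(): boolean {
    return this.email.endsWith("@example.com");
  }
}

// Explicit mapping between contexts: a bit more typing, but each class
// can now evolve independently of the others.
const toDomain = (record: UserRecord): User =>
  new User(record.id, record.email);

const toDto = (user: User): UserDto =>
  new UserDto(user.id, user.email);
```

The payoff is exactly the independence the article describes: the persistence format can gain or lose a field, and neither the API nor the business logic has to know.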

Many developers resist this approach. The argument is made: three classes for the same concept, that's too much effort and too much redundancy. But the three classes exist for different reasons. They have different lifecycles, different reasons for change, different dependencies. They do not belong together, even if they look similar or (initially) even identical.

The pattern behind it is always the same: technical similarity is confused with semantic belonging. The resulting abstraction creates a coupling between things that have nothing to do with each other. Changes in one context necessitate changes in another, or you have to push the respective context through the abstraction, which further increases complexity. And all this just because someone once said: “This looks almost the same, it can be combined.”


However, incorrect abstractions are not only created by misunderstood principles. They are also delivered by frameworks, pre-packaged and advertised as a feature. The promise is: “You don't need to understand the underlying mechanism. We'll take care of it. Focus on your business logic.”

That sounds tempting, and it works. As long as you stay on the beaten path, everything is fine. The documentation describes the happy path, the tutorials guide you through the happy path, the community answers questions about the happy path. The problem begins when you have to deviate from it. And in real projects, you always have to deviate from it sooner or later.

Take React as an example: JSX is an abstraction that allows writing HTML-like syntax within JavaScript. Most React developers use JSX daily, but few can explain what actually happens. How does JSX eventually become JavaScript that the browser understands and can execute? What transformation steps are involved? Why can a render function only return a single root node and not multiple?

The answer to the last question is revealing: In JSX, each element is translated into a function call (essentially: createElement), meaning the render function returns the result of this function call. And since a function in JavaScript can only return a single value, a render function naturally cannot return multiple elements at the top level – even though it initially reads like a valid HTML structure in JSX.

For those who understand what happens under the hood, the restriction is self-evident. For everyone else, it is an arbitrary rule that is memorized without being understood.
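A rough sketch makes the translation visible. The createElement below is a toy stand-in for React.createElement, reduced to the essentials — the real signature and element shape are more involved:

```typescript
// Simplified element shape: a tag, its props, and its children.
type VNode = {
  type: string;
  props: Record<string, unknown>;
  children: (VNode | string)[];
};

// Toy stand-in for React.createElement: one element, one function call.
function createElement(
  type: string,
  props: Record<string, unknown> | null,
  ...children: (VNode | string)[]
): VNode {
  return { type, props: props ?? {}, children };
}

// The JSX source   <div className="box"><span>Hi</span></div>
// compiles roughly to:
const tree = createElement(
  "div",
  { className: "box" },
  createElement("span", null, "Hi"),
);

// Two sibling elements at the top level would mean two separate function
// calls – but `return` can only hand back one value. Hence the
// single-root rule (which fragments later worked around by wrapping
// siblings in one extra node).
```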

As long as everything works, the lack of understanding goes unnoticed. But as soon as you encounter a problem that is not on the happy path, the situation changes. You don't just not understand the issue, you also don't understand the tool you would need to solve it. Instead of working with the framework, you work against it. This is the moment when the abstraction leaks.

The same pattern appears on another level with programming languages or compilers, for example, TypeScript. TypeScript is an abstraction over JavaScript that adds static typing. The promise: more security, better tooling support, fewer runtime errors. And TypeScript delivers on this promise in many cases.

What TypeScript does not deliver is the implicit promise that you no longer need to understand JavaScript. Many developers today start directly with TypeScript without ever having seriously written JavaScript. They learn TypeScript syntax, TypeScript patterns, TypeScript tooling. JavaScript is, for them, a kind of compilation target that they never touch directly.

This works until it doesn't work anymore. Many of the limitations and seemingly strange behaviors of TypeScript only make sense when you understand JavaScript. Why does the type system behave unexpectedly in certain cases? Why are there design decisions that seem illogical at first glance? The answer is almost always the same: Because TypeScript wants to and must be backward compatible with JavaScript, and it simply doesn't work any other way.
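One small, runnable example of this: TypeScript happily types an index signature with number keys, yet Object.keys is typed as returning string[]. That only stops looking illogical once you know that JavaScript coerces every object key to a string at runtime:

```typescript
// TypeScript allows numeric index signatures …
const scores: { [index: number]: string } = { 1: "one", 2: "two" };

// … and lets you index with a number.
const viaNumber = scores[1];

// But at runtime JavaScript stores the key as the string "1".
// That is why Object.keys() returns string[], never number[]:
const keys = Object.keys(scores);

// keys is ["1", "2"] – strings, because that is all JavaScript has.
```

The "strange" typing decision is not TypeScript being arbitrary; it is TypeScript being honest about JavaScript.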

Those who know JavaScript understand these decisions. Those who don't know JavaScript face a wall of inexplicable rules. The abstraction hides exactly the knowledge you need when it reaches its limits. And the paradox is: Even most developers who use JavaScript don't know JavaScript well enough. The language has a reputation for being simple, which is deceptive. Beneath the surface lie concepts whose understanding significantly helps in using both JavaScript and TypeScript better.

The latest iteration of the same pattern is artificial intelligence. AI-based coding assistants and agents promise to fundamentally simplify software development. They generate code, complete functions, suggest architectures, write tests. The promise is familiar: “You don't have to worry about the details. The AI will handle it. Focus on the big picture.”

This sounds like the same promise that frameworks have been making for years. And it works in the same way: excellently on the happy path. As long as the requirements are within the scope for which sufficient training material exists, AI delivers impressive results. In seconds, code is generated that compiles, passes tests, and looks correct at first glance.

The problems begin when the requirements become more exotic. When the combination of technologies, constraints, and business rules is so specific that no training material exists for it. Or when the generated code contains subtle errors that you don't recognize because you never learned how the code works under the hood. An off-by-one error in a loop, a race condition in asynchronous code, a wrong index in a database query: such errors only become apparent when you actually read and understand the code. Those who don't do this, because the AI supposedly takes care of it, have a problem that they might only notice months later.
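A hypothetical sketch of the kind of subtle bug meant here: the buggy version compiles without complaint and looks correct at first glance, yet reads one element past the end of the array:

```typescript
// Buggy: the loop condition uses <= instead of <, so the last iteration
// reads values[values.length], which is undefined. Adding undefined to a
// number silently yields NaN.
function sumBuggy(values: number[]): number {
  let total = 0;
  for (let i = 0; i <= values.length; i++) {
    total += values[i];
  }
  return total;
}

// Fixed: the classic off-by-one corrected.
function sumFixed(values: number[]): number {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}
```

Nothing crashes, no exception is thrown; the wrong result simply flows onward. Spotting this requires exactly the reading-and-understanding the article insists on.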

Even more concerning is the gradual loss of competence. Those who have written code by hand for years and now use AI have the knowledge to evaluate the results. But those who have never worked without AI or no longer maintain their knowledge lose precisely the ability that would be necessary to recognize the limits of abstraction.

The pattern is always the same. An abstraction promises simplicity. It fulfills this promise as long as everything goes according to plan. And it fails precisely when it was most needed: in situations that deviate from the plan. Joel Spolsky summarized this in his 2002 article “The Law of Leaky Abstractions”: All non-trivial abstractions are leaky to some extent. This applied to frameworks, it applied to programming languages, and it applies to AI.

After four negative examples, it would be easy to conclude that abstraction is fundamentally harmful. That would be wrong. There are indeed meaningful abstractions that work, and have been working for decades.

Perhaps the best example comes from the Unix world: “Everything is a file”. This means: In Unix, files, devices, pipes, and sockets are addressable via the same interface: open, read, write, close. This abstraction is now over fifty years old and still works excellently. It has become a foundation on which entire ecosystems are built.

What makes this abstraction different from the failed examples? Firstly, it is minimal. It doesn't hide too much, but exactly as much as is necessary to provide a common interface. Secondly, it is based on a deep understanding of the problem. The Unix developers did not abstract first and then see what could be done with it. They first understood what they needed, and then found the smallest possible common abstraction. And thirdly: When it leaks (and it does), the leak is understandable because the abstraction is thin enough to see through.
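The uniformity of the interface can be seen even from Node.js. A minimal sketch, assuming a POSIX system where the device /dev/null exists: a regular file and a device are read through exactly the same call:

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// A regular file …
const path = join(tmpdir(), "everything-is-a-file.txt");
writeFileSync(path, "hello");
const fromFile = readFileSync(path, "utf8");

// … and a device, addressed through exactly the same interface.
// Reading /dev/null always yields an empty string.
const fromDevice = readFileSync("/dev/null", "utf8");
```

The code never needs to know whether it is talking to a disk or a device driver; the kernel hides that behind the common open/read/write/close interface.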

This is precisely what distinguishes a successful abstraction from a failed one. Successful abstractions require understanding and do not make it superfluous. They arise from experience, not from assumptions. And they respect that coupling must remain low and cohesion high.

What can be learned from all this? Besides the article on leaky abstractions, Joel Spolsky wrote a second article that belongs here: “Back to Basics”. In it, he criticizes that too many developers lack the fundamentals (and this was, mind you, already in 2001). That the opinion prevails that garbage collection will take care of it, without understanding what a stack is and what a heap is, when each is used, and what the implications are.

Of course, one doesn't have to be an expert in everything; that's simply not feasible due to the sheer volume of topics. But understanding the fundamentals of one's own tools is not an optional additional qualification, but the prerequisite for being able to use the abstraction over that tool meaningfully. Those who don't know how JavaScript works will fail with TypeScript. Those who don't understand what JSX does under the hood will fail with React. Those who are unable to judge code themselves will fail with AI.

Therefore, my advice is not to forgo abstraction altogether. My advice is to delay abstraction, meaning not to start with an abstraction, but to make everything explicit first. Write every class, every function, every interface in such a way that it is understandable on its own. Code is written once, but read many times. Readability, traceability, and understandability are more important than three fewer lines of typing.

Only when you know exactly how the requirements behave, which parts actually change together and which only coincidentally look similar, then and only then can you think about abstraction. And even then, the question is worth asking: Do I really need the abstraction? Or am I just making the code shorter, not better? Does the abstraction lower coupling and increase cohesion, or does it do the opposite?

Many of the best codebases I've seen in over 30 years of professional experience were characterized not by clever abstractions, but by clarity. By code that could be read and understood without having to trace three levels of indirection first. By explicit structures that revealed at first glance what they did and why. That is the true art of software development, and in conclusion, one can say: abstraction is overrated, understanding is not.

(rme)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.