AI as a mirror: Missing meaning in code and architecture
Code and architecture often fail to convey their meaning in an understandable way. The consequences trip up not only humans but AI models as well.
(Image: Ole.CNX/Shutterstock.com)
- Nicolai Wolko
Programming has long been less about writing code and more about understanding systems. Every codebase is read far more often than it is extended. And that is precisely where a common structural deficit becomes apparent: the "why" behind decisions is usually invisible.
Anyone analyzing an existing system – whether human or AI – encounters classes and modules, but does not immediately grasp how they relate and what they represent. AI models reflect this problem particularly clearly: they recognize patterns, but not the reasoning. A recent overview study from 2025 concludes that a significant portion of errors in generated code are not syntactic, but rather stem from logical-semantic misunderstandings. Models follow existing structures. If the underlying meaning and derivation are not visible, they hallucinate – just as humans in the same position must guess or go digging.
This observation indicates that we have perfected abstraction but often underestimated understandability.
The Long Fight Against Mental Load
In retrospect, the history of software development appears as a continuous effort to reduce mental load. Assembler moved away from direct machine commands, high-level languages removed hardware details, and frameworks wrapped complexity in a few lines of code. With each step, the focus shifted from "how" to "what".
However, the relief is ambivalent. Abstractions save typing effort but create additional interpretive work. A function like calculateTotal() hides the implementation, but also the meaning: Which total? Which rules? Which domain? Which exceptions? A line of code becomes a mental springboard, forcing readers to reconstruct the invisible.
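A minimal, hypothetical sketch of this effect – the names, the `OrderLine` type, and the net-total rule are illustrative assumptions, not taken from any real system:

```typescript
// Opaque: the name hides which total, which rules, which domain.
function calculateTotal(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0);
}

// More explicit: the signature itself answers "which total? which rules?".
interface OrderLine {
  netPrice: number; // price per unit, before tax (assumed convention)
  quantity: number;
}

function calculateOrderNetTotal(lines: OrderLine[]): number {
  // Net total only; taxes and discounts are assumed to be applied elsewhere.
  return lines.reduce((sum, line) => sum + line.netPrice * line.quantity, 0);
}
```

The second version does not remove the abstraction; it merely moves part of the invisible context into the names and types, so readers reconstruct less.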
As software development matured, a new kind of difficulty came to the fore. Writing code became significantly easier, but building a shared understanding of the underlying meaning became harder. The critical bottleneck shifted from technical implementation to semantics. In 2003, Eric Evans was the first to argue systematically, in his book "Domain-Driven Design," that the real challenge in software development lies in the meaning represented by the code (cf. Eric Evans: Domain-Driven Design: Tackling Complexity in the Heart of Software, Addison-Wesley, 2003).
Domain-Driven Design (DDD) was a breakthrough because the concept shifted the discussion from implementation-driven structures to domain semantics. Concepts like Ubiquitous Language, Bounded Contexts, and Aggregates strengthen shared understanding of terms and processes. Thus, DDD makes a problem visible that persists to this day: meaning can be modeled, but in many systems, it remains only indirectly recognizable in the finished code.
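One way to picture a Bounded Context in code is the following hypothetical TypeScript sketch. The two contexts, their field names, and the translation function are invented for illustration and not taken from Evans' book:

```typescript
// The same word "Order" means different things in two bounded contexts,
// so each context carries its own model of it.

// Sales context: an order is a customer's intent to buy.
namespace Sales {
  export interface Order {
    orderId: string;
    customerId: string;
    lines: { sku: string; quantity: number }[];
  }
}

// Shipping context: an "order" is just something to move to an address.
namespace Shipping {
  export interface Order {
    orderId: string;
    deliveryAddress: string;
    weightKg: number;
  }
}

// An explicit translation at the context boundary keeps the meaning shift visible.
function toShippingOrder(order: Sales.Order, address: string, weightKg: number): Shipping.Order {
  return { orderId: order.orderId, deliveryAddress: address, weightKg };
}
```

The point is not the mechanics but the visibility: the boundary between two meanings of "Order" exists as a named, reviewable place in the code.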
Besides DDD, other architectural principles like Clean Code pursue the same core idea: making code understandable. They arose from the insight that technical structure is only part of the problem, and the real hurdle is understanding.
However, these concepts assume that the necessary domain understanding is already present. They create a structure that maps this knowledge but makes the underlying decisions in the code itself only partially visible.
From Research: How Mental Load Arises
Mental load arises wherever a system does not express its meaning but leaves interpretation to its readers. This effect is well-researched.
Studies using functional magnetic resonance imaging (fMRI) have shown that code comprehension is associated with measurable cognitive load and that this load varies depending on the code's understandability. If orientation and context are missing, the mental effort increases: the brain has to invest more cognitive work to establish connections that are not immediately apparent in the code.
Seeming trivialities can have measurable effects, as another study showed: spelled-out word identifiers were understood about 19 percent faster than abbreviations or single letters.
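The contrast is easy to illustrate. Both functions below are invented examples with identical logic; only the identifier style differs:

```typescript
// Abbreviated identifiers: readers must expand "d" and "r" mentally.
function calc(d: number, r: number): number {
  return d * r;
}

// Spelled-out identifiers carry the meaning themselves.
function calculateInterest(depositAmount: number, interestRate: number): number {
  return depositAmount * interestRate;
}
```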
In programming practice, mental load is particularly evident in three situations:
1. Unmarked Meaning Changes
Example: A process transforms from "Order" to "Booking" in the code without the change being visible. For the brain, this means a context switch. The semantic marker that makes the transition understandable is missing.
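A hypothetical sketch of such an unmarked transition – the types and field names are invented for illustration:

```typescript
interface Order { id: string; amount: number }
interface Booking { bookingId: string; total: number }

// Implicit: the reader must notice on their own that "o" silently becomes a booking.
function handle(o: Order): Booking {
  return { bookingId: o.id, total: o.amount };
}

// Explicit: the name itself marks the semantic transition, for humans and AI alike.
function convertOrderToBooking(order: Order): Booking {
  return { bookingId: order.id, total: order.amount };
}
```

Both functions do the same thing; only the second one announces the context switch instead of leaving it to be discovered.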
2. Implicit Rules
Example: A parameter can only be set in certain states. The system runs, but it does not express which assumption applies. Readers compensate for this through mental simulation.
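A hypothetical sketch of such an implicit rule, and of making it explicit in the type system. The rule "a discount code applies only to confirmed orders" and all names are assumptions for illustration:

```typescript
type OrderState = "draft" | "confirmed";

// Implicit version: the rule lives only in developers' heads.
interface LooseOrder {
  state: OrderState;
  discountCode?: string; // unstated rule: only valid when state === "confirmed"
}

// Explicit version: a draft order cannot even carry a discount code.
type DraftOrder = { state: "draft" };
type ConfirmedOrder = { state: "confirmed"; discountCode?: string };
type ExplicitOrder = DraftOrder | ConfirmedOrder;

function applyDiscount(order: ExplicitOrder, code: string): ExplicitOrder {
  if (order.state !== "confirmed") {
    // The assumption is now expressed and enforced, not mentally simulated.
    throw new Error("Discount codes apply only to confirmed orders");
  }
  return { ...order, discountCode: code };
}
```

The discriminated union turns an invisible assumption into something the compiler, the reader, and an AI model can all see.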
3. Structure Without Semantic Orientation
Example: Technical layers separate processes, but not concepts. The brain follows the code, but not the domain logic. Orientation only emerges after several internal reconstruction steps.
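A hypothetical sketch of the domain-oriented alternative – the booking domain, its rule, and all names are invented for illustration:

```typescript
// Instead of scattering a process across technical layers (controller, service,
// repository), one module expresses the whole domain flow in one place.
class BookingProcess {
  private reservedSeats = new Set<string>();
  private confirmedSeats = new Set<string>();

  reserveSeat(seat: string): void {
    this.reservedSeats.add(seat);
  }

  confirmBooking(seat: string): boolean {
    // The domain rule is explicit here: only reserved seats can be confirmed.
    if (!this.reservedSeats.has(seat)) return false;
    this.confirmedSeats.add(seat);
    return true;
  }
}
```

Readers follow the domain logic directly instead of reconstructing it from several technically separated layers.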
The examples share a common pattern: the code carries syntactic structure, not semantic meaning.
AI as a Mirror
How does artificial intelligence (AI) fit into these considerations? The history of software development has made great progress from "how" to "what". AI is meant to complete that path by providing a natural-language interface to code. For now, however, noticeable hurdles remain.
One of the most fascinating properties of Large Language Models (LLMs) is that they mirror the context of their prompt. A study from 2023 investigated the types of code errors LLMs make. Recurring error classes emerged: false assumptions and incorrect logical direction. In other words, not syntax errors, but indications of a missing decision trail in the context. Models reproduce structure but often miss meaning – much like humans often do when they stumble into a system without prior knowledge.
This effect becomes even more pronounced when readability is deliberately degraded: with obfuscated code, model performance drops noticeably. Experienced developers can sometimes compensate with experience and analytical strategies, but they work more slowly, less confidently, and with a higher risk of error. The pattern is clear: as readability decreases, performance drops for humans and AI models alike. Both rely on explicit cues to reliably infer meaning.