
This is a personal copy of a column in IEEE Software (Jan/Feb 2026). Republished with permission. (Illustration by Claudette Ocando Röhricht)
In the last column, we promised to return to one of the inevitable challenges that vibe coding leaves us with. At the AI Engineer World’s Fair 2025, OpenAI’s Sean Grove claimed that whoever writes the specification is now the programmer – since AI can take it from there. Sounds amazing, but what gets lost along the way? To explore this, I’m joined by Jan-Philipp Steghöfer, researcher at XITASO and longtime expert on traceability. We’re sharing our thoughts on abstractions, and why building without them is a recipe for collapse. – Markus Borg
Jan-Philipp Steghöfer, Markus Borg
Magnus is presenting PeopleSpace to his team (for full context, see the previous column). He vibe-coded it right after being inspired by the stand-up meeting that kicked off the day. It’s after lunch now and he has a high-fidelity prototype to show – built in just two hours using Lovable. The team is quite impressed with the result and the speed from idea to demo. But would they mistake the vibe-coded prototype for a deployable application? No. In this column, we explore why the sweet spot for vibe coding is prototyping. We also name the missing ingredient that separates vibe-coded prototypes from production-ready software.
Valdis Berzins and Luqi wrote in their 1991 book: “Abstraction is one of the primary intellectual tools we have for managing complexity in software systems.” [1] More than 30 years later, software projects have become orders of magnitude more complex, and abstraction is more important than ever. This is acknowledged in teaching software engineering [2], deeply embedded in how practitioners work [3], and substantiated by leading thinkers in the field [4].
Any meaningfully complex software system is built around several layers of abstraction. The problem space is captured in requirements, which are refined from high-level user requirements into fine-grained system and software requirements. Design artifacts describe how the requirements can be fulfilled by an architecture, i.e., structure and behavior. The high-level architecture is in turn refined into components, which have their own architecture. Code is written to implement the architecture, and tests are used to check whether this code fulfills the original requirements. All of these steps require abstractions, and all of these artifacts are connected to each other via relationships that describe how to get from one artifact to another. If these relationships are complete, we talk about end-to-end traceability.
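To make this layer cake concrete, here is a minimal sketch of such a trace model in Python. The artifact kinds, IDs, and link semantics are our own illustrative assumptions, not any particular tool’s schema.

```python
# A minimal sketch of an end-to-end trace model (illustrative only).

from dataclasses import dataclass, field

@dataclass
class Artifact:
    id: str
    kind: str  # e.g., "user-req", "system-req", "design", "code"
    refines_to: list["Artifact"] = field(default_factory=list)

def trace_chain(artifact: Artifact) -> list[str]:
    """Walk the refinement links from a high-level artifact down to the code."""
    chain = [f"{artifact.kind}:{artifact.id}"]
    for target in artifact.refines_to:
        chain.extend(trace_chain(target))
    return chain

# A user requirement refined into a system requirement, a design
# component, and finally an implementation file.
code = Artifact("booking.py", "code")
design = Artifact("BookingService", "design", [code])
sys_req = Artifact("SR-42", "system-req", [design])
user_req = Artifact("UR-7", "user-req", [sys_req])

print(" -> ".join(trace_chain(user_req)))
# user-req:UR-7 -> system-req:SR-42 -> design:BookingService -> code:booking.py
```

End-to-end traceability then means that no artifact in such a chain is missing a link, in either direction.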
The New Code?
Quickly generating code is certainly tantalizing, and tool providers are eager advocates. At the AI Engineer World’s Fair 2025 in San Francisco, Sean Grove from OpenAI gave the keynote “The New Code.” As requirements engineers, we wholeheartedly agree with several aspects of his talk. Code is only a fraction of a programmer’s value. The larger share lies in structured communication. Specifications are important as persistent artifacts – they capture intentions, align stakeholders, and support communication. As such, they should receive utmost attention and be treated accordingly. So far, requirements engineering (RE) is with him all the way!
The continuation from this point is where we hit the brakes. Grove’s provocative claim is that whoever writes the spec is now the programmer. Why? Because AI can generate both the code and other downstream artifacts, such as test cases and documentation. Yet in this vision, at least in the current instantiation of the technology, the AI skips over the intermediate abstractions that make large-scale engineering possible.
Crucially, an engineer does not go from requirements to code directly. A requirement is transformed several times, and tracing it through the resulting artifacts creates a full chain of refinements down to the code. An extremely high-level abstraction (the requirement) is transformed into something very concrete (the source code) via artifacts that successively reduce the level of abstraction. These refinements provide crucial feedback on the requirements and on the feasibility of finding a solution to them. AI can surely accelerate parts of the journey across layers, but it does not remove the task of figuring out what to build. That task remains fundamentally human.
Why then does today’s AI hype promise production-ready code straight from requirements, as if decades of carefully layered abstractions could simply be skipped? Aren’t we breaking the layer cake of abstractions that forms the foundation of software engineering in irreparable ways, as shown in the figure below?

The layer cake of abstractions in software engineering. AI-assisted coding in many cases tries to avoid the middle layers of abstraction, leading to cracks in the cake that come tumbling down at the slightest prodding. (Illustration by Claudette Ocando Röhricht)
NLP’s Folly
As a related example of the problems caused by crossing abstraction levels in software and requirements engineering, let us talk about trace link recovery. This is the activity where we hope that an automated system will be able to identify relationships between artifacts that humans are unable, or too lazy, to see. A very recent, very well-written paper by Tobias Hey and colleagues [5] shows how state-of-the-art LLM-based trace recovery stacks up against “classic” natural language processing (NLP).
Yes, LLMs provide an improvement. But it is minuscule. Even though the first technically advanced attempts to use NLP for trace recovery were published more than 20 years ago by Giulio Antoniol and colleagues, modern approaches still struggle to achieve results that are meaningful, even with the relatively simplistic academic evaluation datasets out there.
What does “meaningful” mean in this context? For practitioners, it is not practical to review hundreds of trace link candidates generated by an automated system, especially if the majority of these candidates are obviously incorrect. Meaningful for a practitioner means that out of 100 link candidates, maybe 3 or 4 are not actual links. And out of the true links, only a few are missed. F1 scores around 0.6 do not constitute “meaningful” in this sense. One practitioner from the automotive industry once told us: “I’d rather have no trace links than even one incorrect one.” [6] So the “approximate correctness” that LLMs offer, resembling what you see in Google’s “AI Overviews” feature, is not sufficient in critical engineering domains.
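A back-of-the-envelope calculation shows how far apart these two bars are. The recall figure for the practitioner bar is our own assumption for illustration:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Practitioner bar from above: 3-4 wrong links out of 100 candidates
# (precision ~0.96) and only a few missed links (recall ~0.95, assumed).
print(f"practitioner bar:  F1 = {f1(0.96, 0.95):.2f}")  # ~0.95

# A reported F1 of 0.6 with balanced precision and recall means that
# 40 out of every 100 proposed links are wrong.
print(f"reported results:  F1 = {f1(0.60, 0.60):.2f}")  # 0.60
```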
The scientific literature misses this discussion almost entirely. What is needed is a reckoning: will these tiny incremental results ever make a real dent? We believe the answer is a resounding no. Because these systems aim to bridge across abstraction layers based purely on the tenuous semantic relationships embedded in the NLP models, they will never be able to achieve meaningful results. The folly of this line of NLP research is that the minute improvements continue to be enough to get papers published.
Vibe Coding Ourselves Into Maintainability Hell
While trace link recovery might be a niche field, code generation is not. According to thought leaders ranging from NVIDIA CEO Jensen Huang to OpenAI’s aforementioned Sean Grove and Sourcegraph’s Steve Yegge, code generation will make programmers as we know them obsolete in the next couple of years. Luckily, we can all become prompt engineers or maybe even context engineers instead – “professionals mastering the art of providing all the context for the task to be plausibly solvable by an LLM.” But what we see from GenAI coding assistants at the moment is nowhere close to where we need to be to really replace programmers and software engineers for complex problems. Contrary to popular belief, programmers do not spend their days coding iterations of quicksort or reinventing the towers of Hanoi – they build highly complex, interconnected systems of interdependent software components. And to describe these highly complex, interconnected systems of interdependent software components, they rely on, you guessed it, abstractions (see also Philipp Schmid’s blog post).
Generating a snippet of code for a simple functionality is complicated enough (as witnessed by the fact that experienced programmers use the generated code as a starting point for their own implementation [7] rather than committing it), but generating large chunks of code or refactoring an existing code base is a different matter. While GitHub Copilot Workspace, Lovable, and Atlassian’s HULA claim to be able to go from an “intent” or a rough requirement description to running code, their true test will be in using them with large and complex existing code bases. Crucially, these tools currently operate under the assumption that enough relevant information is contained in the code.
This is an illusion. Domain knowledge, design rationales, risk management, requirements, etc. are not and will never be captured in the source code. Instead, they are hidden in layers of documents that provide the levels of abstraction which help us go from a need via requirements, analysis, design and so on to a running and valuable software system. These abstractions capture the knowledge about the system and its domain – including all of the unintuitive and obscure edge cases that a crafty requirements engineer conjured out of domain expert interviews and log files of crashing software. Since the automated tools currently ignore these artifacts, they will not generate code that adheres to the collected knowledge of the engineers behind the system and the implicit knowledge of its users. That does not mean that the code will not work – at least in the beginning, at least for some cases – but it means that over time, the code base will deteriorate to the point where it becomes unmaintainable. The technical debt accumulates and will make progress impossible.
Abstractions in Context
Luckily, all is not hopeless despair. Developers of AI assistants are aware that context is king, and that continuously increasing context window sizes is no panacea. AI should not skip abstractions. Instead, our existing tooling should expose these structures so that AI-powered solutions can consume and modify them.
Over the last year, we have seen many AI assistants turn increasingly agentic. While there is inflation in this term, it generally refers to giving tools more operational autonomy. To do that properly, they need access to the same information human engineers rely on – that is, more context to understand the bigger picture. Abstractions are, by definition, information-efficient ways to squeeze the most important bits and pieces into limited context windows.
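A rough calculation illustrates why abstractions pay off here. All numbers below are assumptions for illustration, not measurements:

```python
CONTEXT_WINDOW = 128_000   # tokens; a typical window for current LLMs
RAW_SOURCE_FILE = 12_000   # assumed average tokens per source file
DESIGN_SUMMARY = 300       # assumed tokens per component-level abstraction

print(CONTEXT_WINDOW // RAW_SOURCE_FILE)  # ~10 files fit verbatim
print(CONTEXT_WINDOW // DESIGN_SUMMARY)   # ~426 components fit as summaries
```

Feeding the model component-level abstractions instead of raw source buys orders of magnitude more big-picture coverage per token.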
One of the most popular ways of sharing information with AI agents these days is through the Model Context Protocol (MCP), an open standard from Anthropic. By shipping solutions with an MCP server that provides structured output from agent requests, companies hope to enable open AI innovation based on the information their tools contain. Example tools that now come with an MCP server include JetBrains IDEs, GitHub, and Grafana. At the time of this writing, we’re close to releasing it for CodeScene as well – after several requests from users eager to build on it.
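As a flavor of what such a server can look like, here is a minimal sketch using the official MCP Python SDK. The server name, the requirement data, and the tool are hypothetical stand-ins for what a real requirements tool would expose:

```python
# Minimal MCP server sketch exposing requirements and their trace links
# to AI agents, using the official Python SDK (pip install "mcp").

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("req-tracker")  # hypothetical server name

REQUIREMENTS = {  # hypothetical stand-in for a requirements database
    "UR-7": {"text": "Users can book shared spaces.",
             "traces_to": ["SR-42", "TEST-13"]},
}

@mcp.tool()
def get_requirement(req_id: str) -> dict:
    """Return a requirement and its trace links as structured output."""
    return REQUIREMENTS.get(req_id, {"error": f"unknown requirement {req_id}"})

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```

An agent connected to such a server can pull the abstraction layer it needs instead of trying to reverse-engineer it from code.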
Whether MCP or other solutions dominate in the future, the software industry must ensure that data sources containing requirements, architectural abstractions, and design artifacts are made accessible to AI assistants. Substantial engineering knowledge is invested in creating meaningful abstractions. Disregarding them is wasteful and will doom any attempts to go directly from requirements to code.
In the meantime, the ability to create meaningful abstractions that still capture the complexity of a modern production-ready software system remains a uniquely human one. Our context windows are much larger than those of LLMs (for now) and we have developed excellent cognitive tools to deal with complexity, including abstractions, that AI is so far not able to replicate.
Abstractions remain the lifeblood of software engineering – in a way, abstractions are all you need. It all starts with requirements, which are part of the problem domain and need to be translated, step by step, into the solution domain by creating suitable abstractions and refinements on the way. Attempts to skip these steps have not proven successful.
The next step in the evolution of AI for RE will be in the acknowledgment that complex software will never only be requirements and code – it will always be a layer cake of abstractions and refinements.
References
- [1] V. Berzins and Luqi, Software Engineering with Abstractions. Addison-Wesley Longman Publishing Co., Inc., 1991.
- [2] M. Dorodchi, N. Dehbozorgi, M. Fallahian, and S. Pouriyeh, “Teaching software engineering using abstraction through modeling.” Informatics in Education, vol. 20, no. 4, 2021.
- [3] M. Petre, “Insights From Expert Software Design Practice,” in Proc. of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2009, pp. 233–242.
- [4] N. Bencomo, J. Cabot, M. Chechik, B. H. Cheng, B. Combemale, S. Zschaler et al., “Abstraction Engineering,” arXiv preprint arXiv:2408.14074, 2024.
- [5] T. Hey, D. Fuchß, J. Keim, and A. Koziolek, “Requirements traceability link recovery via retrieval-augmented generation,” in Requirements Engineering: Foundation for Software Quality – 31st International Working Conference, REFSQ 2025, Barcelona, Spain, April 7-10, 2025, Proceedings, ser. Lecture Notes in Computer Science, A. Hess and A. Susi, Eds., vol. 15588. Springer, 2025, pp. 381–397.
- [6] S. Maro, J.-P. Steghöfer, and M. Staron, “Software traceability in the automotive domain: Challenges and solutions,” Journal of Systems and Software, vol. 141, pp. 85–110, 2018.
- [7] A. Ziegler, E. Kalliamvakou, X. A. Li, A. Rice, D. Rifkin, S. Simister, G. Sittampalam, and E. Aftandilian, “Measuring GitHub Copilot’s impact on productivity,” Commun. ACM, vol. 67, no. 3, pp. 54–63, Feb. 2024.


