Software Development in 2026

The field of software development is evolving. Just like it always has. You read a lot of articles about it, and there are multiple terms for it: LLM-driven development, vibecoding, agentic engineering. In this post, I summarize my takes on it: my experience with LLM-driven development, what impact I think it will have, and whether that is a good thing.

My programming history

I have been ~~writing code~~ tradcoding for eighteen years now. When I started to learn programming, I was twelve years old, armed with a book called PHP 3 and MySQL (or something like that, I don’t remember it specifically). I remember reading this book: I had very little understanding of how software engineering worked at that time. But it was something unique and magical to me. It was this little peek into how systems worked. How data is stored. How tools and websites that you interact with daily worked under the hood. It was this magical moment of grasping that I, too, could build things. Maybe not right away, but if I dedicate enough time to the pursuit of building things, I could learn it. It was not magic anymore, it was science and craftsmanship.

Most of my journey in software engineering has been driven by curiosity. I never learned anything because I thought it would be a good avenue for a job. Instead, I learned things because I just wanted to understand them. I taught myself new programming languages, because I read about them and I was curious about what it would feel like to work with them. Which patterns the language embraces. Which new classes of problems I could solve more easily with them.

That journey took me through a lot of languages: starting with PHP, then Python, Ruby, Clojure, C++, C, Java, a bit of Perl, Lua, Crystal, Rust, and probably a few more that I am forgetting. I do not claim mastery of any of them; the vast majority of them I have only explored on a surface level. I would say that Ruby, C, and Rust were the languages that had the most influence on me.

Ruby taught me what a clean, elegant and efficient language looks like. I can probably express more in a single line of Ruby than I can in any other language. C taught me how computers work: it is elegant in its simplicity, unforgiving, and brutally honest. It does not hide anything from you. And because it does not come with any practical data structures, you end up implementing what you need (which teaches you a lot). I would not use C for any real-world projects, but the experience of using it, writing pet projects in it and working on real-world C codebases was very insightful.

Rust is a special one to me. When I started learning and using Rust in 2015, just around the 1.0 release, it was a very niche language. I did not learn it because of the ecosystem, or because of its popularity. I learned it because I liked the design decisions it makes. In my mind, Rust is as close to perfect in terms of languages as you can get. It combines a lot of attributes of languages that I know and love. I find Rust to be very expressive: it is not quite as expressive as Ruby, but close. You can build great abstractions with it. Rust is similar to C, in that it teaches you a lot about how computers work. But unlike C, it does that teaching upfront: C happily lets you write incorrect code, which you either don’t spot until you get strange, hard-to-reproduce crashes months later, or you do spot and then spend days trying to figure out what the root cause is. Rust forces you to really understand what you want to do before you write any code.

My tooling history

Some of the tools that I learned to use quite early on (in my teenage years) were vim, git, and tmux. I remember that it was painful to learn them. Using git is awesome, and the command-line interface is actually quite well-designed. But you go through a bit of a journey. At first, you just memorize the commands you need for your workflow: to make commits, to write commit messages. Over time, you are exposed to more complicated operations. Merge conflicts, rebasing, understanding diffs of three-way merges. At some point, you become curious about how git actually works. You understand its data model: commit objects, tree objects, refs. I think understanding git on a deep level is an investment that has paid off. Git is not going away, and understanding how to use it effectively is a skill any software engineer needs.

Similarly, learning vim was hard. I have a vim configuration that I don’t ever change (because it works) and I have a grasp of a handful of commands. But even those few vim commands mean that using vim is more efficient for me than the vast majority of IDEs. I have tried various IDEs over the last two decades: from Atom, to VS Code, to Sublime Text (if you can call that an ‘IDE’), to Zed. Zed is maybe the only non-terminal-based tool that I am happy to use.

The magic of investing in learning tooling is that it makes you more efficient. All of these tools can be combined. If you’re a vim user, you’re free to try any new niche language: you don’t become dependent on your IDE supporting a language. You don’t need buttons to rename things, you learn how to do that with sed. You’re not dependent on your IDE bootstrapping a project for you: you understand exactly what each file does, because you created it. It gives you a deep level of understanding, a kind of freedom.

My LLM history

It was late 2022 when ChatGPT was released. I remember having a lot of discussions about it with my coworkers. There was a bit of a split: some people were really impressed with it. I remember that I was not very impressed with it.

For many people, LLMs like ChatGPT and the models that followed were a real unblocker. You could ask them for help with many things. Writing shell scripts, writing pieces of code.

I think the reason that the initial ChatGPT was very un-exciting for me was that I had always been a bit of a generalist. I did not need ChatGPT’s help with anything. When I worked on something unfamiliar, I preferred learning to do it myself (so I understood it). I never really used ChatGPT much for any coding-shaped tasks, so I can’t comment on it too well.

I think it was 2024 to 2025 when we started to get some better models. I remember that I was using the Zed editor at the time, and it allowed me to use models from several providers. I tried them out. My experiences were quite mixed. For simple tasks, they did work. But when it came to programming, most of the time it was painful for me, because I would end up watching the LLM make rookie mistakes. And the time it took me to clean up those mistakes was higher than the time I saved by using the models in the first place, so not really a win.

I think that changed in 2026. At least, for me. I had some conversations with people who pointed out that LLMs had come a long way since. I decided that I would give them a shot again. I tried them through various means: the built-in LLM support in Zed, OpenCode, and some “agentic coding agents” from commercial providers.

And I have to say, I was quite impressed. The current models are able to effortlessly write correct Rust code. They follow relatively good practices and write clean unit tests. Their writing skills are still a bit off. They can use punctuation “correctly”: em-dashes, semicolons. They use the correct Unicode characters (for ellipsis, various symbols like arrows). But what they don’t understand is that technical writing is different from writing novels. You rarely see semicolons, em-dashes or Unicode symbols in technical writing. And not because they are incorrect, but because most people don’t know how to produce them. If you use macOS, you have access to most of these symbols right from your keyboard: alt+- gives you – (en-dash), alt+shift+- gives you — (em-dash), alt+; gives you … (ellipsis). But many people either use Windows (which has a more limited way of producing special characters: you have to memorize a code and type something like alt+0151 to get a character), or they are simply unaware. For the record: Linux can be configured to have the same shortcuts as macOS if you use the US Macintosh layout. I think even the default configuration supports it now, but I am not sure; I always configure my machines to the us-mac layout for this reason.

Correction: in 2025, 40 years after the first release of Windows, Microsoft added a native way to type en-dashes and em-dashes. Windows is still lacking native support for many other special characters supported by other operating systems, but at least those two can be inserted now.

So it’s not that LLMs are incorrect — it’s more that they are correct in places where we are not used to it. We just don’t use “writing best-practices” in technical writing, we avoid semicolons, em-dashes.

And that’s not just a typographical issue: for example, when you receive a pull request, the accuracy and depth of its description are kind of a proxy for how much thought and effort someone has put into it. If someone writes a one-page technical explainer of a change, you’d assume that this change is likely correct and well-tested. With LLMs, this correlation doesn’t really exist anymore, leading to the advent of the term LLM slop. You may receive contributions that are meticulously explained and implemented, but that fail spectacularly when tested. Simon Willison makes the same observation: a hundred commits, a great README and a full test suite used to signal care, but now anyone can generate that in half an hour.

AI Psychosis

I tried using LLM-driven development approaches. I gave them a real shot. And I think I fell into the same traps everyone does. There’s a term for it: AI Psychosis. You may want to watch Recovering from AI Psychosis (video) for some context on it, or read Siddhart’s article.

I use the term psychosis loosely and not in a medically correct fashion. Some people, apparently, can get actual psychotic symptoms. What I am referring to is more of an irrational response.

What happens is this: you discover a new tool. A tool that can, relatively autonomously, write software. You have a backlog of ideas that you always wanted to do, but never could, due to a lack of time (or urgency). Now that you have this new tool, you discover that you can actually work on your backlog. You find that the limiting factor is your time, because you need to sit there, accept permission prompts. The result is that you try to find ways to work around it. For example:

Getting your “agents” to run autonomously, for example on a VM, so that they can run overnight. This is the free-labour fantasy behind tools like OpenClaw.
Multi-agent orchestration systems, where multiple agents get “roles” and communication primitives.
Working on multiple projects in parallel, giving direction on what tasks to work on next, but barely reviewing the code. (I have done this.)
De-emphasizing human biological needs such as sleep in order to be more “productive”. Adam explains this happening to him in the Recovering from AI Psychosis video linked above.

Now, this may sound quite shocking to you. However, I would argue that this is a natural human reaction to discovering a new tool and being excited about it. If I bought something that I always wanted to have (my first motorcycle, for example), I would be equally excited about it, and try to spend as much time as I can using it.

This state of “psychosis” usually lasts for a bounded amount of time. Eventually, you realize a few things. Lines of code written is not productivity: it is liability. After you have spent a considerable amount of time letting “agents” work on your codebase, and you look at the code, you inevitably find some issues with it. It is not that the code is substantially wrong; these are more the kind of issues that humans (with less development experience) are prone to as well. Badly written abstractions, with a lot of functionality layered on them. Duplicated code. Poor architecture choices. Documentation that is outdated or overly verbose.

And I think there is a bit of a fork: if you are already a good engineer, then when you work on a project, you have a vision in your mind of what a good outcome looks like. You know what architecture you need, what abstractions you need, and how they are supposed to work. You know what good documentation looks like. And with this, the “magic” of LLMs goes away a bit: you realize that they can produce code at superhuman speed, but they still need direction, and input, and they can and will get things wrong.

I think the risk is that if you are less experienced, LLMs may feel like magic to you. And because of your lack of experience, that magic never goes away. That can be a problem: it means that you have over-confidence in the quality of the codebase, and it can also lead to something of a Dunning-Kruger effect, where you have over-confidence in yourself.

Most of what I have used LLMs on was building prototypes for projects. The mentality there is a bit different from working in real-world codebases with established patterns: your goal is to move fast, but there are also a lot of design decisions that you need to make, because the LLM can’t build on existing design decisions and patterns in the codebase.

The pattern that works for me on those kinds of codebases is a mixture of implementation sprints (where I spend some sessions on decomposing the features I need with an LLM, to try to define them as accurately as possible, and then let the LLM implement them relatively autonomously), and slower code-review sessions (where I go through the entire codebase, make notes of anything that looks wrong, needs refactoring, has poor documentation, needs tests). Simon Willison’s Agentic Engineering Patterns guide is a good structured treatment of this way of working.

What I find is that if you have a project and you know exactly what the outcome should look like, LLMs are a great tool to help you get there faster. So LLMs are really productive for code where I already am a domain expert, because I can spot things that are not quite right. LLMs also help me write code for things that I am not a domain expert on (like frontend code). But I don’t trust that code as much, because I lack the experience to review it. They help make the development loop tighter: if I have an idea for how to refactor something, I can just ask the LLM to do the refactor in an hour, and observe if the result is substantially better than the previous state.

I think the speed at which LLMs produce code is a bit misleading, because practically your project is still limited by your understanding, by you building your mental model to reflect it. And that is not a limitation: that is a feature. A project that does not have a single person keeping track of it and understanding it is directionless. The best projects are not the ones that said yes to everything, they are the ones that have a coherent vision and know when to say no. LLMs will rarely say no to something.

Impact of LLMs on software engineering

In my experience, LLMs are great at local optimization. If you ask them to implement a feature, they will do so. They will also test it. What they lack is a coherent understanding of a project. For example, they will happily duplicate algorithms or types, just because they don’t know something was implemented somewhere already. They will build on poor abstractions, because their goal is to “implement the feature” and not “make the codebase better”. You can ask them to do that explicitly, but then you need to understand and judge whether an improvement they recommend is actually an improvement, or whether there is some use case that they don’t understand yet because it is neither implemented nor documented.

So, in my experience, in order to use LLMs, you still need to have good software development skills. You need to be the person that has the domain knowledge, the global understanding of the project, of the direction, of the architecture. LLMs are awesome at “filling in the gaps”. LLMs can also make pretty decent architectural recommendations — but in the end, it is you that is responsible for maintaining the project, so the architecture needs to be something that fits well into your experience level, deployment method, etc.

People raise the point that code is becoming “worthless”, because LLMs can produce it quickly. And I would say, yes, that is true. During my “AI psychosis” phase (if you want to call it that, I prefer to call it a tool exploration phase), I spent a considerable amount of time re-implementing git-lfs in Rust. The LLM was able to do that effortlessly. And the reason the LLM could do it effortlessly is that git-lfs is well-specified, and it has an extensive test suite. In some ways, the code is almost just a by-product of the authors of git-lfs coming up with a coherent specification, testing it in practice, and encoding all of the ways it can fail in the test suite. So, in my opinion, the value of projects lies not in their implementation, but in their specification and adoption (and I count the test suite as part of the specification).

And I am not alone in that assessment. The SQLite project came to a similar conclusion. They release the source code of SQLite into the public domain. It’s free, you can do whatever you want with it. But the one thing that is not free is their extensive test suite, TH3. You can purchase that, for an undisclosed amount. And that is, in my opinion, genius. Anyone can use SQLite. But if you need to extend it with something, you need a way to make sure your changes are correct. The test suite encodes decades of the SQLite team figuring out errors and writing tests for them. That is non-trivial work, and not something you can easily reverse-engineer from the SQLite codebase. The codebase tells you how it’s implemented, the test suite tells you why it is implemented that way.

Are LLMs bad for software engineering? I would argue no, they are not. In my journey of learning programming, I have always done it by trying out things that I don’t understand yet. It is easy to get stuck, because you may not have people that can explain concepts to you. I built a lot of random projects, and I wrote a lot of bad code. I think that for less experienced people, LLMs can be really useful, if they are used correctly. LLMs can help you fill in code or ideas that you don’t understand yet. But the thing you need to keep in mind is that the product you are building is not the codebase, but your understanding of the codebase. When that understanding erodes, Margaret-Anne Storey calls what’s left behind Cognitive Debt: debt that lives in developers’ minds rather than in the code, building on Peter Naur’s idea that a program is really a theory held by the people who built it. If I want to determine how competent someone is, I don’t need to look at the projects they have built, I need to ask: “why did you build it like this?”

Are LLMs a threat to software engineering? I think that is hard to answer definitively. I have two opinions on how they impact software engineering. First, LLMs are trained on code that we have already produced. So, LLMs will, unless redirected, repeat what we’ve done before. Most of the influential tools and projects that we use are successful for one of two reasons:

Someone solved an existing problem in a novel way. For example git: it solves code distribution and collaborative development by building a content-addressable data structure (basically a blockchain of commits, and Merkle trees for directory trees). As far as I am aware, nobody had done this before. It is novel and creative.
Someone solved an existing problem better. By taking the time to think it through, trying it, improving it. Optimizing it. Perfecting it, being opinionated in the pursuit of perfection. vim and tmux fall into this category for me.
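To make the content-addressing idea from the git example a bit more concrete, here is a minimal sketch in Python. It assumes git's classic SHA-1 object format, where an object's ID is the hash of a short header followed by the content. The function name `git_object_id` and the author line are made up for illustration, and the sketch ignores compression, on-disk storage and SHA-256 repositories.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 ID for an object in git's classic object format.

    The hashed bytes are "<kind> <size>\0" followed by the raw body.
    """
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


# Identical content always gets the same ID, no matter where it is stored.
a = git_object_id("blob", b"hello world\n")
b = git_object_id("blob", b"hello world\n")
# Changing a single byte yields a completely different ID.
c = git_object_id("blob", b"hello world!\n")

# A commit body names its tree (and parents) by ID, so the commit's own ID
# depends on everything it points to. The author line here is hypothetical.
tree_id = git_object_id("tree", b"")  # the empty tree
commit_body = (
    f"tree {tree_id}\n"
    "author Example <example@example.com> 0 +0000\n"
    "committer Example <example@example.com> 0 +0000\n"
    "\n"
    "initial commit\n"
).encode("ascii")
commit_id = git_object_id("commit", commit_body)
```

Because every object is named by a hash of its content, and commits and trees refer to other objects by those names, changing anything deep in the history changes every ID that depends on it. That is the "blockchain of commits" property in a few lines.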

In my understanding and experience, LLMs do not help you with creativity. Any successful project will always require human input, trade-offs, decision-making, a coherent vision that is persisted and updated over the course of the project.

Human vs LLM skill levels — My mental model of LLM skills. They have a broad, generalist understanding of a variety of individual topics, higher than the average human. But human domain knowledge still outperforms that of LLMs.

What LLMs can help us with is to “unlock” new ideas. LLMs have a base-level understanding of a lot of different things: they are generalists. They are not perfect at any individual measure. But let’s say you are an expert in cryptography and you want to build a cryptographic library: that is the expertise that you bring. An LLM may bring expertise in writing documentation, making a website for your project, writing CI configuration for it.

And the final point that I want to make, to tie this post together: I have always used a terminal-driven workflow. I think there is a lot of value in understanding all of the tools, and in understanding how to compose them. I am by no means perfect at it, but good enough to do what I need to do. And in my career, especially early on in university, I encountered a lot of people who thought that was stupid, or did not see the value in it. They were very tied to their IDEs: Visual Studio, or Turbo Pascal, or whatever they were using. The advent of LLMs has shown me one thing, which is just how powerful the terminal is. Most of the “agentic engineering” tools are based natively in the terminal. They interact with your machine (and your codebase) using command-line tools. And they do so very efficiently: in a few seconds, they can write perl one-liners that blow my Ruby skills out of the water. To me, LLMs are the ultimate validation that the terminal, and the UNIX philosophy, which was designed by some very smart people multiple decades ago, are here to stay. And in some ways, there is a certain amount of irony: we have some of the most modern technology (LLMs) using one of the oldest technologies that we still keep in modern computing (terminal-based workflows).

That is a good summary of the field of software engineering. It is always evolving. And, generally speaking, that is a good thing. We adapt, and we learn. We keep the good ideas, and we replace the bad ones.

How I use LLMs as a staff engineer in 2026 by Sean Goedecke

Sean updates his 2025 post with a simple verdict: agents have crossed from occasional and suspicious to constant, with light supervision. He now opens nearly every change by handing it to an agent and doing a single review pass, and throws every bug at one (it diagnoses ~80% unaided), though his fourteen-session bug hunt is a nice reminder that human expertise is what narrows the search space. The part worth keeping is where he draws the line: he still hand-writes PR descriptions, ADRs and messages, both because LLMs bury the core idea and to signal to reviewers that a human actually read the diff.

Agentic Coding is a Trap by Lars Faye

Lars takes aim at the “human as orchestrator, AI writes the code” model and argues it is self-undermining: supervising an agent well demands exactly the coding and critical-thinking skills that heavy AI use is measurably eroding — what he calls the paradox of supervision. He is not refusing to use them, but demotes them to writing specs, research and ad-hoc generation, while staying hands-on in implementation and never generating more than he can review in one sitting. His sharpest point is that for many developers, writing the code is the thinking, so handing it off doesn’t move you up an abstraction layer — it just widens the gap between you and what you ship.

The use of coding agents is actively diminishing the very skills needed to effectively manage the coding agents.

I don’t necessarily agree with Lars’ point: my article argues for the “human as orchestrator” model that his article criticises. Nevertheless, it is a good read.

Vibe coding and agentic engineering are getting closer than I’d like by Simon Willison

Simon (of Datasette fame) watches his own tidy distinction collapse: he used to separate vibe coding (not looking at the code, fine for low-stakes personal tools) from agentic engineering (professional, production-grade work backed by 25 years of experience), but as agents get reliable he’s no longer reviewing every line even in production, and the two have begun to blur. He treats an agent like a trusted internal team handing him a black-box service — use the docs, crack open the code only when something breaks — while admitting the discomfort that, unlike a team, an agent has no professional reputation to stake. His most useful observation is about evaluation: a hundred commits, a great README and full tests used to signal care, but now anyone can generate that in half an hour, so what he’s come to value instead is whether someone has actually used the thing.

We’re Not Building AI Features for the Money by Conrad Irwin

They make a great point:

On the flip side, I know that well-designed systems still require a human in the driver’s seat. Products, and the code that powers them, exist to help humans achieve their goals; and building them correctly requires human values, human judgment and a considerable amount of empathy. Agents can write the code, but deciding what to build, and how to build it still takes human craft and expertise.

Agentic Coding is Burning Me Out by Siddhart Sundharam

Siddhart describes the phenomenon that I labelled as AI Psychosis. He explains the decision fatigue that comes with it, and argues that agentic loops need to be slowed down in order to remain productive.

The familiar pacing of software development has been completely compressed by agentic coding because you no longer have those routine stretches of just wiring things together to catch your breath. Writing code by hand, or trad coding as Twitter likes to call it, forced me to respect the natural ebb and flow of the problem solving process and get into a much more steady, sustainable rhythm.

Why I Don’t Vibe Code by Jacob Harris

Jacob is staunchly against the use of AI. He makes two interesting points:

He talks about accidental versus essential complexity (referencing Brooks’ No Silver Bullet). Accidental complexity can be solved through better tooling and abstractions, while essential complexity is inherent to building software. Some things are just hard.

However, even as the better tooling has diminished accidental complexity, essential complexity still remains. There still is the complicated work of designing our abstractions and systems the right way, one that is elegant, clear and maintainable. And that complexity isn’t going anywhere. This type of work takes skill and experience and wisdom hard-won from system failures past. And, I’m not sure if LLM’s fancy autocomplete approach works so well with this type of complexity, which often isn’t so straightforward to solve.

He also makes the point that friction is a gift. He explains that friction, which LLMs can bulldoze through, is actually helpful in his process, because it forces him to actually understand something.

When I am first learning a new language or framework, I struggle with friction to do even the most basic tasks. It sucks! And when am working with a new and unfamiliar code repository or data source, I need to set aside hours to scrutinize it. I often find myself doing a close reading, pulling up specific files to look over line by line until I understand their context and the choices their developers made. I know I could just ask an LLM to summarize the project for me and save myself the time, but I’ve found I need this process to really marinate in the code. I need it to not just understand the choices the developers made, but why they made them and how they reflect the constraint or idioms of the language they are using. I learn by failing, and if the LLM takes that work away from me, I won’t really understand what I’m doing.

Agentic Coding by Alexey Milovidov

Alexey, ClickHouse’s CTO, makes the opposite case to Jacob: a comprehensive, example-heavy report on running coding agents in production against a large, mature C++ codebase. It is the most thorough practitioner account I have read, and a good counterweight to the skeptics above — though worth remembering he writes as a vendor, partly to recruit. He lands on a point I keep coming back to, that agents reward engineers who already know what they want:

AI is a multiplier - good engineers will be good with AI, mediocre engineers will feel no difference, and bad engineers will do more harm.

Your Code is Worthless by Nathaniel Fishel

Nathaniel makes the same point I make: that your code is worthless. He summarizes it well:

We must return to the fundamental truth The source code is not the product. The product is the Outcome the user achieves. The code is merely the expensive, high-maintenance machinery required to deliver that outcome. If you can deliver a $1,000,000 outcome with 10 lines of code, you are a hero. If you deliver that same outcome with 37,000 lines, you have just created a $1,000,000 liability.

AI Hot Takes From A Platform Engineer / SRE by Lian Yuanlin

A grab-bag of hot takes on using LLM-driven tooling and workflows for Site Reliability Engineering. A few gems that I agree with:

If AI coding companies truly think that their tools are the absolute shizzle, why is their commandline tool written in Javascript? What has everyone done to Google and Anthropic to deserve having to run a Node.js runtime for a terminal cli? Should everything not be written in Golang or Rust to be as lightweight as possible on the users’ machines?

We live in these interesting times where one of the most productive stacks (for me) lives in the terminal. However, my agentic coding agent is written in TypeScript, runs inside bun, and eats up 400 MB of RAM (more than my browser). JavaScript-based developer tooling is the new “electron apps”.

Given that LLMs with RAG make searching through technical documentation far easier, a likely outcome for enterprise software is the gatekeeping of documentation. This is to prevent the common SWE from easily becoming a domain expert and replacing their value proposition.

This is actually similar to the point I made about specifications and tests being the valuable artifact of a software project, not the code. But Lian extends it to documentation. Not sure I agree with this one, but it’s an interesting idea.

OpenClaw and the Dream of Free Labour by “The Daemon”

An article critiquing OpenClaw, a piece of software that allows you to run autonomous agents. One line that stuck with me:

The problem begins when continuous runtime is mistaken for continuous judgement.

I think this fits in quite well with my explanation of the AI Psychosis phenomenon: you discover LLM agents, you feel that magic. Your initial drive is to maximize their usage: you want to get the most out of them, so you want to run them with as little supervision as possible. OpenClaw enables that. But doing more of something with less supervision does not necessarily produce better outcomes. As the article puts it:

Not that software should never run continuously, but that many of the things people most want from OpenClaw are not improved merely by being done for longer. Marketing, judgement, product sense, trust and timing do not simply become better because a Mac mini remained awake.

Can agentic coding raise the quality bar? by Luca Palmieri

Palmieri (author of Zero to Production in Rust) makes the most constructive argument in this list: where the mainstream treats agentic coding as a throughput play, he argues it can raise the quality bar in systems where quality is the point. His reframe is that code used to be expensive but agent code is cheap, so verification becomes the dominant cost — and there’s a quadrant of work (time-consuming, cheaply verified, low blast radius) where agents have no real downside. That makes a class of quality work newly economical: building the backlog tooling nobody bothered with, prototyping several designs and measuring instead of arguing on a whiteboard, generating tedious low-value-per-line abstractions, and paying off small tech debt that tightens the verification loop. Notably, he rejects spec-driven “waterfall” in favour of iterative prototyping, where each failed agent attempt maps a real constraint in the system.

It raises the bar on engineering discipline: organizations and systems where quality matters will invest more in verification, tooling, and feedback loops to extract real value from agentic workflows

Agentic AI and the Mythical Man Month by Murat

The Mythical Man-Month postulates that adding more people to a late project makes it slower, rather than faster. This is due to coordination complexity, which scales in O(N^2). This article discusses a paper, “Self-Defining Systems”, which claims that LLM agents make it possible to circumvent this, because they can explore many different options in the solution space in parallel. However, the author disagrees with the paper’s take, which rests on the assumption that software engineering is an embarrassingly parallel task.

Agents may ingest 100,000 lines of code instantly, but reading tokens is not the same as understanding the causal chain of changes across the system. As architectures grow (especially non-monolithic ones) the compute required to simulate downstream effects of a line change explodes exponentially. An agent may “see” the entire codebase, but without common knowledge of how, for example, a batcher change propagates through kernel fusion, regressions appear. Multi-agent systems do not escape Brooks’ Law, they simply hit the wall of state-space explosion faster than humans do.
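The O(N^2) figure for coordination complexity comes from counting pairwise communication channels: if N people (or agents) all need to stay in sync with each other, there are N(N-1)/2 distinct pairs. A small sketch in Python, where the function name is mine and the simplifying assumption is that every pair needs its own channel:

```python
def communication_channels(n: int) -> int:
    """Number of distinct pairs among n collaborators: n * (n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


# Doubling the team size roughly quadruples the channels to maintain.
for team_size in (2, 4, 8, 16):
    print(team_size, communication_channels(team_size))
```

Going from 8 to 16 collaborators takes the pair count from 28 to 120, which is why adding workers, human or otherwise, does not translate into a proportional speed-up once the work requires coordination.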

How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt by Margaret-Anne Storey

The closest piece here to my own argument. It distinguishes technical debt (which lives in the code) from cognitive debt (which lives in developers’ minds): even when AI produces clean, readable code, the humans can lose the plot — no longer understanding what the program is meant to do, how their intentions were implemented, or how to change it. It grounds this in Peter Naur’s classic idea that a program is not its source code but a theory held in the minds of its developers, and illustrates it with a student team that stalled at week seven — not from messy code, but because no one could explain why the design decisions had been made. The prescription is to treat understanding as a first-class deliverable: require that at least one human fully grasps each AI change before it ships, and document why, not just what.

Technical debt lives in the code; cognitive debt lives in developers’ minds