When a project becomes large enough, the main bottleneck is no longer coding speed. It is cognitive load.
Recently, I used a small multi-agent workflow to restructure a software project while I was also preparing for exams. The goal was not just to “use AI to code,” but to divide different kinds of thinking into different roles so I would not have to carry the whole system in my head at once.
The role split
I used three agents with clearly separated responsibilities.
1. GPT-5.4 as project manager and architect
This role was responsible for:
- defining the restructuring direction
- identifying the real architectural problems
- converting broad goals into execution phases
- judging whether proposed plans were aligned or drifting
- deciding what should be frozen, removed, or postponed
- coordinating the output of the other agents
This was the highest-level role. It did not just produce code. It maintained the project’s internal logic and protected the long-term structure from short-term patching.
2. Claude as reviewer
This role was used as a reviewer rather than a builder.
Its job was to:
- read plans or code changes critically
- challenge architectural assumptions
- identify inconsistencies
- test whether the logic was truly unified or only superficially reorganized
This was important because a system can look cleaner while still preserving the same hidden mess. A reviewer role exists to detect that.
3. Composer 2 as scout
This role did not make decisions and did not modify code.
Its only job was to:
- inspect the codebase
- extract evidence
- map file responsibilities
- trace call chains
- compare behaviors across surfaces
- report facts without interpretation
This turned out to be extremely useful. A lot of architectural work fails because the main reasoning agent wastes energy just locating information. Offloading that search-and-report work reduced friction significantly.
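If I were to encode this split in code, it might look something like the sketch below. This is purely illustrative: the role names, prompt text, and the `can_modify_code` flag are my own shorthand for the division of authority, not a real framework or my actual configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Role:
    name: str
    system_prompt: str
    can_modify_code: bool  # only the manager/architect may plan changes

SCOUT = Role(
    name="scout",
    system_prompt=(
        "Inspect the codebase. Extract evidence, map file responsibilities, "
        "trace call chains, compare behaviors across surfaces. "
        "Report facts only; do not interpret and do not decide."
    ),
    can_modify_code=False,
)

REVIEWER = Role(
    name="reviewer",
    system_prompt=(
        "Read plans and code changes critically. Challenge architectural "
        "assumptions and flag inconsistencies. Do not propose implementations."
    ),
    can_modify_code=False,
)

MANAGER = Role(
    name="manager",
    system_prompt=(
        "Define the restructuring direction, convert goals into execution "
        "phases, and judge whether proposals align with the long-term structure."
    ),
    can_modify_code=True,
)

ROLES = {r.name: r for r in (SCOUT, REVIEWER, MANAGER)}
```

The point of making the roles explicit data rather than ad-hoc prompting is that the constraint (the scout and reviewer never touch code) is stated once and enforced everywhere.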
Why this worked
The biggest gain was not “parallel coding.” It was separating cognitive modes.
There are at least three very different tasks in a restructuring project:
- deciding what the architecture should be
- checking what the code actually does
- validating whether the change truly achieved the intended result
If one agent does all three at once, it tends to blur planning, evidence gathering, and judgment together. That creates noise. By separating them, each thread became cleaner.
The scout gathered facts.
The reviewer challenged claims.
The manager/architect maintained global coherence.
That made the whole process much easier to control.
Dual-threading
The workflow also benefited from dual-threading.
While the main coding agent was working on a restructuring phase, another agent could inspect the codebase and collect evidence relevant to the next decision. This meant the project did not need to fully stop between phases.
The important part was that the two threads were not symmetric:
- one thread changed the system
- the other thread observed the system
That asymmetry matters. If both threads are changing things, the process becomes noisy. If one thread builds and the other verifies, the process stays legible.
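The asymmetry can be sketched as two threads over shared state, where only one holds write access. This is an illustrative Python sketch of the shape of the workflow, not my actual tooling; `builder` and `observer` stand in for the coding agent and the scout.

```python
import queue
import threading

repo_lock = threading.Lock()
evidence: "queue.Queue[str]" = queue.Queue()  # observer output feeds the next decision

def builder(phases, repo):
    # The only thread allowed to mutate the system.
    for phase in phases:
        with repo_lock:
            repo.append(f"applied:{phase}")

def observer(questions, repo):
    # Read-only: gathers evidence for the next decision while building continues.
    for q in questions:
        with repo_lock:
            snapshot = list(repo)  # read a consistent view, never write
        evidence.put(f"{q} -> {len(snapshot)} changes applied so far")

repo_state: list[str] = []
t1 = threading.Thread(target=builder, args=(["phase-1", "phase-2"], repo_state))
t2 = threading.Thread(target=observer, args=(["where is auth handled?"], repo_state))
t1.start(); t2.start()
t1.join(); t2.join()
```

Because the observer only ever takes snapshots, the two threads can overlap freely without the process becoming noisy: every mutation still has exactly one author.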
Efficiency gains
The efficiency gain came from reducing context-switching cost.
Instead of doing this manually:
- inspect code
- form hypothesis
- verify hypothesis
- draft plan
- review plan
- compare with current state
- revise plan
- execute
I distributed the pipeline:
- scout handled inspection
- manager/architect handled synthesis and execution planning
- reviewer handled criticism and pressure-testing
This made iteration faster, but more importantly, it made iteration safer. The project was less likely to drift because every major claim could be checked against extracted evidence.
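One iteration of that distributed pipeline can be sketched as a simple loop. The three functions here are placeholders for calls to the respective agents; their names, signatures, and return values are illustrative assumptions, not a real API.

```python
def scout_inspect(goal: str) -> list[str]:
    # Placeholder: the scout would return extracted facts about the codebase.
    return [f"fact relevant to {goal}"]

def manager_plan(goal: str, facts: list[str]) -> str:
    # Placeholder: the manager/architect synthesizes facts into an execution plan.
    return f"plan({goal}, {len(facts)} facts)"

def reviewer_check(plan: str, facts: list[str]) -> bool:
    # Placeholder: the reviewer pressure-tests the plan against the evidence.
    return bool(plan) and bool(facts)

def iterate(goal: str, max_rounds: int = 3) -> "str | None":
    for _ in range(max_rounds):
        facts = scout_inspect(goal)        # 1. evidence gathering
        plan = manager_plan(goal, facts)   # 2. synthesis and planning
        if reviewer_check(plan, facts):    # 3. independent review
            return plan                    # the human approves or rejects from here
    return None  # no plan survived review within the round budget
```

The structure matters more than the stubs: evidence is gathered before planning, and review happens against that evidence rather than against the planner's own claims.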
Cognitive load sharing
This was the most important part for me.
I was in an exam period. I did not want to sink into a software project and lose focus. So the workflow was designed to let me remain the final authority without becoming the active bottleneck at every step.
The system reduced my role to:
- setting the goal
- reviewing high-level direction
- approving or rejecting major moves
- deciding when to stop
That is a very different burden from manually tracking every file, every interaction inconsistency, and every hidden dependency.
In other words, the agents did not just “save time.” They absorbed different kinds of mental overhead.
What I learned
A good agent workflow is not just “one smart model doing everything.”
It works better when:
- roles are explicit
- evidence gathering is separated from judgment
- review is independent from implementation
- the human stays at the decision boundary, not inside every micro-step
- the process is designed to minimize cognitive context switching
The result is not merely faster execution. It is a more stable thinking system.
Final takeaway
The most effective use of agents in software work, at least for me, is not brute-force code generation. It is building a small operational structure around the project:
- one agent to inspect
- one agent to critique
- one agent to coordinate and architect
Once those roles are clear, the human no longer has to carry the entire project alone. That is where the real leverage begins.