AI-generated | May 21, 2026 | 5:43

MyTeam: how a 10-agent firm ships features

Inside the Foundation → Integration → Autonomy lifecycle that runs ten specialised agents in parallel — the dependency graph that shapes the work, the trap of parallel-by-default, and what's non-negotiable in Autonomy.


Episode summary

Most AI-development tooling treats "the model" as one generalist that holds the whole task in its head. We chose the opposite: a small team of specialised agents, each with a sharp role, that hand artefacts to each other along a directed acyclic graph.

The platform is MyTeam — ten agents, three lifecycle phases, one delivery engine. This episode walks through how the pieces fit and what we learned wiring them up.

Why ten specialised agents instead of one generalist

A single generalist gets slower the more context you cram into it. At around 100k tokens of project history, response latency creeps up, instruction adherence sags, and the model starts forgetting constraints set hours earlier. A specialised agent with a ~4k-token role prompt and 10-20k tokens of task-specific context outperformed a 100k-token generalist on every metric we tracked: latency, accuracy, instruction adherence, and cost per decision.

The DAG is the architecture

The interesting design decision wasn't what each agent does — it was what each agent needs from the others before it can start. The agents form a directed acyclic graph; the scheduler is just a topological sort over that DAG. Acyclic matters: data only flows forward, which structurally prevents two agents from looping on the same disagreement.
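As a minimal sketch of "the scheduler is just a topological sort", the following uses Python's standard-library graphlib module. The edge list is an assumption: the episode names the ten agents and says the Architect waits for the Scoper, but it does not publish the full dependency graph, so every other edge here is illustrative.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: agent -> set of agents it must wait for.
# Only "Architect waits for Scoper" is stated in the episode.
DEPENDENCIES = {
    "Ideator": set(),
    "Validator": {"Ideator"},
    "Scoper": {"Validator"},
    "Architect": {"Scoper"},
    "Designer": {"Architect"},
    "Implementer": {"Architect", "Designer"},
    "Tester": {"Implementer"},
    "Deployer": {"Tester"},
    "Monitor": {"Deployer"},
    "Iterator": {"Monitor"},
}


def execution_waves(dependencies):
    """Return lists of agents that may run together, in dependency order."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every agent whose inputs exist
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock dependents
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(number, wave)
```

Because prepare() rejects cyclic graphs, the "data only flows forward" property is checked before any agent runs rather than discovered at runtime.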

The parallel-by-default trap

Early on, we ran Designer and Architect simultaneously on every feature. The Designer hallucinated UI mockups for strictly-backend features and Implementer wasted cycles reconciling irrelevant designs. The fix: the scheduler reads the first 200 words of Architect's output, classifies the feature, and dynamically prunes the DAG — skipping Designer entirely on backend features.
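The pruning step might look roughly like the sketch below. The episode does not say how the scheduler classifies a feature, so the keyword lists, the classify_feature and prune helpers, and the example graph are all hypothetical stand-ins; a real system would more plausibly use a model call for classification. The only detail taken from the source is the 200-word window and the decision to skip the Designer on backend features.

```python
# Hypothetical keyword hints; a stand-in for whatever classifier is really used.
BACKEND_HINTS = {"api", "endpoint", "database", "queue", "migration", "schema"}
FRONTEND_HINTS = {"ui", "screen", "button", "page", "layout", "form", "mockup"}


def classify_feature(architect_output, word_limit=200):
    """Classify a feature from the first `word_limit` words of the Architect's output."""
    words = [w.strip(".,:;()").lower() for w in architect_output.split()[:word_limit]]
    backend = sum(w in BACKEND_HINTS for w in words)
    frontend = sum(w in FRONTEND_HINTS for w in words)
    # Conservative default: only call it backend when nothing points at a UI.
    return "backend" if backend > 0 and frontend == 0 else "full_stack"


def prune(dependencies, agent):
    """Return a copy of the graph without `agent`, rewiring its dependents
    to wait on the removed agent's own prerequisites instead."""
    inherited = dependencies.get(agent, set())
    pruned = {}
    for node, deps in dependencies.items():
        if node == agent:
            continue
        if agent in deps:
            deps = (deps - {agent}) | inherited
        pruned[node] = set(deps)
    return pruned


def plan(dependencies, architect_output):
    """Skip the Designer when the Architect's output describes a backend feature."""
    if classify_feature(architect_output) == "backend":
        return prune(dependencies, "Designer")
    return dependencies
```

Defaulting to "full_stack" when the evidence is mixed means a misclassification costs an unnecessary Designer run rather than a missing design.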

Autonomy phase is short but high-stakes

Deployer, Monitor, Iterator. A bug in Foundation is an annoyance; a bug here brings down production. We over-invested initially in Monitor catching errors and under-invested in Iterator deciding what to do about them. The Iterator agent never silently sits on uncertainty — it picks one of roll back / patch / human decision and pages a human if confidence isn't there.
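One way to express the "never silently sit on uncertainty" rule is a decision function whose return type has no "do nothing" option. This is a sketch under assumptions: the Action and Decision types, the decide function, and the 0.8 confidence floor are hypothetical, and the episode does not describe how confidence is measured.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ROLL_BACK = "roll_back"
    PATCH = "patch"
    HUMAN_DECISION = "human_decision"


@dataclass(frozen=True)
class Decision:
    action: Action
    page_human: bool
    reason: str


CONFIDENCE_FLOOR = 0.8  # hypothetical threshold, not a figure from the episode


def decide(proposed: Optional[Action], confidence: float,
           floor: float = CONFIDENCE_FLOOR) -> Decision:
    """Always return one of the three actions; escalate when unsure."""
    if proposed is None:
        return Decision(Action.HUMAN_DECISION, True, "no remediation proposed")
    if proposed is Action.HUMAN_DECISION:
        return Decision(Action.HUMAN_DECISION, True, "agent requested a human")
    if confidence < floor:
        return Decision(Action.HUMAN_DECISION, True,
                        f"confidence {confidence:.2f} below floor {floor:.2f}")
    return Decision(proposed, False, f"confidence {confidence:.2f}")
```

Every branch ends in a Decision, so an incident cannot leave the Iterator without either an automated action or a page.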

Practical takeaways

  • Keep agent prompts tight. Sweet spot is under 1000 tokens of role instructions; anything more and focus blurs.
  • Force Implementer agents to read the actual API interface file before writing code. Eliminates hallucinated function signatures.
  • Pre-load downstream agents' tools during the last 30% of the upstream agent's run. 15% wall-clock win in our pipeline, zero downside.
  • Optimising the latency of an agent that isn't on the critical path is wasted effort. The critical path dictates time-to-feature.
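To make the critical-path point concrete, here is a small sketch that finds the longest chain through a dependency graph. The graph shape and the per-agent durations are invented for illustration and are not measurements from MyTeam; in this hypothetical layout the Designer runs alongside the Architect, as it might for a full-stack feature.

```python
from graphlib import TopologicalSorter


def critical_path(dependencies, duration):
    """Return (path, total_time) for the longest chain through the DAG."""
    finish = {}    # earliest finish time of each agent
    previous = {}  # the prerequisite that determined that finish time
    for agent in TopologicalSorter(dependencies).static_order():
        deps = sorted(dependencies.get(agent, ()))
        slowest = max(deps, key=lambda d: finish[d], default=None)
        previous[agent] = slowest
        start = finish[slowest] if slowest is not None else 0.0
        finish[agent] = start + duration[agent]
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:  # walk back along the slowest prerequisites
        path.append(node)
        node = previous[node]
    path.reverse()
    return path, total


# Hypothetical graph and durations in seconds.
GRAPH = {
    "Scoper": set(),
    "Architect": {"Scoper"},
    "Designer": {"Scoper"},
    "Implementer": {"Architect", "Designer"},
    "Tester": {"Implementer"},
}
DURATION = {"Scoper": 5, "Architect": 40, "Designer": 25,
            "Implementer": 60, "Tester": 30}

if __name__ == "__main__":
    print(critical_path(GRAPH, DURATION))
```

With these numbers the Designer is not on the critical path, so cutting its run time leaves the total unchanged, while any saving on the Architect, Implementer, or Tester shortens time-to-feature directly.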

The companion product page is at /en/products/myteam.

Full transcript

Imagine handing like one completely overwhelmed developer your entire enterprise code base and just asking them to build a complex feature from memory. They'd absolutely drop something. Right. But I mean, that is exactly what we do when we stuff, you know, 100,000 tokens of project history into a single generalist AI model.

So today we are doing a deep dive into how a platform called MyTeam takes a totally different approach to shipping features. Yeah, they use a 10-agent architecture. Exactly. And our mission today is to unpack their foundation, integration, and autonomy lifecycle for you, the technical engineering leader.

Because it's fascinating. It really is. And the data behind this shift is actually pretty counterintuitive. We usually assume like more context equals better quality.

Yeah, bigger context windows are always the selling point. Right. But the sources show that a specialized agent given just a really sharp 4,000 token role prompt. Which is basically what, a 10-page instruction manual?

Exactly. Yeah. You give it that plus maybe 10 to 20,000 tokens of task specific context and it beats that massive generalist model across the board. Oh, wow.

Yeah. I mean, you get lower latency, higher accuracy and a much cheaper cost per decision. But the immediate question for me is coordination. You know, if we break this down into a foundation phase with an ideator, validator, and scoper... Right.

And then hand it off to an integration phase with an architect, designer, implementer, and tester, how do these specialised agents not just create total chaos? Well, they structure the workflow using a directed acyclic graph, or a DAG. They have a DAG. Yeah.

And the acyclic part is critical here because it means the data only ever flows forward. Ah, so no going backwards. Exactly. It structurally prevents two agents from getting stuck in this infinite loop of arguing over a piece of code.

The system maps exactly what each agent needs before it even wakes up. So the architect cannot start until the scoper completely signs off. But wait, if we have all these fast specialised agents, forcing them to wait their turn feels a bit inefficient, doesn't it? Like, should we just run them all in parallel to save time, unleash the swarm?

See, that is the parallel-by-default trap. And MyTeam actually fell right into it initially. Really? Yeah.

During their integration phase, they had the designer and architect running simultaneously, and the result? The designer hallucinated UI mockups for strictly back-end features. Oh no, so the implementer was just... The implementer was wasting valuable compute cycles trying to reconcile, like, pure back-end code with totally irrelevant front-end designs. So they were just burning tokens for nothing.

How did they fix that without slowing the whole pipeline down? They made the scheduler a lot smarter. Now it reads the first 200 words of the architect output to basically classify the feature immediately. Oh, smart. Right. So if it detects a backend feature, it dynamically prunes the DAG. It completely skips the designer agent.

Okay. But knowing what to build efficiently in those first two phases doesn't mean the code will actually survive reality. Definitely not. Right. Because when that perfectly sequenced DAG outputs a feature, it hits production, the messy real world.

And that shifts the system into the high stakes autonomy phase. Which brings in the deployer, monitor, and iterator agents. And the stakes shift dramatically here. I mean, a bug in the foundation phase is an annoyance, right?

I know. But a bug here... It brings down production. Exactly.

And initially, they actually overinvested in the monitor's ability to just catch errors, but underinvested in the iterator's ability to decide what to do about them. It's kind of like having a triage medic. I mean, the iterator can't just passively watch the heart monitor's alarms going off, right? It has to evaluate the patient and, you know, actually decide on the treatment.

That is exactly where human in the loop becomes totally non-negotiable. Because logs and traces alone, they don't tell an AI what to do. The iterator has to evaluate the data and make a choice. Like deciding whether to roll back or push a hotfix.

Exactly. Roll back, patch, or flag it for a human decision. It's explicitly programmed to never silently sit on uncertainty. If it doesn't know the absolute safest path, it pages you.

I love that. So let's distill this into takeaways you can actually use with your engineering teams today. We need to look at structural fixes, right? Like keeping agent prompts tight. Yeah.

The sweet spot is under a thousand tokens for instructions. Anything more than that, and the agent just loses focus. Makes sense.

And another crucial fix is forcing implementer agents to read the actual API interface files before they write a single line of code. Oh, to stop them from guessing. Yeah, it basically completely eliminates hallucinated API calls. So read first, write second.

But go back to speed for a second. If we force agents to wait in this rigid DAG structure, doesn't that inherently slow the pipeline down compared to just parallel execution? Well, you'd think so, but they actually saved 15% in wall clock time just by preloading tools. Like, really?

So while the architect is finishing its final sentence, the system is already warming up the compiler and fetching the API docs for the implementer in the background. It removes cold start latency entirely. Wow. So as an engineering leader, I guess the grand takeaway here is that AI architecture isn't just about what your agents do.

It's about exactly when they are allowed to start. Because optimizing the latency of an agent that isn't on the critical path is fundamentally wasted effort. Right, because the critical path dictates everything. Absolutely everything.

Which leaves you with this to think about. As you look at your own pipelines today, what hidden bottlenecks, whether they're human or AI, are secretly determining your time to feature?


This episode is part of TechRevati Engineering, an AI-generated audio overview series. Per EU AI Act Article 50, we disclose AI involvement in every episode and on this page.