AI-generated · June 13, 2026 · 24:00

AgentRevati: designing a zero-dependency pattern library

Why we shipped an agent pattern library with no runtime dependencies, the five patterns that survived three projects of use, the three we deleted, and the Zod-at-the-edge fix for model-output drift.

Transcript

Episode summary

Every multi-agent project we'd built was reinventing the same handful of patterns: critique loops, supervisor-worker, plan-then-execute, deterministic-fallback. Each project had a slightly different name, slightly different signature, slightly different error handling. We extracted the patterns into a library — then re-extracted them with a constraint we hadn't tried before: zero runtime dependencies.

What the constraint forced

Once we couldn't reach for a framework, we couldn't model patterns as classes-that-extend-a-base. We couldn't use a runtime config file. We couldn't have a global state container. Everything became a function: input goes in, output comes out, the function knows nothing about other functions in the library. That sounds restrictive — it turned out to be a sharper tool than the class-based design we replaced.

Five patterns that earned their keep

Critique loops. Producer emits an artefact; critic returns approve / structured concerns. The non-obvious choice: the critic must be a different prompt than the producer. Same prompt → same blind spots → rubber stamp.
Supervisor-worker. Type-safe dispatcher; supervisor declares worker signature, dispatcher refuses mismatched calls. Catches "supervisor returned a structure workers can't consume" at compile time.
Plan-then-execute. Two passes. Plan with the full task in view; execute one step at a time with only that step's context. Different contexts are optimal for the two passes.
Escalation. Worker realises mid-task it's over its head; returns structured "escalate" with reason + context. Explicit beats implicit failure every time — implicit shows up as a timeout with no information.
Deterministic fallback. Run a hand-written algorithm first; only call the model if the deterministic path returns "I don't know". Turns 95%-accurate into 99.5%-accurate for trivial cost.
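As an illustration of the first pattern, here is a minimal TypeScript sketch of a critique loop written as a plain function. The names (`Verdict`, `critiqueLoop`, the stub agents) are assumptions made for this page, not the library's actual API, and the stubs stand in for real model calls.

```typescript
// Hypothetical types and names; the library's real signatures are not shown in the episode.
type Verdict =
  | { kind: "approve" }
  | { kind: "concerns"; concerns: string[] };

// Producer and critic are injected as separate functions so each can carry
// its own prompt (and, if desired, its own model).
type Producer<T> = (task: string, feedback: string[]) => Promise<T>;
type Critic<T> = (artifact: T) => Promise<Verdict>;

type CritiqueResult<T> =
  | { status: "approved"; artifact: T; rounds: number }
  | { status: "unresolved"; artifact: T; concerns: string[] };

async function critiqueLoop<T>(
  task: string,
  produce: Producer<T>,
  critique: Critic<T>,
  maxRounds: number,
): Promise<CritiqueResult<T>> {
  let feedback: string[] = [];
  let artifact = await produce(task, feedback);
  for (let round = 1; round <= maxRounds; round++) {
    const verdict = await critique(artifact);
    if (verdict.kind === "approve") {
      return { status: "approved", artifact, rounds: round };
    }
    feedback = verdict.concerns;
    // Only revise if another critique round remains to check the revision.
    if (round < maxRounds) {
      artifact = await produce(task, feedback);
    }
  }
  // No consensus within the budget: hand the decision back to the caller.
  return { status: "unresolved", artifact, concerns: feedback };
}

// Stub agents standing in for model calls.
const stubProducer: Producer<string> = async (_task, feedback) =>
  feedback.length === 0 ? "SELECT * FROM users" : "SELECT id FROM users";

const stubCritic: Critic<string> = async (sql): Promise<Verdict> =>
  sql.includes("*")
    ? { kind: "concerns", concerns: ["avoid SELECT *"] }
    : { kind: "approve" };
```

Calling `critiqueLoop("list user ids", stubProducer, stubCritic, 3)` with these stubs resolves to an approved result on the second round. Because the loop only sees two functions, swapping the critic's prompt or model touches no loop code.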

Three patterns we removed

Generic state machines — beautiful diagrams, one real caller, every actual workflow had one branching point (an if).
Generic memory store — abstracted a layer without abstracting anything; every project needed either nothing or something more specific.
Auto-retry with backoff — hid model errors callers needed to see. Replaced with explicit "retry the last step with adjusted parameters" as a named Plan-then-execute step.
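One way to picture retry as an explicit plan step rather than a hidden wrapper is sketched below in TypeScript. Everything here (`PlanStep`, `retryLast`, `runPlan`, the `Params` shape) is a hypothetical illustration of the idea, not the library's code.

```typescript
// Hypothetical shapes: retry is data in the plan, so it appears in logs and reviews.
interface Params {
  temperature: number;
  extraInstruction?: string;
}

type PlanStep =
  | { kind: "run"; name: string; params: Params }
  | { kind: "retryLast"; adjust: Partial<Params>; reason: string };

type StepRunner = (
  name: string,
  params: Params,
) => Promise<{ ok: boolean; output: string }>;

async function runPlan(steps: PlanStep[], run: StepRunner): Promise<string[]> {
  const log: string[] = [];
  let last: { name: string; params: Params; ok: boolean } | undefined;
  for (const step of steps) {
    if (step.kind === "run") {
      const res = await run(step.name, step.params);
      last = { name: step.name, params: step.params, ok: res.ok };
      log.push(`${step.name}: ${res.ok ? "ok" : "failed"}`);
    } else if (last && !last.ok) {
      // Re-run the failed step with the adjustments the plan spelled out.
      const params: Params = { ...last.params, ...step.adjust };
      const res = await run(last.name, params);
      last = { name: last.name, params, ok: res.ok };
      log.push(`${last.name} retry (${step.reason}): ${res.ok ? "ok" : "failed"}`);
    }
    // A retryLast step after a successful step is skipped: nothing to retry.
  }
  return log;
}
```

Every retry leaves a log line with its stated reason, which is the visibility the silent wrapper removed.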

Where the abstraction broke down: model-output drift

Plan-then-execute assumes the planner produces output matching a declared schema. The model agrees on day one, drifts on day 30, and on day 60 emits structurally-similar-but-not-quite output. Strong typing across agent boundaries catches the drift — and produces a wall of brittle errors instead of useful behaviour.

Zod-at-the-edge is the pragmatic fix. Each agent's output passes through a schema parser at the boundary. Parse succeeds → next agent sees a typed value. Parse fails → a small repair loop re-prompts the producer with the validation error, asks for a corrected output, and tries again up to twice. Most drifts get fixed in one repair turn. We tried constrained decoding and finite-state grammars; both were slower or harder to debug.
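The repair loop described above can be sketched roughly as follows. To keep the example runnable without installing anything, a small hand-written parser (`parsePlan`) stands in for the Zod schema; with Zod, a call to `schema.safeParse(...)` would play the same role. All names here are assumptions for illustration, and the repair prompt wording is invented.

```typescript
// Hypothetical sketch: parse at the boundary, re-prompt with the error on failure.
type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };
type Parser<T> = (raw: string) => ParseResult<T>;
type Produce = (prompt: string) => Promise<string>;

async function parseWithRepair<T>(
  prompt: string,
  produce: Produce,
  parse: Parser<T>,
  maxRepairs = 2, // the episode mentions up to two repair attempts
): Promise<ParseResult<T>> {
  let raw = await produce(prompt);
  let result = parse(raw);
  for (let attempt = 0; attempt < maxRepairs; attempt++) {
    if (result.ok) break;
    // Feed the exact validation error back to the producer.
    const repairPrompt =
      `${prompt}\n\nYour previous output failed validation.\n` +
      `Output: ${raw}\nError: ${result.error}\nReturn a corrected output.`;
    raw = await produce(repairPrompt);
    result = parse(raw);
  }
  return result;
}

interface Plan {
  steps: string[];
}

// Stand-in for a schema parser: accepts only {"steps": string[]}.
const parsePlan: Parser<Plan> = (raw) => {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "output is not valid JSON" };
  }
  const steps = (data as { steps?: unknown } | null)?.steps;
  if (!Array.isArray(steps) || !steps.every((s) => typeof s === "string")) {
    return { ok: false, error: "steps: expected an array of strings" };
  }
  return { ok: true, value: { steps } };
};
```

If the producer first emits `{"steps":"a,b"}` and then `{"steps":["a","b"]}` after seeing the error, the next agent only ever receives the typed `Plan`. If every attempt fails, the caller gets the final validation error instead of a crash.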

The companion product page is at /en/products/agentrevati.

Full transcript

You know that incredibly specific friction when you're spinning up a brand new multi-agent project? Oh, absolutely. Yeah, you've got your blank canvas. You're ready to build something genuinely autonomous.

But, I mean, almost reflexively, you end up copy-pasting the exact same five boilerplate files from your last three repos. Right, the same clunky routing logic. Exactly. The same brittle retry wrappers.

The same sprawling prompt files. We're supposed to be operating on the absolute frontier of artificial intelligence, yet it constantly feels like we're stuck in this loop, just manually reinventing the same slightly wobbly orchestrator every single time. It's the universal tax we're all paying right now, honestly. The broader ecosystem hasn't really settled into reliable, battle-tested conventions yet, so every senior engineer essentially becomes a bespoke framework designer just to get a reliable workflow out the door.

Which is exactly why we're doing this deep dive today. We are looking directly at you, the senior engineers, the technical leads who are out there actually wiring these autonomous systems together in production. We got our hands on the internal engineering notes for an architecture project called AgentRevati. And this team, they basically hit a breaking point with that copy-paste friction, and they decided to extract their most successful battle-tested patterns into a unified internal library.

But the fascinating part isn't that they built a library. I mean, everybody builds libraries. Right. It's the brutal, unyielding constraint they used to build it.

Yeah. They demanded absolute, strictly enforced, zero runtime dependencies. Zero. Which is wild.

It is an incredibly provocative stance to take in modern software engineering, particularly in the AI space. I mean, the default behavior right now is to just npm install or pip install some massive, sprawling framework the second you hit a roadblock. OK, let's unpack this, because zero runtime dependencies in this context isn't just about, you know, keeping the bundle size small. No, not at all.

They're talking about zero framework lock-in. No transitive dependencies that secretly pull in a massive opaque runtime environment under the hood. No proprietary YAML schemas acting as bloated configuration files. Their entire philosophy boils down to just using pure functions with strict types.

Data goes in, data comes out. Exactly. The authors note this originally started as a cultural preference on the team. You know the pain of bringing in a popular agent framework to solve one specific routing problem.

And suddenly... Suddenly that framework demands you adopt its entire worldview. Yes. It wants to manage your global state.

It forces you to inherit from its arbitrary base classes. It hijacks your logging. It's a nightmare. They wanted none of that.

But what began as a preference evolved into a rigorous design forcing function for the architecture itself. It sounds like, I don't know, bare metal programming for AI. It strips away all the magic. But it also sounds intensely restrictive.

I mean, it's like cooking without a microwave or prepackaged ingredients. It forces you to master the actual underlying technique, sure. But does forcing every piece of an agent architecture into an isolated pure function actually yield a more robust system? Or were they just making things infinitely harder to win an architectural purity debate?

That's the exact question they were asking themselves. But their production findings over three major projects suggest it actually resulted in a much sharper, more resilient system. Yeah. And the reasoning is deeply tied to how we hide bad design.

If a multi-agent architectural pattern doesn't compose cleanly as a simple pure function, it's almost guaranteed to be a terrible pattern when wrapped in a class too. Oh, that makes sense. Right. The object-oriented approach, or the heavy framework approach, just gives you a very convenient rug to sweep your messy state and tight coupling under.

The pure function constraint basically burned the rug. It made the architectural failure modes incredibly loud, isolated, and impossible to ignore. So the constraint was essentially a crucible. Exactly.

And out of all the complex diagrams and agent behaviors they tested across those production deployments, they found that only five architectural patterns actually survived the zero-dependency test without needing constant modification. Just five. Just five. So let's look at the first survivor, the critique loop.

This is the foundational atomic unit of their multi-agent workflows. So you have a producer agent that generates an artifact, say, a complex SQL query or a technical summary. Okay. Then a critic agent evaluates it and either returns a Boolean approval or a highly structured array of specific concerns.

You loop that a predetermined number of times, and if they can't reach consensus, you abort or escalate. I read that and immediately thought, like, why spin up a completely separate agent function for this? If you're using a top-tier model, why not just append a system instruction that says, review your own code carefully before returning the final output? Or maybe use a higher temperature in the same session to get a fresh perspective.

Because the math of the context window simply doesn't allow for rigorous self-correction that way. The AgentRevati team highlighted a critical limitation here. Large language models are, at their core, incredibly advanced autoregressive token predictors. If you ask a model to evaluate the exact artifact it just generated within the same context window, the attention mechanism is already overwhelmingly weighted toward those specific tokens.

Meaning it's mathematically biased? Yes. The probability strongly biases the model to just rubber stamp its own work. To get genuine critique, you have to force architectural divergence.

Divergence meaning you completely isolate the execution environments. The producer's prompt is deeply focused on construction and fulfilling the user's intent. And the critic's prompt is entirely adversarial, looking solely for structural flaws. And crucially, the critic doesn't have the producer's internal chain of thought polluting its context. That isolation is the entire key. You are instantiating a true red team. And keeping it as a pure function means you can easily swap the underlying model for the critic, maybe using a faster, cheaper model specialized in code linting, without rewriting any state logic. That makes perfect sense.

OK, moving from a single generation loop to parallel fan-out, we hit survivor number two, the supervisor-worker pattern. Right. A supervisor agent takes a massive prompt, slices it into discrete subtasks, dispatches them to parallel worker agents, and aggregates the results. We've all built some version of this.

And usually you just cross your fingers and pray the JSON parses correctly when the workers return. Oh, yeah. The JSON parse anxiety. So did they just wrap their fan-out logic in a massive nested try-catch block to handle the inevitable schema mismatch?

They went the absolute opposite route. And this is where the zero dependency constraint truly shines. They relied entirely on compile-time type safety. Wait, really?

Yeah. The dispatcher function in AgentRevati uses advanced language-level generics to enforce alignment. It structurally refuses to even compile the code if the supervisor's declared output signature, the shape of the subtasks it claims it will generate, doesn't perfectly match the exact input parameters the downstream workers are typed to accept. So if my supervisor is supposed to pass a user ID string to the worker, but the prompt schema accidentally types it as an integer, the TypeScript compiler or Python type checker throws a fatal error before I even deploy the code.

That is the beauty of it. In a dynamic, stochastic AI environment, they manage to kill the most common source of fan-out crashes, schema misalignment between agents, before the application even spins up. No heavy runtime validation libraries are needed for the internal wiring. Just the native type system.
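A minimal sketch of the compile-time check discussed above, in TypeScript. The names (`Supervisor`, `Worker`, `dispatch`, and the example agents) are hypothetical and not taken from the library; the point is only that one shared type parameter ties the supervisor's output shape to the worker's input shape.

```typescript
// Hypothetical API: `Sub` must be the same type on both sides of the dispatch.
type Supervisor<Task, Sub> = (task: Task) => Promise<Sub[]>;
type Worker<Sub, Out> = (subtask: Sub) => Promise<Out>;

async function dispatch<Task, Sub, Out>(
  supervisor: Supervisor<Task, Sub>,
  worker: Worker<Sub, Out>,
  task: Task,
): Promise<Out[]> {
  const subtasks = await supervisor(task);
  // Fan out: run every subtask's worker concurrently, then gather results.
  return Promise.all(subtasks.map((s) => worker(s)));
}

interface Lookup {
  userId: string;
}

// Stubs standing in for model-backed agents.
const splitUsers: Supervisor<string, Lookup> = async (csv) =>
  csv.split(",").map((id) => ({ userId: id.trim() }));

const lookupWorker: Worker<Lookup, string> = async ({ userId }) =>
  `profile:${userId}`;

// A supervisor whose declared subtask shape carries userId as a number.
const badSupervisor: Supervisor<string, { userId: number }> = async (csv) =>
  csv.split(",").map((id) => ({ userId: Number(id) }));

function wouldNotCompile() {
  // @ts-expect-error userId is a number here, but lookupWorker requires a string
  return dispatch(badSupervisor, lookupWorker, "1,2");
}
```

`dispatch(splitUsers, lookupWorker, "a, b")` type-checks and runs. The mismatched pairing is rejected by the type checker before any model is called, which is why it sits in a function that is never invoked. This guards the wiring between agents only; what a model actually emits at runtime still needs a boundary check.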

That is incredibly elegant. But fan-out is really only useful for highly parallel independent tasks. What about sequential complex reasoning? That brings us to pattern three.

Plan, then execute. Okay, break that down for us. They divide complex goals into two distinct functional passes. Pass one is the planner, which gets the massive overarching user context and generates a structured array of sequential steps.

Pass two is the executor function, which loops through and actually performs those steps. Here's where it gets really interesting, though. The crucial architectural mechanism here isn't just the separation of steps. It's the strict limitation of what the underlying model is allowed to see during each pass.

Yes. During the planning phase, the model has access to the full, rich context. The user's ultimate goal, the system constraints, the edge cases. But during the execution phase, the model is intentionally blinded to the broader context.

It only receives the specific, localized arguments required for that single current step. That feels deeply counterintuitive. Our instinct as engineers is always to give the LLM as much context as possible so it makes, you know, smarter decisions. But I love the analogy of a commercial kitchen for this.

The head chef is your planner. They need to know the entire menu, the food costs, and the fact that a food critic is sitting at table four. Right. They write the prep tickets.

Exactly. But the line cook, whose only job is to properly dice 50 onions, they don't need to know about the food critic at table four. If you give the line cook the overarching business strategy of the restaurant, they are going to get distracted, overthink their role, and mangle the onions. Context window contamination is the technical term for mangling the onions.

If the executing agent has the full overarching goal in its prompt, it frequently attempts to jump ahead. It will try to hallucinate solutions for step five while it's only supposed to be querying a database for step two. It's trying to be too helpful. Exactly.

By aggressively scoping the context to just the immediate task variables, AgentRevati drastically reduced execution drift. But scoping context that tightly creates a new problem, right? What happens when that narrowly focused executing agent realizes it actually lacks the tools or the data to finish its specific step? That naturally leads us to pattern four, escalation.
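The context scoping described above can be made structural rather than a convention. In this hedged TypeScript sketch (all type and function names are assumptions), the executor's signature simply has no parameter through which the overall goal could reach it.

```typescript
// Hypothetical shapes: the episode describes the idea, not these types.
interface TaskContext {
  goal: string;
  constraints: string[];
}

interface Step {
  id: number;
  instruction: string;
  args: Record<string, string>;
}

// The planner sees the full context.
type Planner = (ctx: TaskContext) => Promise<Step[]>;

// The executor receives one step's instruction and arguments, nothing else.
type Executor = (
  instruction: string,
  args: Record<string, string>,
) => Promise<string>;

async function planThenExecute(
  ctx: TaskContext,
  plan: Planner,
  execute: Executor,
): Promise<string[]> {
  const steps = await plan(ctx);
  const results: string[] = [];
  for (const step of steps) {
    // Steps run one at a time, each with only its own local arguments.
    results.push(await execute(step.instruction, step.args));
  }
  return results;
}
```

The planner is the only place where the full `TaskContext` is visible, so anything a step needs has to be written into that step's `args` explicitly.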

In traditional agent wrappers, failure is usually implicit. An executing agent hits a wall, say an API is down or it needs a file it wasn't given, and it just spins its wheels hallucinating workarounds until it hits a hard timeout or returns a malformed string. It's a black box. The orchestrator just knows the task failed, but it has absolutely no programmatic idea why.

Right. A timeout is essentially the agent ghosting the orchestrator. You're left digging through raw token logs trying to figure out what went wrong.

That ghosting is exactly what AgentRevati's structured return types are designed to prevent. They force the worker agent to use a specific, explicit tool or return schema for escalation. The agent must return an object stating, I am escalating this subtask because it requires a capability I don't possess, or, I am escalating because the database query returned a 404.

This transforms an opaque timeout into actionable data. The orchestrator can now programmatically route that specific failure to a more capable, expensive model or drop it into a human-in-the-loop review queue. It gives the AI a typed mechanism to raise its hand and say, I need an adult. But escalating to a larger model or a human is expensive and slow.
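A rough TypeScript sketch of a typed escalation result and an orchestrator-side router follows. The reason codes, field names, and routing policy are all assumptions chosen for illustration; the episode only says that the worker returns a structured reason plus context and that the orchestrator can route on it.

```typescript
// Hypothetical reason codes and result shape.
type EscalationReason =
  | "missing_capability"
  | "upstream_error"
  | "insufficient_context";

type WorkerResult<T> =
  | { kind: "done"; value: T }
  | {
      kind: "escalate";
      reason: EscalationReason;
      detail: string;
      context: Record<string, string>;
    };

type Route = "continue" | "stronger_model" | "human_review";

// Example policy only: which reason goes where is a per-project decision.
function routeResult<T>(result: WorkerResult<T>): Route {
  switch (result.kind) {
    case "done":
      return "continue";
    case "escalate":
      return result.reason === "missing_capability"
        ? "human_review"
        : "stronger_model";
  }
}

const escalated: WorkerResult<number> = {
  kind: "escalate",
  reason: "upstream_error",
  detail: "database query returned a 404",
  context: { step: "2" },
};
```

With this policy, `routeResult(escalated)` yields `"stronger_model"`. Because the result is a discriminated union, the type checker forces every caller to handle the escalate case rather than discovering it as a timeout.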

Which brings us to the final survivor, pattern five: deterministic fallback. And the engineering notes point out a really fascinating cultural divide on their team regarding this specific pattern. What's fascinating here is how the differing experience levels reacted to AI reliability.

The junior engineers looked at this pattern, which explicitly mandates running a traditional handwritten heuristic or regex algorithm before invoking the LLM, and pushed back. What was their argument? They argued, why are we writing manual parsing code? The LLM usually works for this.

Oh boy. The senior engineers immediately recognized it. Usually works is the most dangerous phrase in production software. Because usually works means it fails 5% of the time, and that 5% is going to page the on-call engineer at 3 a.m. on a holiday weekend.

That is the grim reality of production. If you have an extraction task where a simple script can perfectly handle the straightforward 80% of cases, you execute that pure function first. You only invoke the LLM's non-deterministic reasoning if the traditional code returns an explicit, I don't know, state. Layering a basic heuristic in front of the model. You take a system that the LLM gets right 95% of the time, and you easily push the overall pipeline reliability to 99.5% while simultaneously reducing your inference compute costs. It's bare metal pragmatism.
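A minimal TypeScript sketch of the deterministic-first ordering described above, using date extraction as an invented example task. The function names and the regex are assumptions; the model call is injected so a stub can stand in for it.

```typescript
// Hypothetical example task: pull an ISO date out of free text.
type Extraction = { known: true; value: string } | { known: false };

// Deterministic path: handles plainly formatted YYYY-MM-DD dates only,
// and says "I don't know" for everything else.
function extractIsoDate(text: string): Extraction {
  const match = /\b\d{4}-\d{2}-\d{2}\b/.exec(text);
  return match ? { known: true, value: match[0] } : { known: false };
}

type ModelExtractor = (text: string) => Promise<string>;

async function extractDate(
  text: string,
  askModel: ModelExtractor,
): Promise<{ value: string; source: "rule" | "model" }> {
  const ruled = extractIsoDate(text);
  if (ruled.known) {
    return { value: ruled.value, source: "rule" };
  }
  // Only the inputs the rule could not handle reach the model.
  return { value: await askModel(text), source: "model" };
}
```

As a rough check on the figures quoted above: if the rule is always right on the share of inputs it accepts and the model stays 95% accurate on the rest, then 80% rule coverage gives 0.80 + 0.20 × 0.95 = 0.99 overall, and about 90% coverage would be needed for 0.90 + 0.10 × 0.95 = 0.995. The leftover cases may be harder or easier than average, so treat this as an estimate only.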

So those are the five that survived the zero-dependency crucible. Critique loops, typed supervisor-worker, scoped plan-then-execute, explicit escalation, and deterministic fallback. But a rigorous constraint is only valuable if it actively forces you to kill bad ideas. Very true.

Let's look at the cutting room floor. What beautifully designed abstractions did they fall in love with only to brutally delete? The first major casualty was state machines. I can practically hear the collective sigh of software architects everywhere.

We love a visually stunning state machine. Drawing multicolored workflow diagrams with transition edges pointing everywhere feels like real engineering. And that visual appeal was the exact trap they fell into. They built a robust, generic state machine engine to model complex agent workflows, handling transitions and state payloads.

It looked incredible on a whiteboard. But when they audited their actual usage across three large-scale projects, the ergonomics in the codebase were disastrous. They realized every single workflow they were trying to model as a state machine actually only contained a single branching point. Wait, really?

So they architected an entire state transition engine just to avoid writing a basic Boolean if statement. That is the definition of premature abstraction. It had precisely one legitimate caller across tens of thousands of lines of code. The pure function constraint highlighted how absurd the overhead was, so they axed the entire engine.

Ruthless. Okay. Deleted abstraction number two, the generic memory store. Now, I have to assume this was designed to handle cross-session context.

Every single client wants their agents to have long-term memory so they don't have to start from scratch every conversation. Why would a unified memory wrapper fail the test? Because memory in AI applications is incredibly domain-specific. The team discovered their projects fell into two stark, uncompromising categories.

Either the agent was entirely transactional and required zero cross-session memory, or the memory architecture was so deeply tied to the business logic that a generic wrapper actively hindered development. Give me a concrete example. Why wouldn't a basic key-value abstraction work for both? Think about an autonomous ETL pipeline agent versus a customer service chatbot.

The ETL agent might need to store complex vectorized embeddings of intermediate JSON transformations to detect data drift over time. A customer service bot just needs a chronological array of raw string messages to maintain conversational flow. Very different shapes. A generic memory abstraction trying to accommodate both ends up being a bloated nightmare that satisfies neither.

It became a layer of abstraction that didn't actually abstract anything useful. Which brings us to the third deleted pattern, and this is the one that really challenges conventional wisdom. Auto-retry with exponential backoff. Now, I'll push back heavily here.

Exponential backoff is networking 101. If an API call fails, you wait 50 milliseconds, try again, wait 100 milliseconds, try again. Dropping that seems reckless. In traditional microservices, you are absolutely right.

But the fundamental difference is the nature of the failure. If an API is rate limiting you with a 429 error, an exponential backoff is the perfect solution. Sure. But if an LLM fails to output the required JSON schema, silently triggering a retry under the hood is a catastrophic design flaw.

It hides the semantic error from the orchestrator. Ah. If the LLM doesn't understand the prompt constraints, hitting the API three more times isn't going to fix its lack of understanding. No, it's not.

It's just going to silently burn tokens and latency, and the orchestrator loses the critical context of why the task is struggling. Precisely. Network failures and semantic failures require completely different handling.

By ripping out the generic auto-retry wrapper, they forced retry logic out of the shadows. They made it an explicit, observable step within the plan-then-execute pattern. So the planner now has to explicitly state, if step two fails due to a schema error, adjust the temperature parameter, and inject this clarifying instruction before retrying. Exactly.

So what does this all mean for the final library? We have this incredibly lean, functional, type-safe architecture. They've purged the generic memory wrappers, the state machines, and the hidden retries. It sounds bulletproof.

But the engineering notes detail a massive collision with production reality. Yeah, they hit a serious wall. And that brick wall is known as model output drift. This is the silent killer of autonomous systems.

You engineer a meticulous prompt instructing the model to output a deeply nested schema. On day one, the tests pass beautifully. You deploy to production. But by day 30, the underlying model provider silently updates its weights, or a minor tweak to the system prompt causes unintended butterfly effects.

And suddenly, the model starts returning data that is structurally similar but fundamentally broken. It returns a single comma-separated string instead of an array of strings, or it changes a camelCase key to snake_case. And this is where AgentRevati's greatest strength, its rigorous reliance on strict compile-time types and runtime boundary checks, became a terrifying double-edged sword.

Why? When the model drifts even a fraction of an inch, the strict type validators at the agent boundaries instantly reject the payload. Instead of the system gracefully degrading or making a best guess, it throws a massive wall of brittle validation errors and halts execution entirely. Oh, wow.

So how did they solve it without violating their zero-dependency constraint? You can't just email OpenAI and ask them to roll back their model weights because your specific pipeline broke. I assume they had to write a massive custom parser to clean the data. No, they adopted a deeply pragmatic fix they call Zod at the Edge. They permitted a single external dependency, Zod, the schema validation library, but only at the absolute perimeter of the application. Okay, keeping the core clean. Right. The core logic remains dependency-free, but every single payload emitted by an agent must pass through a strict Zod parser before it enters the orchestrator.

Okay, but applying Zod validation just throws the exact same schema error. How does that fix the drift? They don't just throw the error and crash. They utilize a micro-repair loop.

They capture the exact programmatic Zod error string, which details exactly which key failed and why, and they feed that raw error string directly back into the prompt of the producing agent with a simple instruction. Your output failed validation. Here is the exact schema error. Fix it.

And how many times does it try? They allowed this fast loop to run up to two times. It's exactly like a bouncer at a club. The bouncer is your Zod validation at the edge.

You hand them your ID. If the birth date is smudged, the bouncer doesn't instantly ban you for life and throw you into the street. They point directly to the smudge, hand the ID back and say, wipe that off and step back in line. You get a targeted chance to fix the exact formatting issue before you are rejected.

That is a brilliant way to visualize the mechanism. And the empirical results were staggering. They found that the vast majority of these subtle stochastic model drifts are successfully corrected by the LLM in just a single repair turn. Because it just needed to see the mistake.

Yes. As soon as the model sees the explicit validation error, it immediately recognizes its formatting mistake and outputs the perfect schema. Why not use something closer to the metal, though, like constrained decoding or forcing a finite state grammar at the inference level? That guarantees perfect JSON every single time without needing a repair loop.

They evaluated those techniques, but they directly violated the core design constraint. Constrained decoding requires deep, heavy integration with the specific inference engine you are using. It ties you to specific runtimes and often incurs massive latency penalties. Ah, I see.

The Zod at the Edge repair loop isn't mathematically elegant, but it's purely functional. It's fast, it works across any model provider via standard API calls, and it's incredibly easy to trace in the logs. So looking back at this entire journey, the crucible of zero dependencies, the 2,000 lines of functional code they eventually settled on, the authors reflect on what they would do differently if they were starting from a blank IDE today. Their biggest realization was that they could have been even more ruthless.

They noted they would cut the library down to roughly 1,500 lines by deleting the most fundamental abstraction of all, the generic agent base type. Wait, they built an internal agent architecture library, and their primary regret is not deleting the word agent from the core types? It sounds funny, but when you look at how the code was actually consumed, it makes complete sense. They audited the production deployments and realized that absolutely zero developers were instantiating a generic agent.

Oh, because they needed specific roles? Every single implementation required a highly specific role-typed function. They were calling create critic agent or create worker agent because the execution contexts were so vastly different. The generic base type was a phantom abstraction.

As they put it, it was load-bearing for nothing. The other major takeaway was regarding the repair loops. They wouldn't wait to discover that Zod at the edge pattern in production after everything broke. They would ship it as a foundational, heavily documented requirement from day one.

Dealing with the stochastic failure of language models isn't an edge case you patch later. It is the core architectural requirement of the entire system. Exactly. So for the senior engineers listening right now, staring at that blank repository and debating whether to pull in a massive framework, here is the playbook from the trenches.

The playbook. Constrain your design relentlessly. If a pattern doesn't compose as a pure function, strongly reconsider building it. Validate strictly at the edges.

Give your agents typed explicit tools to escalate when they are stuck. And please do not be afraid to highlight that sprawling, beautiful state machine abstraction and hit the delete key when a simple if statement will do. But, you know, stepping back from the code for a moment, this entire deep dive raises a much larger, slightly uncomfortable question about where this industry is heading. Where is that?

We've spent this time breaking down these brilliant architectural patterns, adversarial critique loops, context-scoped planners, edge repair loops, all meticulously designed by human engineers to wrangle the sheer chaos of language models. Right. We're essentially building massive structural scaffolding just to keep the AI on track. Yes, but if standard software design patterns, pure functions, compile-time types, and simple deterministic fallbacks are the only things successfully taming these models today, what happens in the near future when we get models capable of writing their own orchestration logic?

When an autonomous system is tasked with building its own multi-agent architecture, will it inherently discover the value of pure functions and strict boundaries? Or will it invent dynamic, constantly shifting architectures that are so deeply entangled and non-deterministic that human engineers can't even read the code anymore? That is the thought that keeps architects awake at night. If our most robust systems still require handwritten deterministic fallbacks to jump from 95% to 99.5% reliability, we have to ask ourselves, are we actually building intelligent agents or are we just using LLMs as incredibly expensive fuzzy routing layers for our traditional code?

It's a great question. So the next time you find yourself blindly copy-pasting those same five boilerplate files, take a second. Look closely at the abstractions you're carrying with you. Ask yourself if you are actually solving the problem at hand or if you're just building another beautiful trap.

This episode is part of TechRevati Engineering, an AI-generated audio overview series. Per EU AI Act Article 50, we disclose AI involvement in every episode and on this page.