AI-generated · May 15, 2026 · 20:41

Inside EinCoreRAG: Multi-tenant Architecture

How EinCoreRAG isolates tenant data across PostgreSQL Row-Level Security and per-tenant Qdrant collections to keep multi-tenant RAG fail-closed under real-world failure modes.

Transcript

Episode summary

When we pitch EinCoreRAG to a CISO, the first question is always the same: "How do you guarantee my data won't leak to another tenant?" Most RAG platforms answer at the application layer — a WHERE tenant_id = ? clause sprinkled into queries. We chose not to. Application-layer authorisation is brittle: one missing predicate, one broken middleware, one ORM that silently drops a filter, and tenant boundaries dissolve. We pushed isolation down a level, into the storage engines themselves.

Two layers, two engines

EinCoreRAG stores data in two engines: PostgreSQL for relational and metadata, Qdrant for vector embeddings. We isolate tenants at each layer independently, so a flaw at one layer cannot reach the other's data.

Relational layer — PostgreSQL Row-Level Security. Every transaction starts with the API setting a per-request session variable (SET LOCAL app.current_tenant_id = '...'). RLS policies on the relevant tables compare each row's tenant_id against that variable. A query that forgets its WHERE tenant_id predicate returns only the current tenant's rows — RLS is a backstop for the predicates you didn't write carefully enough. The current_setting('...', true) missing-OK flag means a completely-absent session variable evaluates to NULL and the policy returns zero rows: loud break, zero leak.
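The statements described above can be sketched as small Python helpers that only build SQL text. This is a minimal sketch under stated assumptions, not EinCoreRAG's actual code: the table name documents, the policy name tenant_isolation, a text-typed tenant_id column, and the %s placeholder style (as used by psycopg) are all hypothetical. Because SET does not accept bind parameters, the sketch uses PostgreSQL's set_config(name, value, is_local) with is_local set to true, which is scoped to the current transaction in the same way as SET LOCAL.

```python
from typing import List, Tuple

SETTING = "app.current_tenant_id"  # setting name taken from the text above


def rls_ddl(table: str) -> List[str]:
    """Build one-time DDL that enables RLS and adds a tenant policy.

    Assumes `table` is a trusted identifier chosen by the developer and
    that its tenant_id column is of type text (both are assumptions).
    """
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # FORCE makes the policy apply to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        # missing_ok = true: an unset variable yields NULL instead of an error.
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING (tenant_id = current_setting('{SETTING}', true))",
    ]


def scope_transaction(tenant_id: str) -> Tuple[str, Tuple[str, str]]:
    """Return (sql, params) to run first inside each request's transaction."""
    if not tenant_id:
        raise ValueError("tenant_id must be a non-empty string")
    # Third argument true = transaction-local, like SET LOCAL.
    return "SELECT set_config(%s, %s, true)", (SETTING, tenant_id)
```

The API layer would run the scope_transaction statement as the first statement of every transaction; any query after it in that transaction is then filtered by the policy whether or not it carries its own WHERE tenant_id predicate.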

Vector layer — per-tenant Qdrant collections. Qdrant lets you put everyone in one big collection with a tenant_id payload filter. We tried it. We moved off it. Three reasons: true isolation (boundary enforced by the engine, not by the query author), HNSW performance (smaller graphs find neighbours in fewer hops), and data residency (a regulated tenant can have their collection routed to a specific node or region without forcing every other tenant to live there too).
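One way to keep the collection boundary out of the query author's hands is to derive the collection name on the server from the authenticated tenant ID and never accept it from request input. The sketch below is an assumption about how that could look: the tenant_ prefix and the allowed ID pattern are hypothetical, and no Qdrant client calls are shown.

```python
import re

# Hypothetical ID format: lowercase letters, digits and hyphens, 1 to 63 chars,
# starting with a letter or digit. Anything else is rejected outright.
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def collection_for_tenant(tenant_id: str) -> str:
    """Map an authenticated tenant ID to its dedicated collection name."""
    if not isinstance(tenant_id, str) or not _TENANT_ID.fullmatch(tenant_id):
        # Fail loudly instead of falling back to a shared or default collection.
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{tenant_id}"
```

The returned name is what the retrieval code would pass as the collection for every vector search, so a bad or missing tenant ID stops the request before it reaches the vector store.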

What "fail-closed" actually means

Most security writing talks about fail-closed without defining the closed thing. Concretely: if our API layer forgets to set the session variable, the RLS policy compares tenant_id against NULL and returns zero rows. If a query against Qdrant uses the wrong collection name, Qdrant returns 404. If both layers fail simultaneously, the LLM has nothing to retrieve — it cannot fabricate cross-tenant context because there's no cross-tenant context in scope. Every failure mode produces a visible break instead of a silent leak.
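The two storage-level failure modes above can be modelled in a few lines of plain Python. This is a toy model for illustration only: it runs against in-memory data rather than PostgreSQL or Qdrant, and the names visible_rows, search_collection, and CollectionNotFound are hypothetical.

```python
from typing import Dict, List, Optional


class CollectionNotFound(Exception):
    """Stand-in for the 404 that the text says Qdrant returns."""


def visible_rows(rows: List[dict], session_tenant: Optional[str]) -> List[dict]:
    """Model the RLS policy tenant_id = current_setting(..., true)."""
    # In SQL, comparing anything with NULL yields NULL, and RLS only admits
    # rows for which the policy is true, so an unset variable shows nothing.
    if session_tenant is None:
        return []
    return [r for r in rows if r["tenant_id"] == session_tenant]


def search_collection(collections: Dict[str, List[str]], name: str) -> List[str]:
    """Model a per-tenant collection lookup with no default fallback."""
    if name not in collections:
        raise CollectionNotFound(name)
    return collections[name]
```

In both cases the caller gets either an empty result or an exception, which is the visible break the section describes; neither path can return another tenant's data.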

What we'd warn a team starting now

  • Don't put RLS off until "we add it later" — retrofitting RLS to a busy application means walking every query and every migration.
  • Don't over-trust single-engine isolation — two layers let one fail without breaching the boundary.
  • Don't treat "tenant_id payload filter" as isolation — it's filtering, not isolation. Different threat model.

The companion blog post, Multi-tenant RAG: lessons from 17 phases of EinCoreRAG, goes deeper on the SQL; the audio overview above covers the why and the architectural tradeoffs.

Full transcript

Right now, your application code is probably the only thing standing between your biggest enterprise client and, well, a catastrophic cross-tenant data leak. Yeah, and according to the architecture notes we are diving into today, relying on that code is just a massive mistake. I mean, think about the stakes for a moment, right? Especially if you are a senior back-end engineer or a CISO tuning in. Say you're demoing a new feature on your company's RAG platform, your retrieval-augmented generation app.

Right. You type a prompt into the LLM, you hit enter, and the response includes highly sensitive proprietary financial data. Oh. But the kicker is it belongs to your client's biggest competitor who, you know, also happens to be a tenant on your exact same platform.

That is just, I mean, that is the ultimate failure state for a multi-tenant system. You haven't just exposed data, you've actively fed a competitor's proprietary information directly into an LLM prompt, allowing the model to synthesize it, summarize it, and present it as part of the client's own workflow. Your SOC 2 security compliance, the whole framework that proves to your clients you handle their data responsibly, is instantly compromised. Okay, let's unpack this.

Yeah. Because if you are listening to this, you are the one responsible for making sure these systems are mathematically airtight. Exactly. And this deep dive is going to arm you with the exact architectural patterns you need to pass those grueling CISO audits.

We are looking at the internal architecture of a platform called EinCoreRAG. Specifically, we're looking at how they achieved true fail-closed multi-tenant isolation by combining two highly specific engines. Right. PostgreSQL row-level security and per-tenant Qdrant vector collections.

Yeah. And the notes reveal a brilliantly paranoid piece of engineering. To understand why they built it this way, we first have to look at how almost everyone else handles this and why it is, frankly, fundamentally flawed. When an enterprise client asks how a platform guarantees their data won't leak to another tenant, the standard engineering response is to point to the application layer.

Like, they point to their middleware. Which usually means relying on an ORM, an object-relational mapper, to just, you know, sprinkle a WHERE tenant_id equals clause into every single database query. Yeah, exactly. The logic seems sound on paper, right?

The user logs in, the system generates a JSON web token for their session, the application middleware intercepts their incoming request, parses that JWT to find their specific tenant ID, and then forces the ORM to append that ID to the database lookup. That is the textbook approach, and it is incredibly brittle. Really? Yeah, because it relies on flawless execution across a complex chain of events.

Let's walk through the mechanics of how that breaks down. Okay. Suppose a library update introduces a really subtle bug in your middleware. A malformed JWT signature no longer throws a hard error, it just logs a warning and proceeds with a null tenant ID.

Oh, I see where this is going. Right. The request moves to the ORM. The ORM seeing a null value might actually be configured to drop the where predicate entirely in certain complex SQL joins.

Suddenly, your query isn't filtering by tenant anymore. It is pulling everything. It's like building a highly advanced deep sea submarine, but for the interior security, you just install screen doors. Yes, that's exactly it.

You're relying entirely on the outer hull, your middleware, to never crack. And you are just hoping every single employee perfectly remembers to slide the screen door shut every time they enter a room. A single distraction, a single missing SQL predicate in a thousand-line code base, and the tenant boundary completely dissolves. That submarine metaphor perfectly captures the vulnerability.

Application layer authorization requires every future developer who touches the code base to understand and perfectly implement the security model every single time. Right. It is a statistical inevitability that someone eventually will write a raw SQL query or configure an ORM mapping that drops the ball. But wait, if you have a solid engineering culture, rigorous code reviews, automated integration testing, and, you know, static analysis tools scanning every pull request.

Yeah. Isn't pushing security all the way down into the database engine itself adding a massive amount of unnecessary operational complexity? I mean, a good CI/CD pipeline should catch a missing WHERE clause before it ever reaches production. Well, this raises an important question.

What is your ultimate threat model? EinCoreRAG's architecture acknowledges a fundamental truth of software engineering, which is that developer tooling is not infallible. Tests have blind spots. That's fair.

If a single line of bad application code bypassing your test suite can cause a catastrophic data breach, your architectural foundation is flawed. Isolation must be pushed down into the storage engines themselves. So you basically bake the constraints into the physics of the database so it literally cannot be bypassed by the application layer, no matter how sloppy the code gets. So the first line of defense they implemented is at the relational and metadata layer.

They moved the security boundary directly into PostgreSQL itself using row-level security, or RLS. Right. This isn't just an application making a request. This is the database actively interrogating every single query. Yeah, PostgreSQL RLS acts as this invisible bouncer at the table level. Here is the mechanism EinCoreRAG uses. Okay.

They configured their PostgreSQL database so that an application cannot simply connect and run a select statement. Oh, interesting. Every single transaction must begin with the API explicitly setting a per request session variable in the database environment. Ah, so the API executes a SET LOCAL command in SQL, passing the tenant ID from the user's request.

Yes. Then the database has an RLS policy physically attached to the tables. That policy intercepts every read, write, and delete and automatically compares the tenant_id column of every row against that session variable. Exactly.

And the choice of SET LOCAL rather than a standard SET command is a really vital engineering detail. Why is that? Because in a modern backend, you are almost certainly using a connection pooler like PgBouncer. You don't open a new heavy database connection for every single user.

You multiplex thousands of client requests across a small pool of persistent database connections. Oh, right. So if you used a plain, session-level SET command, client A's tenant ID would pollute that shared connection.

Right. The next split second, client B's request comes in, gets assigned the same recycled connection from the pool, and suddenly they're reading client A's data. Precisely. SET LOCAL strictly scopes the variable to the current transaction.

Yeah. The moment the transaction commits or rolls back, Postgres aggressively clears the variable. Making it perfectly safe for connection pooling because it leaves the connection pristine for the next request. You got it.
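The difference between the two scopes can be shown with a toy model of a pooled connection. This is a simulation in plain Python, not a database driver: the PooledConnection class and its method names are hypothetical, and it only models a committed transaction.

```python
from typing import Dict, Optional

KEY = "app.current_tenant_id"


class PooledConnection:
    """Toy model of one long-lived connection shared by many requests."""

    def __init__(self) -> None:
        self._session: Dict[str, str] = {}  # survives across transactions
        self._local: Dict[str, str] = {}    # cleared when a transaction ends

    def set(self, name: str, value: str) -> None:
        """Models SET name = value (session scope once committed)."""
        self._session[name] = value

    def set_local(self, name: str, value: str) -> None:
        """Models SET LOCAL name = value (transaction scope)."""
        self._local[name] = value

    def current_setting(self, name: str) -> Optional[str]:
        # None models "nothing set". Real PostgreSQL can report a reverted
        # custom setting as an empty string; neither matches a tenant.
        if name in self._local:
            return self._local[name]
        return self._session.get(name)

    def end_transaction(self) -> None:
        """Models COMMIT: transaction-local settings are dropped."""
        self._local.clear()
```

With set, the tenant ID written by one request is still readable by the next request that borrows the connection; with set_local, it is gone as soon as end_transaction runs.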

They also detail a fascinating implementation trick here in the notes. When the RLS policy reads that variable, it uses the Postgres function current_setting to look up app.current_tenant_id. Yeah. But they explicitly pass in true as the second argument.

That true acts as a missing OK flag. You're actually weaponizing the NULL value to trigger a safe failure. Yes. That missing OK flag is the linchpin of their fail closed strategy.

If the application layer is completely compromised and forgets to send the SET LOCAL command entirely. The database doesn't crash the query with an undefined variable error. Exactly. It evaluates the session variable as NULL.

And in SQL, comparing a row's tenant_id to NULL never evaluates to true. The result is NULL, and RLS only lets through rows where the policy is true. Brilliant. But there is a massive trap here that they warn about, involving how you write custom database functions. The whole SECURITY INVOKER versus SECURITY DEFINER issue.

Oh, yeah. This is where teams trying to implement RLS often inadvertently build a backdoor into their own system. How so? Well, in Postgres, you might write a custom function, perhaps a complex aggregation, to calculate a client's billing metrics across multiple RLS protected tables.

Okay. When you create that function, Postgres needs to know whose execution context to run it under. Right. And if you create the function with SECURITY DEFINER, it executes with the privileges of the user who owns the function, usually whoever created it, which is often a database superuser or an administrative role.

And superusers bypass row-level security by default. So if a developer writes a custom function as a SECURITY DEFINER, any application user who calls that function is suddenly operating with elevated privileges for the duration of that query. Completely ignoring the RLS policies you just spent weeks building. Yes.

Okay, so securing the relational metadata is great, but in a RAG system, the actual knowledge like the rich document contents, the semantic meaning, lives in vector embeddings. Right. And that brings us to the second engine. For the vector layer, EinCoreRAG uses Qdrant.

Securing a vector database requires an entirely different mental model than securing relational data. The prevailing industry standard for vector databases right now is to throw every tenant's vectors into one massive global collection. Yeah, and to separate them, teams just attach a tenant_id to the metadata payload of each vector. Exactly.

When the application does a similarity search, it passes the user's prompt and tells the vector database, find me the nearest neighbors to this prompt, but only return vectors where the payload matches this specific tenant ID. Right. But what's fascinating here is that EinCoreRAG explicitly rejected that standard model. They completely abandoned payload filtering for multi-tenant isolation.

Instead, they provisioned a completely separate, physically distinct Qdrant collection for every single tenant. And they isolated three distinct reasons for this architectural pivot. The first is true isolation. Yeah, because a payload filter in Qdrant is conceptually identical to a where clause in an ORM.

It's just an application layer filter applied at the last second. Right. If a developer misconfigures the search query and drops the payload filter from the JSON request, the vector engine will happily search across the entire global collection. It returns the nearest neighbors regardless of who owns them. By putting each tenant in their own collection, the boundary is enforced by Qdrant itself at the collection level, not by a developer remembering to append a filter to a payload. Here is where it gets really interesting. Their second reason isn't about security at all.

It's about raw performance and the underlying physics of vector search. Oh, the HNSW graph. Yeah. Most vector databases, including Qdrant, use an algorithm called HNSW — Hierarchical Navigable Small World.

It is essentially a multi-layered graph index. And to traverse that graph and find nearest neighbors quickly, the algorithm relies on a specific hop budget. Right. It evaluates a node, calculates the distance to the target vector, and hops to the next closest node.

So think of it like trying to find a specific friend in a crowded venue. Yeah. Doing a payload filter on a massive global collection is like wandering into a massive packed football stadium. That's a great way to put it.

You have to ask every single person you bump into if they know your friend and then check their ticket to see if they even belong to the right tenant. Because the graph is polluted with millions of vectors from other tenants, you burn through your hop budget just navigating past irrelevant nodes to find the right neighborhood. That stadium analogy visualizes the computational waste perfectly. A single huge graph forces the algorithm to evaluate and discard an enormous amount of irrelevant data.

But by giving each tenant their own separate, smaller graph, their own VIP room, the HNSW algorithm finds the nearest neighbors in far fewer hops. Because every single hop is meaningful since every node in the graph belongs to the correct tenant. Exactly. And there is a third reason for this pivot, data residency.

Oh, this is huge for enterprise clients. Right. If you have all your tenants in one giant global Qdrant collection, that data has to physically live on a specific server cluster. Let's say your infrastructure is in the U.S.

Tomorrow, your sales team lands a massive enterprise client in the European Union who is heavily regulated and legally requires their data to remain within the EU. With a single global collection, your options are terrible. You either lose the multi-million dollar contract or you undergo a massive migration to move your entire global collection containing all your U.S. clients' data over to an EU data center just to satisfy that one new tenant's compliance requirements.

But with per-tenant collections, routing is completely decoupled. Tenant A's collection lives on a U.S. cluster. Tenant B's collection is explicitly routed to an EU cluster.

You achieve granular per-tenant data residency without disrupting the rest of your user base. But, I mean, 10,000 tenants means 10,000 separate vector collections. That sounds like an absolute operational nightmare for a DevOps team. It really does.

You have to manage the infrastructure as code, handle the lifecycle of the collections, enforce strict naming conventions. Isn't the operational overhead crushing? The operational friction is significant, and they don't hide that fact in the notes. You absolutely cannot provision these manually.

It requires heavy investment in control plane automation. Right. But EinCoreRAG views this as a highly asymmetric tradeoff. The engineering overhead of automating collection management is finite and predictable.

While the existential risk of a cross-tenant data leak fed into an LLM is company ending. Exactly. They willingly absorbed the DevOps complexity to completely eliminate the security fragility. This philosophy of preferring loud breakage over silent failure is what they mean when they use the term fail closed.

We throw that term around a lot in cybersecurity. But security documentation rarely defines what the closed state physically looks like in production. Well, the CISO sitting across the table doesn't just want to know how you secure the data when everything is working. Yeah.

They want to know the blast radius when things inevitably break. So let's walk through the blast radius of the three specific failure modes they outlined. First, the PostgreSQL engine. We talked about that API session variable.

What happens if the middleware completely crashes, bypasses the token validation, and the API forgets to set the app.current_tenant_id session variable entirely? Because of that true missing OK flag in the database setting, the RLS policy compares the row's tenant_id against NULL. As we established, nothing equals NULL in SQL. Right.

The database evaluates the security policy as false for every single row in the table. The query executes, but it returns zero rows. So the application layer panics and throws 500 level server errors because it received no data, but the actual data leak is exactly zero. Loud error, zero leaks.

Okay, scenario two. The vector layer. What happens if a malicious user manipulates an endpoint or the routing logic is busted and the query hits Qdrant with the wrong collection name? Qdrant does not attempt to gracefully degrade or search a default collection.

If it receives a query for a collection that doesn't exist, it immediately returns an HTTP 404 Not Found error, and a request for a collection the current API key isn't authorized to reach is rejected as well. Oh, wow. So once again, the transaction is violently halted before any semantic data can be retrieved. Precisely.

And then we reach the doomsday scenario. What if both the relational layer and the vector layer fail simultaneously? The API logic is completely shredded. A malicious prompt has bypassed every application guardrail and it hits the storage layer. If we connect this to the bigger picture, the ultimate consumer of this pipeline is the large language model. The standard RAG prompt template essentially says, given the following context, answer the user's query. If both the Postgres and Qdrant storage engines fail closed and return zero rows, the context window injected into the LLM prompt is completely blank.

It's mathematically impossible for the LLM to leak a competitor's data because it never received it. Right. The prompt goes to the LLM empty. The AI might respond with, you know, I don't have enough information in my context to answer that, but it physically cannot hallucinate cross-tenant context because no cross-tenant context exists in its working memory.
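The empty-context case can be made concrete with a small prompt builder. This is a sketch under assumptions: the instruction sentence comes from the discussion above, while the function name build_prompt, the layout of the prompt, and the NO_CONTEXT marker are hypothetical.

```python
from typing import List

# Hypothetical marker shown to the model when retrieval returned nothing.
NO_CONTEXT = "(no documents retrieved)"


def build_prompt(question: str, chunks: List[str]) -> str:
    """Assemble a RAG prompt from whatever the storage layers returned."""
    # If both stores failed closed, chunks is empty and only the marker is
    # injected, so there is no other tenant's text for the model to repeat.
    context = "\n\n".join(chunks) if chunks else NO_CONTEXT
    return (
        "Given the following context, answer the user's query.\n\n"
        f"Context:\n{context}\n\n"
        f"Query: {question}"
    )
```

An application could also choose to skip the model call entirely when chunks is empty and surface an error instead, which makes the break even more visible.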

The overarching goal of a fail-closed architecture is creating highly visible breaks. If an application layer payload filter fails, the failure is silent. Yeah, the app keeps running smoothly, the LLM keeps generating answers using the leaked data, and you are completely blind to the breach until a furious client reports it. But if a storage layer isolation mechanism fails, the system halts.

PagerDuty alarms go off, the application throws raw errors, and operations teams see the break immediately. Having dissected the mechanics of this architecture, let's look at the hard-won lessons EinCoreRAG included for engineering teams who are starting to build RAG pipelines right now. Their first warning is about timing. Do not wait to add row-level security to your backlog for later.

Oh, retrofitting RLS onto a mature, busy application is a monumental engineering headache. Think about the blast radius within your own code base. It isn't just your API endpoints that need updating. Exactly.

Every single background cron job, every legacy data migration script, every internal admin dashboard suddenly needs to be refactored to inject the correct session variables or the database will lock them out. The friction is immense. You have to design the schema with RLS enabled from day one. For sure.

Lesson number two, don't overtrust a single engine isolation strategy. We spent the first half of this deep dive praising Postgres RLS. But if that was their only defense, a bizarre zero-day exploit in the Postgres engine could still expose them. Having two completely independent layers, a relational engine and a vector engine, provides true defense in depth.

A catastrophic failure in one engine doesn't automatically breach the entire tenant boundary. Right. And their final warning addresses the most common misconception in AI engineering right now. Do not confuse a payload filter with true isolation.

A payload filter is an application-side convenience designed to narrow down search results for relevance. It is not designed to act as a cryptographic boundary between hostile tenants. Treating a payload filter as a security mechanism represents a fundamental misunderstanding of threat models. Totally.

So what does this all mean for you? If you are sitting in that engineering or security role, the core takeaway is a fundamental shift in how you view trust within your system. Yep. It's about moving away from hoping your middleware is perfectly written and moving toward mathematically proving that your storage engines will simply refuse to share secrets.

You are removing the burden of perfect execution from your developers and pushing the responsibility down to the immutable laws of your infrastructure. By inextricably linking Postgres row-level security with physically separated Qdrant collections, EinCoreRAG established a multi-layered defense that actively anticipates application failure. It guarantees that even when the entire application logic collapses, the underlying tenant data remains structurally isolated. Implementing these two patterns, session-scoped RLS and physical vector separation, is the difference between sweating bullets during a security audit, praying they don't find a missing WHERE clause.

And confidently showing a CISO the exact physical mechanisms that keep their data locked down. It replaces anxiety with architectural certainty. It really does. But before we wrap up this deep dive, I want to leave you with a final thought to mull over.

We spent this entire time exploring how to rigidly enforce these storage level boundaries today. Right. But the AI landscape is shifting under our feet. Think about where autonomous agents are heading.

Oh, yeah. That's a whole different ballgame. If rigid, hard-coded isolation is this critical for static RAG applications right now, how are we going to enforce these fail-closed boundaries tomorrow when AI agents start autonomously creating, migrating, and querying their own temporary vector spaces on the fly? When the AI itself is dynamically spinning up the database infrastructure to solve a complex, multi-step problem, how do you mathematically guarantee it doesn't leave the screen door open?

That is a wildly complex horizon we are moving toward, and the current static models definitely won't hold up. Because the last thing you want is to be sitting across from that enterprise client, realizing your highly advanced autonomous agent just handed over the keys to the wrong kingdom. Definitely something to keep you up at night. Until next time, keep your queries scoped and your collections isolated.


This episode is part of TechRevati Engineering, an AI-generated audio overview series. Per EU AI Act Article 50, we disclose AI involvement in every episode and on this page.