Multi-Signal Vault Search — retrieval that goes beyond a vector database

A vector database is where retrieval starts. This is what enough actually looks like.

Most teams building AI on their documents stop at a vector database. We build a retrieval architecture that searches those documents with four complementary signals in a single pass (hybrid retrieval, query rewriting, diversity reranking, and confidence tagging) so the agent stops being confidently wrong.

Technology showcase · Agntic Consulting · Retrieval architecture

Any team building AI on top of their documents starts the same way: pick an embedding model, drop their files into a vector database, and watch the demo work. Then real users show up. They paste in an error code and get back something tangentially related. They ask the same question two different ways and get two different answers. The model speaks with full confidence about a document that doesn't actually exist. The vector database isn't wrong — it's just incomplete.

01 · The ceiling: Why a vector database alone keeps falling short

Dense embeddings are extraordinary at meaning. They are mediocre at strings. Ask a vector index about "ERR_2847" or invoice number "INV-2024-08831" or the citation "§4.2(c)" and it will happily return passages about error handling in general. The exact match was right there in the corpus; the geometry just smoothed it out. Keyword search has the opposite problem: it nails the literal token and misses the same idea phrased differently. Most production systems pick one and pay the cost of the other. On top of that, there's no answer to the question every business actually cares about: how sure is this?

Vector-only retrieval

"What does ERR_2847 mean?"

Returns "Common error handling patterns", "Debugging guide overview", "Logging best practices." The exact string is in the docs. None of the top hits contain it.

Multi-signal retrieval

"What does ERR_2847 mean?"

The BM25 lane snaps to the literal token; the dense lane confirms the semantic neighborhood; Reciprocal Rank Fusion (RRF) merges the two, and the exact-match runbook entry surfaces at position one, tagged HIGH.

02 · The four signals: Four ways of asking, fused into one answer

The architecture isn't an ensemble of models stacked on each other. It's four complementary signals applied to the same vault in one pipeline, each built on techniques that already have years of literature behind them. The point is the combination: each signal catches a class of failure the others can't.

Signal 01 · Hybrid retrieval: meaning and keyword

Two retrievers run side by side: a dense-vector search over embeddings and a BM25 keyword search over the raw text. Their result lists are fused using Reciprocal Rank Fusion (RRF): each document scores the sum of 1/(k + rank) over the lists it appears in, with the constant k=60, the standard value from the IR literature, which rarely needs tuning per corpus. Vectors handle paraphrase; BM25 handles names, error codes, citation numbers, and any string a customer actually typed. Neither covers both.
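A minimal sketch of the fusion step, assuming each retriever returns an ordered list of document IDs. The function name `rrf_fuse` and the sample document IDs are hypothetical; only the k=60 constant comes from the text above.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Returns (doc_id, score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results for "What does ERR_2847 mean?"
dense_hits = ["error-handling-patterns", "debugging-guide", "runbook-err-2847"]
bm25_hits = ["runbook-err-2847", "release-notes"]

fused = rrf_fuse([dense_hits, bm25_hits])
for doc_id, score in fused:
    print(f"{score:.4f}  {doc_id}")
```

The runbook entry ranks only third in the dense list, but because it appears in both lists its two contributions add up and it moves to the top. RRF uses ranks rather than raw scores, so the BM25 and cosine scales never need to be normalised against each other.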

Signal 02 · RAG-Fusion: ask the question three different ways

Before retrieval runs, an LLM rewrites the user's question into three additional variants — different vocabulary, different framing, same intent. Hybrid retrieval runs on all four queries (the original plus three rewrites), and the four result sets are fused together. The win: documents written in vocabulary the user didn't use still surface. Someone asks about "firing a vendor"; the docs say "termination for cause"; the rewrite bridges the gap.
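The flow can be sketched as below. The LLM rewriter and the hybrid retriever are replaced by hypothetical stand-ins (`fake_rewrite`, `fake_retrieve`, and a toy `FAKE_INDEX`), and fusing the four result sets by reciprocal rank is an assumption; the text above only says they are fused.

```python
from collections import defaultdict
from typing import Callable, List


def rag_fusion(
    question: str,
    rewrite: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    k: int = 60,
) -> List[str]:
    """Retrieve for the original question plus its rewrites, then fuse by rank."""
    queries = [question] + rewrite(question)
    scores = defaultdict(float)
    for query in queries:
        for rank, doc_id in enumerate(retrieve(query), start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Best fused score first; ties broken by doc ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Stand-in for the LLM rewriter: three variants with the same intent.
def fake_rewrite(question: str) -> List[str]:
    return [
        "terminating a vendor contract",
        "termination for cause supplier",
        "ending a supplier agreement",
    ]


# Stand-in for hybrid retrieval: a toy lookup table, not a real index.
FAKE_INDEX = {
    "firing a vendor": ["vendor-onboarding"],
    "terminating a vendor contract": ["termination-for-cause", "vendor-onboarding"],
    "termination for cause supplier": ["termination-for-cause"],
    "ending a supplier agreement": ["termination-for-cause", "contract-renewals"],
}


def fake_retrieve(query: str) -> List[str]:
    return FAKE_INDEX.get(query, [])


results = rag_fusion("firing a vendor", fake_rewrite, fake_retrieve)
print(results)
```

In this toy data the original wording never reaches the termination clause at all; it surfaces only because all three rewrites find it.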

Signal 03 · MMR reranking: the top five shouldn't repeat themselves

A common failure of fused retrieval is that the top hits are all close paraphrases of the same paragraph from the same file. The model gets the same evidence five times and confidently calls it consensus. Maximal Marginal Relevance rebalances the final list to trade a little relevance for diversity, so the answer is grounded in multiple distinct passages instead of one document quoted back to itself.
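A minimal sketch of the rebalancing, assuming each candidate passage has an embedding and that relevance and redundancy are both measured by cosine similarity. The `lam` weight, the three-dimensional vectors, and the passage IDs are illustrative assumptions, not values from a real deployment.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, candidates, top_n=3, lam=0.7):
    """Greedy Maximal Marginal Relevance over {doc_id: embedding}.

    At each step, pick the passage that maximises
    lam * sim(query, doc) - (1 - lam) * max sim(doc, already selected).
    """
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < top_n:
        def mmr_score(doc_id):
            relevance = cosine(query_vec, remaining[doc_id])
            redundancy = max(
                (cosine(remaining[doc_id], candidates[s]) for s in selected),
                default=0.0,  # nothing selected yet, so nothing to repeat
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy embeddings: two near-identical runbook passages and one distinct FAQ passage.
query = [1.0, 0.8, 0.0]
passages = {
    "runbook-p1": [1.0, 0.0, 0.0],
    "runbook-p1-copy": [1.0, 0.0, 0.05],
    "faq-p3": [0.0, 1.0, 0.0],
}

picked = mmr(query, passages, top_n=2, lam=0.7)
print(picked)
```

With `lam=1.0` the function degenerates to plain relevance ranking and both runbook copies come back; lowering it is what buys the second, distinct passage.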

Signal 04 · Self-RAG: how sure is this answer?

Every result that survives the funnel is tagged with a confidence label — HIGH, MEDIUM, or LOW — derived from the fused score, source agreement, and grounding strength. The agent uses the tag to decide what happens next: answer directly from the vault, fall back to a live web search, or ask the user a clarifying question. This is the signal that prevents the confidently-wrong answer. Without it, every result looks equally trustworthy.
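One way the tagging and routing could look, as a sketch only. The three inputs are named in the text above, but how they are scaled, the thresholds, and the mapping of MEDIUM to web search and LOW to a clarifying question are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    fused_score: float     # fused retrieval score, assumed normalised to 0..1
    source_agreement: int  # assumed: number of lanes or queries that returned the passage
    grounding: float       # assumed: 0..1 measure of how well the passage supports the answer


def confidence_label(ev: Evidence) -> str:
    """Map evidence to HIGH / MEDIUM / LOW. Thresholds are placeholders, not tuned values."""
    if ev.fused_score >= 0.7 and ev.source_agreement >= 2 and ev.grounding >= 0.6:
        return "HIGH"
    if ev.fused_score >= 0.4 and ev.grounding >= 0.3:
        return "MEDIUM"
    return "LOW"


# Assumed routing: the text lists these three actions but not which label triggers which.
NEXT_STEP = {
    "HIGH": "answer_from_vault",
    "MEDIUM": "web_search_fallback",
    "LOW": "ask_clarifying_question",
}


def route(ev: Evidence) -> str:
    """Return the agent's next action for a piece of evidence."""
    return NEXT_STEP[confidence_label(ev)]


print(route(Evidence(fused_score=0.9, source_agreement=2, grounding=0.8)))
print(route(Evidence(fused_score=0.2, source_agreement=1, grounding=0.1)))
```

Keeping the label and the routing table separate means the thresholds can be recalibrated against real traffic without touching the agent's behavior.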

03 · The vault itself: Incremental ingestion, no batch re-runs

A retrieval architecture is only as good as the index underneath it, and in production the index is never finished. Documents arrive, get edited, get deleted. We don't re-embed the corpus on every run — that's wasteful and slow. Ingestion is manifest-driven: a record at ./memory/vectors/ingestion-manifest.json tracks the hash and timestamp of every file we've already processed. Each run diffs the filesystem against the manifest and only touches NEW, CHANGED, and DELETED entries. A file watcher rides alongside it and re-indexes documents the moment they change on disk — no batch jobs, no overnight rebuilds, no stale results sitting in the vault while someone waits for the next sync.
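The diff step can be sketched as follows. The manifest layout used here (relative path mapped to a record with a `hash` field) and the choice of SHA-256 are assumptions; the text above only says the manifest tracks a hash and timestamp per file. The file watcher is left out.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path) -> str:
    """Content hash of a file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_against_manifest(root: Path, manifest: dict) -> dict:
    """Compare files under root with the manifest.

    manifest maps a relative POSIX path to a record such as {"hash": "..."}.
    Returns the paths that are NEW, CHANGED, or DELETED since the last run.
    """
    current = {
        p.relative_to(root).as_posix(): file_hash(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    changes = {"NEW": [], "CHANGED": [], "DELETED": []}
    for rel_path, digest in current.items():
        if rel_path not in manifest:
            changes["NEW"].append(rel_path)
        elif manifest[rel_path]["hash"] != digest:
            changes["CHANGED"].append(rel_path)
        # Unchanged files are skipped, so they are never re-embedded.
    changes["DELETED"] = sorted(set(manifest) - set(current))
    return changes
```

A caller would load the manifest with `json.loads`, pass it in alongside the document root, re-embed only the NEW and CHANGED paths, drop vectors for the DELETED ones, and write the updated manifest back. The manifest file itself should live outside the scanned root so it is not picked up as a document.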

04 · Why we built it this way: What "enough" looks like in production

Every team building AI on top of their own documents discovers the same thing in roughly the same order. The vector demo is exhilarating, the first real users are humbling, and somewhere around month three the backlog fills up with the same handful of complaints: it can't find things by name, it gives me the same paragraph five times, it sounds sure about things it doesn't know. We've built this architecture for clients enough times that we now ship it as the default. Hybrid retrieval, RAG-Fusion rewriting, MMR diversity, Self-RAG confidence, manifest-driven ingestion — none of these are novel on their own. The work, and the value, is in stacking them so they cover each other's gaps and ship as one coherent system.

That's what "enough" looks like in production. Not a vector database with a chat wrapper — a retrieval architecture that knows what it knows, knows what it doesn't, and keeps the vault honest as the documents underneath change.
