Our work · Multi-Signal Vault Search

A vector database is where retrieval starts. This is what "enough" actually looks like.

Most teams building AI on their documents stop at a vector database. We build a retrieval architecture that searches over them with four complementary signals at the same time — hybrid retrieval, query rewriting, diversity reranking, and confidence tagging — so the agent stops being confidently wrong.

Technology showcase · Agntic Consulting · Retrieval architecture
"What's our policy on vendor renegotiation past 90 days?"
Signal 01
Hybrid retrieval
Dense vectors and BM25 keyword search, fused via Reciprocal Rank Fusion at k=60.
Signal 02
RAG-Fusion rewriting
Three LLM-generated query variants run alongside the original — 4 fan-outs per turn.
Signal 03
MMR reranking
Top hits rebalanced for diversity — no five near-duplicate paragraphs at the top of the list.
Signal 04
Self-RAG confidence
Every result tagged HIGH / MEDIUM / LOW so the agent knows what to do next.
Ranked · confidence-tagged · 4 signals fused → 1 answer
01 · Vendor MSA §4.2: Renegotiation triggers · contracts/vendors/2024-msa-template.md · ¶12 · 0.91 · High
02 · Procurement playbook: 90-day check-in · playbooks/procurement.md · ¶31 · 0.84 · High
03 · Finance memo: vendor terms exception process · memos/2024-q3-finance.md · ¶7 · 0.71 · Med
04 · Legal brief: auto-renewal jurisprudence · legal/auto-renewal-summary.md · ¶3 · 0.58 · Low

One query, four lanes, one ranked answer. Each signal catches something the others miss — and the confidence tag is what tells the agent whether it can speak from the vault or needs to look elsewhere.

Any team building AI on top of their documents starts the same way: pick an embedding model, drop their files into a vector database, and watch the demo work. Then real users show up. They paste in an error code and get back something tangentially related. They ask the same question two different ways and get two different answers. The model speaks with full confidence about a document that doesn't actually exist. The vector database isn't wrong — it's just incomplete.

01 · The ceiling
Why a vector database alone keeps falling short

Dense embeddings are extraordinary at meaning. They are mediocre at strings. Ask a vector index about "ERR_2847" or invoice number "INV-2024-08831" or the citation "§4.2(c)" and it will happily return passages about error handling in general. The exact match was right there in the corpus; the geometry just smoothed it out. Keyword search has the opposite problem: it nails the literal token and misses the same idea phrased differently. Most production systems pick one and pay the cost of the other. On top of that, there's no answer to the question every business actually cares about: how sure is this?

Vector-only retrieval
"What does ERR_2847 mean?"
Returns "Common error handling patterns", "Debugging guide overview", "Logging best practices." The exact string is in the docs. None of the top hits contain it.
Multi-signal retrieval
"What does ERR_2847 mean?"
BM25 lane snaps to the literal token; the dense lane confirms semantic neighborhood; RRF fuses the two; the exact-match runbook entry surfaces at position one, tagged HIGH.

02 · The four signals
Four ways of asking, fused into one answer

The architecture isn't an ensemble of models stacked on each other. It's four orthogonal questions asked of the same vault in parallel, then merged with techniques that already have years of literature behind them. The point is the combination — each signal catches a class of failure the others can't.

Signal 01
Hybrid retrieval: meaning and keyword

Two retrievers run side by side: a dense-vector search over embeddings and a BM25 keyword search over the raw text. Their result lists are fused using Reciprocal Rank Fusion with the constant k=60, the value commonly used in the IR literature, which rarely needs tuning per corpus. Vectors handle paraphrase; BM25 handles names, error codes, citation numbers, and any string a customer actually typed. Neither covers both.

Lane A · Dense vectors
Semantic neighbors
Cosine-similar passages by meaning.
01 Procurement · vendor terms (0.82)
02 Negotiation tactics overview (0.74)
03 Quarterly contract review (0.69)
Fused with RRF, k = 60
Lane B · BM25 keyword
Literal-token hits
Exact strings, codes, and citations.
01 MSA §4.2 "renegotiation" (14.2)
02 "90-day" clause (11.9)
03 Vendor agreement appendix (9.6)

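Reciprocal Rank Fusion (k=60). Both lists vote, and only rank position counts: raw scores are ignored, so cosine similarities and BM25 scores never have to share a scale. An exact-match clause that BM25 ranks first, and that the dense lane also returns anywhere in its list, outranks a soft semantic hit that only the dense lane found.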
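As a rough illustration of the fusion step, here is a minimal Python sketch of Reciprocal Rank Fusion. Each document scores 1 / (k + rank) in every list that contains it, and the scores are summed. The function name and the document IDs are hypothetical, chosen only for the example; the production system's interfaces are not described here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs; each list is ordered best first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the rank contributes; the retriever's raw score is never used.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical document IDs, for illustration only.
dense_lane = ["procurement-terms", "negotiation-overview", "msa-4.2"]
bm25_lane = ["msa-4.2", "90-day-clause", "vendor-appendix"]

fused = reciprocal_rank_fusion([dense_lane, bm25_lane])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

In this toy run the clause that BM25 ranks first and the dense lane ranks third ends up on top, because it collects a vote from both lists while every other document collects only one.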

Signal 02
RAG-Fusion: ask the question three different ways

Before retrieval runs, an LLM rewrites the user's question into three additional variants — different vocabulary, different framing, same intent. Hybrid retrieval runs on all four queries (the original plus three rewrites), and the four result sets are fused together. The win: documents written in vocabulary the user didn't use still surface. Someone asks about "firing a vendor"; the docs say "termination for cause"; the rewrite bridges the gap.

User query: How do we get out of a vendor contract early?
v1: What are the termination clauses in our vendor agreements?
v2: Conditions under which a vendor MSA can be cancelled before term end.
v3: Early exit, breach, and termination-for-cause provisions for suppliers.
Per-variant hybrid retrieval → fused
Original · 12 hits
v1 · 12 hits
v2 · 12 hits
v3 · 12 hits
Fused ranked list: 14 unique, vocabulary-bridged

One question, four fan-outs. Catches documents whose vocabulary doesn't overlap with how the user phrased their question, a failure mode that embedding tuning alone rarely fixes.
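The fan-out can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the shipped pipeline: `rag_fusion`, `stub_rewrite`, `toy_retrieve`, and the three-document `CORPUS` are hypothetical stand-ins for the LLM rewriter and the hybrid retriever, and rank-based fusion (RRF, k=60) is assumed for merging the four result sets.

```python
import re
from collections import defaultdict
from typing import Callable


def rag_fusion(query: str,
               rewrite: Callable[[str], list[str]],
               retrieve: Callable[[str], list[str]],
               k: int = 60) -> list[str]:
    """Retrieve for the original query plus its rewrites, then fuse by rank."""
    queries = [query] + rewrite(query)
    scores: dict[str, float] = defaultdict(float)
    for q in queries:
        for rank, doc_id in enumerate(retrieve(q), start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical three-document corpus standing in for the vault.
CORPUS = {
    "msa-termination": "termination for cause provisions in the vendor msa",
    "procurement-playbook": "vendor contract check-in schedule",
    "onboarding": "new hire onboarding handbook",
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(q: str) -> list[str]:
    """Stand-in retriever: rank documents by count of shared tokens."""
    overlap = {d: len(tokens(q) & tokens(t)) for d, t in CORPUS.items()}
    hits = [d for d in overlap if overlap[d] > 0]
    return sorted(hits, key=lambda d: (-overlap[d], d))


def stub_rewrite(q: str) -> list[str]:
    """Stand-in for the LLM rewriter: returns three fixed variants."""
    return [
        "What are the termination clauses in our vendor agreements?",
        "Conditions under which a vendor MSA can be cancelled before term end.",
        "Early exit, breach, and termination-for-cause provisions for suppliers.",
    ]


query = "How do we get out of a vendor contract early?"
ranked = rag_fusion(query, stub_rewrite, toy_retrieve)
print(ranked)
```

On its own, the original query puts the procurement playbook first in this toy corpus, because the word "termination" never appears in it. The rewrites supply that vocabulary, and the fused list leads with the termination document.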

Signal 03
MMR reranking: the top five shouldn't repeat themselves

A common failure of fused retrieval is that the top hits are all close paraphrases of the same paragraph from the same file. The model gets the same evidence five times and confidently calls it consensus. Maximal Marginal Relevance rebalances the final list to trade a little relevance for diversity, so the answer is grounded in multiple distinct passages instead of one document quoted back to itself.

Pre-MMR · raw fused list (redundant)
01 MSA §4.2: renegotiation triggers · Source A
02 MSA §4.2: renegotiation triggers (¶b) · Source A
03 MSA §4.2: renegotiation triggers (¶c) · Source A
04 MSA §4.2: renegotiation triggers (¶d) · Source A
05 Procurement playbook: check-ins · Source B
Post-MMR · rebalanced (diverse)
01 MSA §4.2: renegotiation triggers · Source A
02 Procurement playbook: check-ins · Source B
03 Finance memo: exception process · Source C
04 Legal brief: auto-renewal · Source D
05 MSA §4.2: renegotiation triggers (¶b) · Source A

Diversity beats echo. Five copies of the same paragraph at the top of the list is the same as one paragraph — and worse, because the model treats the repetition as agreement.
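A minimal sketch of Maximal Marginal Relevance, assuming a greedy selection where each candidate is scored as λ × relevance minus (1 − λ) × its highest similarity to anything already picked. The relevance values, the same-source similarity rule, and λ = 0.5 are all illustrative assumptions, not the tuned production settings.

```python
def mmr(candidates, relevance, similarity, lam=0.5, top_k=5):
    """Greedy MMR: balance relevance against redundancy with earlier picks."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < top_k:
        def mmr_score(doc):
            # Redundancy is the highest similarity to any already-selected doc.
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance[doc] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical relevance scores and sources, for illustration only.
relevance = {
    "msa-a": 0.95, "msa-b": 0.93, "msa-c": 0.92, "msa-d": 0.91,
    "playbook": 0.84, "finance": 0.71, "legal": 0.58,
}
source = {
    "msa-a": "A", "msa-b": "A", "msa-c": "A", "msa-d": "A",
    "playbook": "B", "finance": "C", "legal": "D",
}


def same_source_similarity(a, b):
    """Toy similarity: passages from the same source count as near-duplicates."""
    return 0.9 if source[a] == source[b] else 0.1


reranked = mmr(list(relevance), relevance, same_source_similarity)
print(reranked)
```

With these toy numbers the four Source A passages no longer fill the top of the list: one leads, three other sources follow, and the next Source A passage lands in fifth place.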

Signal 04
Self-RAG: how sure is this answer?

Every result that survives the funnel is tagged with a confidence label — HIGH, MEDIUM, or LOW — derived from the fused score, source agreement, and grounding strength. The agent uses the tag to decide what happens next: answer directly from the vault, fall back to a live web search, or ask the user a clarifying question. This is the signal that prevents the confidently-wrong answer. Without it, every result looks equally trustworthy.

High
Strong vault grounding · multi-source agreement
fused = 0.91 · agreement = 3/3
Answer from vault
Med
Partial coverage · one strong source, weak corroboration
fused = 0.66 · agreement = 1/3
Augment with web
Low
Topic absent or contradicted in vault
fused = 0.31 · agreement = 0/3
Ask user to clarify

The confidence tag is the dispatcher. The same architecture that finds the answer also says "I don't know" — out loud, with reasons. That's the difference between an assistant that's useful and one that's dangerous.
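One way to picture the dispatcher is a small tagging function plus a lookup table. The sketch below is an assumption-laden illustration: the 0.80 and 0.60 thresholds, the two-thirds agreement rule, and the action names are hypothetical, and grounding strength is left out for brevity even though it is one of the stated inputs.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


# Hypothetical thresholds, chosen only so the example reproduces the cards above.
HIGH_SCORE = 0.80
MEDIUM_SCORE = 0.60


def tag_confidence(fused_score: float, agreeing: int, total: int) -> Confidence:
    """Tag a result from its fused score and how many sources agree."""
    agreement = agreeing / total if total else 0.0
    if fused_score >= HIGH_SCORE and agreement >= 2 / 3:
        return Confidence.HIGH
    if fused_score >= MEDIUM_SCORE and agreeing >= 1:
        return Confidence.MEDIUM
    return Confidence.LOW


# The tag decides the agent's next step (action names are illustrative).
NEXT_ACTION = {
    Confidence.HIGH: "answer_from_vault",
    Confidence.MEDIUM: "augment_with_web",
    Confidence.LOW: "ask_user_to_clarify",
}


def dispatch(fused_score: float, agreeing: int, total: int) -> str:
    return NEXT_ACTION[tag_confidence(fused_score, agreeing, total)]


print(dispatch(0.91, 3, 3))
print(dispatch(0.66, 1, 3))
print(dispatch(0.31, 0, 3))
```

Keeping the tag and the action table separate means the thresholds can be recalibrated without touching the agent's routing logic.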

03 · The vault itself
Incremental ingestion, no batch re-runs

A retrieval architecture is only as good as the index underneath it, and in production the index is never finished. Documents arrive, get edited, get deleted. We don't re-embed the corpus on every run — that's wasteful and slow. Ingestion is manifest-driven: a record at ./memory/vectors/ingestion-manifest.json tracks the hash and timestamp of every file we've already processed. Each run diffs the filesystem against the manifest and only touches NEW, CHANGED, and DELETED entries. A file watcher rides alongside it and re-indexes documents the moment they change on disk — no batch jobs, no overnight rebuilds, no stale results sitting in the vault while someone waits for the next sync.

Ingestion manifest · ./memory/vectors/ingestion-manifest.json
+ NEW contracts/2024-msa-template.md (a8f2…c1)
+ NEW playbooks/procurement.md (d3e9…7b)
~ CHANGED memos/2024-q3-finance.md (b2c4…9a)
~ CHANGED legal/auto-renewal-summary.md (7e10…4f)
- DELETED drafts/old-vendor-list.md (removed)
· skip handbook/onboarding.md (unchanged)
· skip handbook/values.md (unchanged)
· skip runbooks/incidents.md (unchanged)
File watcher · Live
CHG memos/2024-q3-finance.md (just now)
NEW contracts/2024-msa-template.md (12s ago)
DEL drafts/old-vendor-list.md (38s ago)
CHG legal/auto-renewal-summary.md (1m ago)

Manifest-driven, watcher-fed. The vault is never stale and never re-embedded for nothing — only the rows that actually changed cost compute.
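The diff at the heart of manifest-driven ingestion can be sketched as follows. The manifest schema is not shown in this write-up, so the sketch assumes a plain mapping of relative path to SHA-256 hash and ignores timestamps; `hash_file`, `scan`, and `diff_manifest` are hypothetical names.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Content hash used to detect edits (SHA-256 is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan(root: Path) -> dict[str, str]:
    """Map each Markdown file under root to its current content hash."""
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*.md"))
        if p.is_file()
    }


def diff_manifest(manifest: dict[str, str],
                  current: dict[str, str]) -> dict[str, list[str]]:
    """Classify every path as NEW, CHANGED, DELETED, or UNCHANGED."""
    return {
        "NEW": sorted(p for p in current if p not in manifest),
        "CHANGED": sorted(p for p in current
                          if p in manifest and manifest[p] != current[p]),
        "DELETED": sorted(p for p in manifest if p not in current),
        "UNCHANGED": sorted(p for p in current
                            if p in manifest and manifest[p] == current[p]),
    }


# In-memory example with placeholder hashes; real runs would call scan(root).
manifest = {
    "memos/2024-q3-finance.md": "aaa",
    "drafts/old-vendor-list.md": "bbb",
    "handbook/values.md": "ccc",
}
current = {
    "memos/2024-q3-finance.md": "aab",
    "handbook/values.md": "ccc",
    "playbooks/procurement.md": "ddd",
}

changes = diff_manifest(manifest, current)
for status, paths in changes.items():
    print(status, paths)
```

Only the NEW and CHANGED paths would need embedding, and DELETED paths would have their vectors removed; everything under UNCHANGED costs nothing beyond the hash comparison.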

04 · Why we built it this way
What "enough" looks like in production

Every team building AI on top of their own documents discovers the same thing in roughly the same order. The vector demo is exhilarating, the first real users are humbling, and somewhere around month three the backlog fills up with the same handful of complaints: it can't find things by name, it gives me the same paragraph five times, it sounds sure about things it doesn't know. We've built this architecture for clients enough times that we now ship it as the default. Hybrid retrieval, RAG-Fusion rewriting, MMR diversity, Self-RAG confidence, manifest-driven ingestion — none of these are novel on their own. The work, and the value, is in stacking them so they cover each other's gaps and ship as one coherent system.

That's what "enough" looks like in production. Not a vector database with a chat wrapper — a retrieval architecture that knows what it knows, knows what it doesn't, and keeps the vault honest as the documents underneath change.

Building AI on top of your documents?

We build this architecture for clients in regulated industries, internal-tools teams, and anywhere a confidently-wrong answer carries a real cost.

Book a discovery call