The answer to "what just happened and what did it cost?" is always already recorded.
Every LLM call inside an Agntic agent already carries the metadata an
observability platform needs — session, user, tier, model, cost context,
tokens, latency, tools called. Teams building on our platform inherit a
working observability surface instead of building one.
The dashboard is just a window onto metrics that were already there. Every dashboard panel is a method call against the same write-through store that every LLM call has already flowed through.
Observability is the single most-skipped piece of AI infrastructure, and
the single most expensive thing to retrofit. Teams ship an agent, deploy
it, then six weeks later realize they have no idea which feature is
driving cost, which step is slow, which model is misbehaving, or which
user is hitting the limits. We built the platform with the data points
wired into the lowest layer — the LLM call itself — so the answer to
"what just happened and what did it cost?" is always already recorded.
01 · The call site is the truth
Metadata is captured before the call returns
Every LLM call inside an Agntic agent flows through the Runnable model
pool, which already knows its session, user, tier
(fast / balanced / powerful),
actual model name, cost-context label, input and output tokens, USD
cost, and any Anthropic server-side web_search
invocations. Nothing is sampled. Nothing is estimated. Every
call writes through. The dashboards don't reconstruct cost
from logs — they read it from a row that was written before the
response left the function.
cost_records · one row (written before the turn responds)
timestamp: 2026-05-26T09:42:18.041Z
session_id: 5d8e
user_id: alex-marchetti
tier: balanced
model: claude-sonnet-4-6
context: agent
input_tokens: 3,820
output_tokens: 420
cost_usd: 0.0178
web_search: 2 · $0.02
process_run_id: 7c2a
1. Captured at the call site, not from logs. Written before the model's response is returned to the agent.
2. Cost context tells you which step of the agent spent it. 10+ labels: agent · planning · synthesis · grading · rewrite · route · web_extract · subagent_synthesis · tool_nudge · answer_retry
3. web_search is on its own line, not buried in tokens. Anthropic's server-side tool fee: $10 per 1,000 requests, tracked separately on top of token cost.
4. Linked to the process_run that produced it. So a crashed process's rows are queryable as "history" while the live one is "now."
Every field is the truth, not a reconstruction. The row is what the dashboards read; the row is what billing reads; the row is what the cost-by-step view groups by. There is no second source.
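A minimal sketch of the write-through idea, in Python with the standard-library sqlite3 module for brevity (the platform itself runs on Node, so this is an illustration, not the actual implementation). The function names `open_store`, `call_with_record`, and `cost_by_context`, the column layout, and the per-token rates are all hypothetical; only the web_search fee of $10 per 1,000 requests comes from the text above.

```python
import sqlite3
from datetime import datetime, timezone

# From the text: Anthropic's server-side web_search fee is $10 per 1,000 requests.
WEB_SEARCH_USD_PER_REQUEST = 10.0 / 1000

# Hypothetical per-million-token rates (input, output), keyed by tier.
RATES_USD_PER_MTOK = {"balanced": (3.00, 15.00)}


def open_store(path=":memory:"):
    """Open the store and create a simplified cost_records table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS cost_records (
               timestamp TEXT NOT NULL, session_id TEXT, user_id TEXT,
               tier TEXT, model TEXT, context TEXT,
               input_tokens INTEGER, output_tokens INTEGER, cost_usd REAL,
               web_search_requests INTEGER, web_search_usd REAL,
               process_run_id TEXT)"""
    )
    return db


def call_with_record(db, model_fn, prompt, *, session_id, user_id, tier,
                     model, context, process_run_id):
    """Call the model, write one cost_records row, then return the text."""
    reply = model_fn(prompt)
    in_rate, out_rate = RATES_USD_PER_MTOK[tier]
    token_usd = (reply["input_tokens"] * in_rate
                 + reply["output_tokens"] * out_rate) / 1_000_000
    search_usd = reply["web_search_requests"] * WEB_SEARCH_USD_PER_REQUEST
    with db:  # commits the insert before this function returns
        db.execute(
            "INSERT INTO cost_records VALUES (?,?,?,?,?,?,?,?,?,?,?,?)",
            (datetime.now(timezone.utc).isoformat(), session_id, user_id,
             tier, model, context, reply["input_tokens"],
             reply["output_tokens"], token_usd,
             reply["web_search_requests"], search_usd, process_run_id),
        )
    return reply["text"]


def cost_by_context(db):
    """Total USD (tokens plus web_search) grouped by cost-context label."""
    return db.execute(
        "SELECT context, SUM(cost_usd + web_search_usd) FROM cost_records "
        "GROUP BY context ORDER BY context"
    ).fetchall()
```

Because the insert commits inside the wrapper, a cost-by-step view in this sketch is a single `GROUP BY` over rows that already exist, with the web_search fee kept in its own column rather than folded into token cost.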
02 · Six durable tables
Not a log file. A schema.
Behind the platform is a SQLite-backed metrics store with
six durable tables, indexed for the queries the
dashboards actually run. A cost_records row per LLM call.
A turn_metrics row per user-facing turn — exposed as the
outputs view so a chat reply and a workflow-run delivery
share one schema. A compression_events row for every
memory-compaction with the USD it saved. An instance row
for the logical Agntic deployment that survives restarts. A
process_runs row for every Node process boot. Indexes on
timestamp, session, user, process — the kind of indexes you make once
when you're being careful, instead of bolting on later.
cost_records · Per call
One row per LLM call. The grain of cost attribution.
Each row is one event, and each table is one shape of question. "What did this call cost?" is one query. "How did this turn perform?" is another. The schema lets the dashboards stay simple because the data is already in the right shape.
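To make the shape concrete, here is a rough sketch of how the tables named above might be declared, again in Python with sqlite3 purely for illustration. It covers only the tables and the view the text names; every column name beyond those mentioned in the text (for example `kind`, `latency_ms`, `saved_usd`), the index names, and the helper functions are assumptions, not the platform's real schema.

```python
import sqlite3

# Hypothetical DDL: table names follow the text, columns are illustrative.
SCHEMA = """
CREATE TABLE instance (
    instance_id TEXT PRIMARY KEY,
    created_at  TEXT NOT NULL
);
CREATE TABLE process_runs (
    process_run_id    TEXT PRIMARY KEY,
    instance_id       TEXT NOT NULL REFERENCES instance(instance_id),
    started_at        TEXT NOT NULL,
    last_heartbeat_at TEXT NOT NULL,
    ended_at          TEXT,
    exit_reason       TEXT
);
CREATE TABLE cost_records (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    session_id TEXT, user_id TEXT, process_run_id TEXT,
    tier TEXT, model TEXT, context TEXT,
    input_tokens INTEGER, output_tokens INTEGER, cost_usd REAL
);
CREATE TABLE turn_metrics (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    session_id TEXT, user_id TEXT, process_run_id TEXT,
    kind TEXT NOT NULL,          -- 'chat' or 'workflow'
    latency_ms INTEGER, retries INTEGER
);
CREATE TABLE compression_events (
    id INTEGER PRIMARY KEY,
    timestamp TEXT NOT NULL,
    session_id TEXT,
    saved_usd REAL
);
-- Chat replies and workflow deliveries read through one view.
CREATE VIEW outputs AS SELECT * FROM turn_metrics;

CREATE INDEX idx_cost_timestamp ON cost_records(timestamp);
CREATE INDEX idx_cost_session   ON cost_records(session_id);
CREATE INDEX idx_cost_user      ON cost_records(user_id);
CREATE INDEX idx_cost_process   ON cost_records(process_run_id);
"""


def create_schema(db):
    """Create all tables, the outputs view, and the indexes."""
    db.executescript(SCHEMA)


def outputs_by_kind(db):
    """Count user-facing outputs per kind, read through the shared view."""
    return db.execute(
        "SELECT kind, COUNT(*) FROM outputs GROUP BY kind ORDER BY kind"
    ).fetchall()
```

In this sketch, "how many workflow deliveries versus chat replies?" is one query against `outputs`, because both kinds of turn land in the same table.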
03 · Free for everything downstream
Every new node inherits observability
Because the metadata is attached in the Runnable layer — not bolted on
by hand at each call site — every node added to the agent inherits
observability automatically. A new tool, a new workflow node, a new
planning step, a new vertical-specific routing rule: each shows up in
the dashboards from day one without any instrumentation work. Three
subsystems already write to the store: the cost tracker on every LLM
call, the agent graph on every user-facing turn, the workflow scheduler
on every workflow run. Any new subsystem we add becomes a single call
to the same store and arrives in the dashboards. This is the part
teams discover they need three months in.
Writer 01 · Cost tracker · fires on every LLM call
Captures tier, model, context, tokens, USD, and web_search invocations the moment the model returns.
Writer 02 · Agent graph · fires on every user-facing turn
Closes the turn with latency, tools called, retries, routing strategy, and the artifact returned.
Writer 03 · Workflow scheduler · fires on every workflow run
Workflow deliveries share the turn schema: chat and workflow output sit in the same table.
One store. Three writers today, more tomorrow. A new subsystem doesn't invent its own telemetry surface — it writes one row, and the existing dashboards already know how to read it.
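The inheritance described in this section can be sketched as a wrapper that records itself on every call, so a node written later needs no telemetry code of its own. This is an illustrative Python sketch with an in-memory store; `MetricsStore`, `InstrumentedModel`, and `summarize_node` are hypothetical names, not the platform's Runnable API.

```python
from collections import defaultdict


class MetricsStore:
    """One store, many writers: each writer appends a row to a named table."""

    def __init__(self):
        self._rows = defaultdict(list)

    def write(self, table, **row):
        self._rows[table].append(row)

    def rows(self, table):
        return list(self._rows[table])


class InstrumentedModel:
    """Wraps a model callable; every call made through it is recorded."""

    def __init__(self, store, model_fn, *, session_id, user_id):
        self._store = store
        self._model_fn = model_fn
        self._session_id = session_id
        self._user_id = user_id

    def invoke(self, prompt, *, context):
        reply = self._model_fn(prompt)
        # The row is written here, in the shared layer, before returning.
        self._store.write(
            "cost_records",
            session_id=self._session_id,
            user_id=self._user_id,
            context=context,
            input_tokens=reply["input_tokens"],
            output_tokens=reply["output_tokens"],
        )
        return reply["text"]


def summarize_node(model, text):
    """A node added later: it contains no instrumentation code at all."""
    return model.invoke("Summarize: " + text, context="synthesis")
```

Calling `summarize_node` with an `InstrumentedModel` leaves a `cost_records` row labeled `synthesis` in the store, even though the node itself never mentions metrics.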
04 · Live vs. historical, properly modeled
Per-process heartbeat with crash detection
The instance / process_runs hierarchy lets the
dashboard tell the difference between "this Node process since boot"
and "this deployment over its lifetime." Every process emits a
heartbeat every 30 seconds. The next boot walks the table; any open row
whose last heartbeat is stale gets marked exit_reason = crash,
with ended_at set to its last heartbeat. The dashboard
never confuses a crashed process for a live one, and "since boot"
never silently leaks into "lifetime." Most agent products skip this
entirely.
Process runs · this deployment (instance 7c2a · uptime 12d)
Run      Started          Duration  Status
run-018  May 12 · 06:11   3d 4h     clean exit
run-019  May 15 · 10:02   2d 7h     clean exit
run-020  May 17 · 17:48   11h       marked at next boot
run-021  May 18 · 05:04   5d 18h    clean exit
run-022  May 24 · 02:14             heartbeat 12 s ago
Invariant: the next boot marks any open process_runs row whose last_heartbeat_at is older than the stale window as exit_reason = crash, so the dashboard always shows at most one open run per instance, and a crashed process's rows stay queryable as history.
Live and historical are different rows, not different filters. The heartbeat hygiene is the telemetry equivalent of double-entry bookkeeping — every boot reconciles the previous one.
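The boot-time reconciliation can be sketched in a few lines of SQL. The example below uses Python with sqlite3 for illustration and stores timestamps as epoch seconds for simplicity. The 30-second heartbeat interval comes from the text; the 90-second stale window, the `'clean'` exit label, and the function names `boot`, `heartbeat`, and `clean_exit` are assumptions.

```python
import sqlite3

HEARTBEAT_SECONDS = 30    # from the text: one heartbeat every 30 seconds
STALE_AFTER_SECONDS = 90  # hypothetical stale window (three missed beats)


def open_store():
    """Create an in-memory store with a simplified process_runs table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE process_runs (
               process_run_id    TEXT PRIMARY KEY,
               instance_id       TEXT NOT NULL,
               started_at        REAL NOT NULL,
               last_heartbeat_at REAL NOT NULL,
               ended_at          REAL,
               exit_reason       TEXT)"""
    )
    return db


def boot(db, instance_id, run_id, now):
    """Mark stale open runs as crashed, then open a row for this process."""
    with db:
        db.execute(
            """UPDATE process_runs
               SET exit_reason = 'crash', ended_at = last_heartbeat_at
               WHERE instance_id = ? AND ended_at IS NULL
                 AND last_heartbeat_at < ?""",
            (instance_id, now - STALE_AFTER_SECONDS),
        )
        db.execute(
            "INSERT INTO process_runs (process_run_id, instance_id, "
            "started_at, last_heartbeat_at) VALUES (?, ?, ?, ?)",
            (run_id, instance_id, now, now),
        )


def heartbeat(db, run_id, now):
    """Refresh the heartbeat of a run that is still open."""
    with db:
        db.execute(
            "UPDATE process_runs SET last_heartbeat_at = ? "
            "WHERE process_run_id = ? AND ended_at IS NULL",
            (now, run_id),
        )


def clean_exit(db, run_id, now):
    """Close a run on orderly shutdown."""
    with db:
        db.execute(
            "UPDATE process_runs SET ended_at = ?, exit_reason = 'clean' "
            "WHERE process_run_id = ?",
            (now, run_id),
        )
```

A run that stops heartbeating without calling `clean_exit` stays open only until the next `boot`, which closes it with `ended_at` set to its last heartbeat, so "since boot" queries can filter on the single open row.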
05 · Why we built it this way
The plumbing buyers value most
Observability is what separates "we shipped an agent" from "we run an
agent in production." Other vendors leave it to the customer — pick a
third-party platform, instrument every call, prove the cost back to
finance, hope the schema doesn't drift when the agent gets a new step.
We built the layer that makes the second part, running the agent in production, free. For a consulting
firm, this is the difference between a demo that wows and a deployment
that survives — and it's the kind of plumbing buyers value most once
they understand why other vendors don't have it.
Six durable tables, write-through on every LLM call, per-process
heartbeat with crash detection, a shared schema across chat replies
and workflow runs, ten-plus cost-context labels already in use today,
Anthropic web_search tracked on its own line. None of the
pieces are exotic. The work, and the value, is in being the team that
actually wired them in before the agent shipped.
Need to know what your agent just cost you?
We build AI systems for clients who have to answer the cost, latency, and trace questions on day one — not bolt on a telemetry platform six months after the demo.