Built-in Observability Platform

The answer to "what just happened and what did it cost?" is always already recorded.

Every LLM call inside an Agntic agent already carries the metadata an observability platform needs: session, user, tier, model, cost context, and tokens, with latency and tools called recorded for each user-facing turn. Teams building on our platform inherit a working observability surface instead of building one.

Technology showcase · Agntic Consulting · Telemetry & cost attribution

Performance · live · Agntic deployment overview


Heartbeat · 12 s ago

Spend · 24h

$14.62 · 1,283 LLM calls

Turns · 24h

216 · p95 latency 2.81 s

Active users

38 · 4 over $1.00 today

web_search

42 · $0.42 · tracked separately

Cost over time · hourly USD · $14.62 last 24h

Cost by context · 7 of 10 labels

agent · $5.84
subagent_synthesis · $3.41
planning · $2.20
web_extract · $1.18
rewrite · $0.71
grading · $0.52
route · $0.46

Recent turns · last 24h · full traces

9:42 · Alex · session 5d8e · vault_search, web_search · 3.4 s · $0.149
9:40 · Priya · session 8a31 · vault_search · 1.9 s · $0.038
9:37 · Marcus · session b29f · planning, subagent_synthesis · 6.8 s · $0.412
9:35 · Bea · session 4c0e · vault_search · 2.1 s · $0.041

The dashboard is just a window onto the metrics that were already there. Every panel above is a method call against the same write-through store every LLM call has already flowed through.
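Since each panel is described as one method call against the store, here is a minimal TypeScript sketch of what the Cost by context panel might compute: a sum of cost_usd per context label over a time window, largest first. The `CostRecord` shape and the `costByContext` function are hypothetical, and an in-memory array stands in for the SQLite table; the comment shows roughly the SQL such a query would presumably run.

```typescript
// Hypothetical row shape, trimmed to the fields this panel needs.
interface CostRecord {
  timestamp: string; // ISO-8601, e.g. "2026-05-26T09:42:18.041Z"
  context: string; // cost-context label, e.g. "agent", "planning"
  cost_usd: number;
}

interface ContextTotal {
  context: string;
  usd: number;
}

// Sum cost per context label for rows at or after `since`, largest first.
// In-memory stand-in for a query along the lines of:
//   SELECT context, SUM(cost_usd) FROM cost_records
//   WHERE timestamp >= ? GROUP BY context ORDER BY SUM(cost_usd) DESC
function costByContext(rows: CostRecord[], since: Date): ContextTotal[] {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    if (new Date(row.timestamp).getTime() < since.getTime()) continue; // outside the window
    totals[row.context] = (totals[row.context] || 0) + row.cost_usd;
  }
  return Object.keys(totals)
    .map((context) => ({ context, usd: totals[context] }))
    .sort((a, b) => b.usd - a.usd);
}
```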

Observability is the single most-skipped piece of AI infrastructure, and the single most expensive thing to retrofit. Teams ship an agent, deploy it, then six weeks later realize they have no idea which feature is driving cost, which step is slow, which model is misbehaving, or which user is hitting the limits. We built the platform with the data points wired into the lowest layer — the LLM call itself — so the answer to "what just happened and what did it cost?" is always already recorded.

01 · The call site is the truth

Metadata is captured before the call returns

Every LLM call inside an Agntic agent flows through the Runnable model pool, which already knows each call's session, user, tier (fast / balanced / powerful), actual model name, cost-context label, input and output tokens, USD cost, and any Anthropic server-side web_search invocations. Nothing is sampled. Nothing is estimated. Every call writes through. The dashboards don't reconstruct cost from logs; they read it from a row that was written before the response left the function.
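The write-through ordering can be sketched as a wrapper that every model call passes through. This is a minimal TypeScript sketch under stated assumptions, not the Runnable pool's actual code: `trackedCall`, `CallMeta`, `ModelResult`, the `priceUsd` callback, and the in-memory `rows` array are hypothetical stand-ins. What it shows is the order of operations: the row is appended before the promise carrying the model's response resolves.

```typescript
// Metadata the pool is assumed to know before a call runs.
interface CallMeta {
  session_id: string;
  user_id: string;
  tier: "fast" | "balanced" | "powerful";
  model: string;
  context: string; // cost-context label, e.g. "agent" or "planning"
}

// Assumed shape of a model response: text plus usage counts.
interface ModelResult {
  text: string;
  input_tokens: number;
  output_tokens: number;
  web_search_requests: number;
}

// One cost_records row: the call's metadata plus what the call used.
interface CostRow extends CallMeta {
  timestamp: string;
  input_tokens: number;
  output_tokens: number;
  cost_usd: number;
  web_search_requests: number;
}

// Awaits the model, appends the cost row, and only then hands back the response.
// With a real database insert in place of rows.push, a failed write would
// reject this promise instead of letting an unrecorded response through.
async function trackedCall(
  meta: CallMeta,
  rows: CostRow[], // stand-in for INSERT INTO cost_records
  priceUsd: (r: ModelResult) => number, // token pricing, supplied by the caller
  call: () => Promise<ModelResult>,
): Promise<ModelResult> {
  const result = await call();
  rows.push({
    ...meta,
    timestamp: new Date().toISOString(),
    input_tokens: result.input_tokens,
    output_tokens: result.output_tokens,
    cost_usd: priceUsd(result),
    web_search_requests: result.web_search_requests,
  });
  return result;
}
```

If every node reaches the model only through a wrapper like this, a node added later gets its row without any instrumentation of its own, which is the property section 03 relies on.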

cost_records · one row · written before the turn responds

timestamp: 2026-05-26T09:42:18.041Z
session_id: 5d8e
user_id: alex-marchetti
tier: balanced
model: claude-sonnet-4-6
context: agent
input_tokens: 3,820
output_tokens: 420
cost_usd: 0.0178
web_search: 2 · $0.02
process_run_id: 7c2a

Captured at the call site, not from logs. Written before the model's response is returned to the agent.

Cost context tells you which step of the agent spent it. 10+ labels: agent · planning · synthesis · grading · rewrite · route · web_extract · subagent_synthesis · tool_nudge · answer_retry

web_search is on its own line, not buried in tokens. Anthropic's server-side tool fee, $10 per 1,000 requests, is tracked separately on top of token cost.

Linked to the process_run that produced it. A crashed process's rows stay queryable as "history," while the live process's rows are "now."

Every field is the truth, not a reconstruction. The row is what the dashboards read; the row is what billing reads; the row is what the cost-by-step view groups by. There is no second source.
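The example row's numbers can be checked by hand. The sketch below assumes token rates of $3 per million input tokens and $15 per million output tokens; the page does not state rates, but these reproduce the row's cost_usd. The web_search fee is the $10 per 1,000 requests stated above. Function and constant names are illustrative.

```typescript
// Assumed token rates in USD per million tokens. The page does not state
// them; they are chosen because they reproduce the example row's cost_usd.
const INPUT_USD_PER_MTOK = 3;
const OUTPUT_USD_PER_MTOK = 15;
// Anthropic's server-side web_search fee: $10 per 1,000 requests.
const WEB_SEARCH_USD_PER_REQUEST = 10 / 1000;

// Token cost only; web_search is billed as its own line item.
function tokenCostUsd(inputTokens: number, outputTokens: number): number {
  return (inputTokens * INPUT_USD_PER_MTOK + outputTokens * OUTPUT_USD_PER_MTOK) / 1e6;
}

function webSearchCostUsd(requests: number): number {
  return requests * WEB_SEARCH_USD_PER_REQUEST;
}

// The example row: 3,820 input tokens, 420 output tokens, 2 web searches.
// 3,820 × $3/M = $0.01146 and 420 × $15/M = $0.0063, so $0.01776 in total.
console.log(tokenCostUsd(3820, 420).toFixed(4)); // prints "0.0178"
console.log(webSearchCostUsd(2).toFixed(2)); // prints "0.02"
```

The same fee accounts for the dashboard's web_search tile: 42 requests at $0.01 each is $0.42.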

02 · Six durable tables

Not a log file. A schema.

Behind the platform is a SQLite-backed metrics store with six durable tables, indexed for the queries the dashboards actually run. There is a cost_records row per LLM call; a turn_metrics row per user-facing turn, exposed through the outputs view so a chat reply and a workflow-run delivery share one schema; a compression_events row for every memory compaction, with the USD it saved; an instance row for the logical Agntic deployment, which survives restarts; and a process_runs row for every Node process boot. Indexes cover timestamp, session, user, and process: the kind you add up front when you're being careful, instead of bolting them on later.
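As a rough illustration of what indexing buys, this hypothetical in-memory TypeScript store keeps secondary indexes on session_id and user_id, so a per-session or per-user lookup reads only the matching rows instead of scanning every row. `IndexedCostStore` and its helpers are invented for illustration, and the timestamp and process indexes are left out for brevity.

```typescript
// Hypothetical, trimmed cost_records row.
interface CostRow {
  session_id: string;
  user_id: string;
  cost_usd: number;
}

// Key -> positions of matching rows. Object.create(null) avoids inherited
// keys such as "constructor" being mistaken for index entries.
type Index = Record<string, number[]>;

function newIndex(): Index {
  return Object.create(null);
}

function addTo(index: Index, key: string, pos: number): void {
  (index[key] = index[key] || []).push(pos);
}

// Append-only store with secondary indexes on session_id and user_id.
class IndexedCostStore {
  private rows: CostRow[] = [];
  private bySession: Index = newIndex();
  private byUser: Index = newIndex();

  insert(row: CostRow): void {
    const pos = this.rows.push(row) - 1; // push returns the new length
    addTo(this.bySession, row.session_id, pos);
    addTo(this.byUser, row.user_id, pos);
  }

  forSession(sessionId: string): CostRow[] {
    return this.pick(this.bySession[sessionId]);
  }

  forUser(userId: string): CostRow[] {
    return this.pick(this.byUser[userId]);
  }

  private pick(positions: number[] | undefined): CostRow[] {
    return (positions || []).map((p) => this.rows[p]);
  }
}
```

In SQLite the equivalent is one CREATE INDEX per column, which the database maintains on every insert.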

cost_records · Per call

One row per LLM call. The grain of cost attribution.

session_id · user_id · tier · model · context · tokens · cost_usd · web_search

turn_metrics · Per turn

One row per user-facing turn. Latency, tools called, routing strategy, aggregated cost.

duration_ms · tool_used · tool_calls · routing_strategy · artifact_type · retry_count

outputs · View

Unified view over turn_metrics. Chat replies and workflow deliveries share one schema.

output_type · source_surface · parent_output_id · status

compression_events · Per compaction

Every memory compaction with the USD saved by trimming the prompt.

timestamp · cost_saved_usd · tokens_trimmed

instance · Per deployment

The logical Agntic install. Survives restarts. Lifetime counters live here.

first_started_at · last_seen_at · version_sha · env · tenant_id

process_runs · Per boot

One row per Node process. Heartbeat every 30 s; the next boot marks stale rows as crashes.

instance_id · started_at · last_heartbeat_at · ended_at · exit_reason

Each row is one event, and each table is one shape of question. "What did this call cost?" is one query. "How did this turn perform?" is another. The schema lets the dashboards stay simple because the data is already in the right shape.
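Because chat replies and workflow deliveries share one shape, a single latency query can serve both. A minimal TypeScript sketch, assuming a hypothetical row that combines turn_metrics' duration_ms with the outputs view's source_surface (the values "chat" and "workflow" are assumed), and assuming a nearest-rank p95; the page does not say which percentile definition the dashboard uses.

```typescript
// Hypothetical row: chat replies and workflow deliveries share this shape,
// distinguished only by source_surface.
interface OutputRow {
  output_type: string;
  source_surface: "chat" | "workflow"; // assumed values
  status: string;
  duration_ms: number;
}

// p-th percentile by the nearest-rank method: sort ascending and take the
// value at 1-based rank ceil(p/100 × n).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no values");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// One function serves both surfaces because both live in the same view.
function p95LatencyMs(rows: OutputRow[], surface?: OutputRow["source_surface"]): number {
  const picked = surface ? rows.filter((r) => r.source_surface === surface) : rows;
  return percentile(picked.map((r) => r.duration_ms), 95);
}
```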

03 · Free for everything downstream

Every new node inherits observability

Because the metadata is attached in the Runnable layer — not bolted on by hand at each call site — every node added to the agent inherits observability automatically. A new tool, a new workflow node, a new planning step, a new vertical-specific routing rule: each shows up in the dashboards from day one without any instrumentation work. Three subsystems already write to the store: the cost tracker on every LLM call, the agent graph on every user-facing turn, the workflow scheduler on every workflow run. Any new subsystem we add becomes a single call to the same store and arrives in the dashboards. This is the part teams discover they need three months in.

Writer 01

Cost tracker · fires on every LLM call

Captures tier, model, context, tokens, USD, and web_search invocations the moment the model returns.

Writer 02

Agent graph · fires on every user-facing turn

Closes the turn with latency, tools called, retries, routing strategy, and the artifact returned.

Writer 03

Workflow scheduler · fires on every workflow run

Workflow deliveries share the turn schema — chat and workflow output sit in the same table.

Metrics store · One shared schema, one query API

cost_records · turn_metrics · outputs · compression_events · instance · process_runs

One store. Three writers today, more tomorrow. A new subsystem doesn't invent its own telemetry surface — it writes one row, and the existing dashboards already know how to read it.

04 · Live vs. historical, properly modeled

Per-process heartbeat with crash detection

The instance / process_runs hierarchy lets the dashboard tell the difference between "this Node process since boot" and "this deployment over its lifetime." Every process emits a heartbeat every 30 seconds. The next boot walks the table; any open row whose last heartbeat is stale gets marked exit_reason = crash, with ended_at set to its last heartbeat. The dashboard never confuses a crashed process for a live one, and "since boot" never silently leaks into "lifetime." Most agent products skip this entirely.
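One way to picture "since boot" versus "lifetime" is as two sums over cost_records keyed by process_run_id: one over the instance's single open run, one over all of its runs. This TypeScript sketch is hypothetical; the page says lifetime counters are kept on the instance row, and this derives the equivalent number from the rows for illustration only.

```typescript
// Hypothetical, trimmed process_runs and cost_records rows.
interface ProcessRun {
  id: string;
  instance_id: string;
  ended_at: string | null; // null while the process is live
}

interface CostRow {
  process_run_id: string;
  cost_usd: number;
}

// "Since boot": spend attributed to the instance's one open run.
// "Lifetime": spend across every run of the instance, crashed ones included.
function spendSinceBootAndLifetime(
  instanceId: string,
  runs: ProcessRun[],
  costs: CostRow[],
): { sinceBoot: number; lifetime: number } {
  const mine = runs.filter((r) => r.instance_id === instanceId);
  const live = mine.filter((r) => r.ended_at === null);
  if (live.length > 1) throw new Error("more than one open run; reconcile first");
  const runIds: Record<string, true> = Object.create(null);
  for (const r of mine) runIds[r.id] = true;
  let sinceBoot = 0;
  let lifetime = 0;
  for (const c of costs) {
    if (!runIds[c.process_run_id]) continue; // row belongs to another instance
    lifetime += c.cost_usd;
    if (live.length === 1 && c.process_run_id === live[0].id) sinceBoot += c.cost_usd;
  }
  return { sinceBoot, lifetime };
}
```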

Process runs · this deployment · instance 7c2a · uptime 12d

run-018 · May 12 · 06:11 · 3d 4h · clean exit

run-019 · May 15 · 10:02 · 2d 7h · clean exit

run-020 · May 17 · 17:48 · 11h · crash, marked at next boot

run-021 · May 18 · 05:04 · 5d 18h · clean exit

run-022 · May 24 · 02:14 · heartbeat 12 s ago

Invariant: the next boot marks any open process_runs row whose last_heartbeat_at is older than the stale window as exit_reason = crash, so the dashboard always shows at most one open run per instance, and a crashed process's rows stay queryable as history.
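The boot-time reconciliation in the invariant can be sketched as follows. This is a minimal TypeScript sketch with hypothetical function names; the 30 s heartbeat comes from the page, while the stale window (three missed heartbeats) and the "clean" exit label are assumptions.

```typescript
// Hypothetical process_runs row; times are epoch milliseconds.
interface ProcessRun {
  id: string;
  instance_id: string;
  started_at: number;
  last_heartbeat_at: number; // bumped every 30 s while the process is live
  ended_at: number | null; // null while the row is open
  exit_reason: "clean" | "crash" | null; // "clean" is an assumed label
}

const HEARTBEAT_MS = 30_000;
// Assumed stale window of three missed heartbeats; the page gives no value.
const STALE_MS = 3 * HEARTBEAT_MS;

// Boot-time reconciliation: close every open row of this instance whose
// heartbeat is older than the stale window, as a crash ending at that heartbeat.
// Returns how many rows were marked.
function reconcileOnBoot(runs: ProcessRun[], instanceId: string, now: number): number {
  let marked = 0;
  for (const run of runs) {
    if (run.instance_id !== instanceId || run.ended_at !== null) continue;
    if (now - run.last_heartbeat_at > STALE_MS) {
      run.ended_at = run.last_heartbeat_at;
      run.exit_reason = "crash";
      marked++;
    }
  }
  return marked;
}

// Reconciles first, then opens this process's own row.
function startRun(runs: ProcessRun[], instanceId: string, id: string, now: number): ProcessRun {
  reconcileOnBoot(runs, instanceId, now);
  const run: ProcessRun = {
    id,
    instance_id: instanceId,
    started_at: now,
    last_heartbeat_at: now,
    ended_at: null,
    exit_reason: null,
  };
  runs.push(run);
  return run;
}
```

Setting ended_at to the last heartbeat, rather than to the time of the next boot, means a crashed run's recorded duration ends when the process was last known to be alive.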

Live and historical are different rows, not different filters. The heartbeat hygiene is the telemetry equivalent of double-entry bookkeeping — every boot reconciles the previous one.

05 · Why we built it this way

The plumbing buyers value most

Observability is what separates "we shipped an agent" from "we run an agent in production." Other vendors leave it to the customer: pick a third-party platform, instrument every call, account for the cost to finance, and hope the schema doesn't drift when the agent gets a new step. We built the layer that makes that work unnecessary. For a consulting firm, this is the difference between a demo that wows and a deployment that survives, and it's the kind of plumbing buyers value most once they understand why other vendors don't have it.

Six durable tables, write-through on every LLM call, per-process heartbeat with crash detection, a shared schema across chat replies and workflow runs, ten-plus cost-context labels already in use today, Anthropic web_search tracked on its own line. None of the pieces are exotic. The work, and the value, is in being the team that actually wired them in before the agent shipped.

Need to know what your agent just cost you?

We build AI systems for clients who have to answer the cost, latency, and trace questions on day one — not bolt on a telemetry platform six months after the demo.
