Causal meters: per-node cost attribution for agent workloads
Abstract
Agent runs spend money in ways invoices cannot explain: a retry storm in one tool adapter shows up as a vague increase in monthly model spend. We describe the runtime's cost-attribution scheme, which meters every token, tool call, and retry at the execution-graph node that caused it — distinguishing intrinsic cost from induced cost — and aggregates the result into per-feature unit economics that survive concurrency and caching.
Why run-level accounting fails
Billing a whole run to the feature that started it hides two distinct errors. Shared work is miscounted: a context fetch reused by three nodes, each serving a different feature, appears in all three features' costs or in none. And induced cost is misassigned: when a flaky adapter forces upstream re-planning, the planner's extra tokens get billed to planning, though the adapter caused them.
The meter
Because every run is a deterministic graph (R-001), attribution can be structural. Each edge carries a meter recording direct spend (tokens, tool fees, wall-clock time) and a causal tag naming the node whose demand created the edge. Retries carry the tag of the failing node, not the retrying one. Shared sub-graphs are metered once and amortised across their consumers by marginal use, so caching shows up as falling attributed cost rather than accounting noise.
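The two rules in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the runtime's implementation: the `EdgeMeter` record, its field names, the micro-dollar unit and every figure are assumptions made for the example, and `amortise` reads "marginal use" as a plain proportional split by use count, which may be simpler than the real rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class EdgeMeter:
    """Hypothetical record for one metered edge (all field names are assumptions)."""
    executed_by: str  # node that actually spent the money
    caused_by: str    # causal tag: node whose demand or failure created the edge
    micro_usd: int    # direct spend on this edge, in millionths of a dollar


def attribute(edges):
    """Charge every edge to its causal tag, split into direct and induced spend."""
    direct = defaultdict(int)
    induced = defaultdict(int)
    for edge in edges:
        if edge.executed_by == edge.caused_by:
            direct[edge.caused_by] += edge.micro_usd
        else:
            # Spent at another node, but charged to the node that caused it.
            induced[edge.caused_by] += edge.micro_usd
    return dict(direct), dict(induced)


def amortise(shared_micro_usd, uses_by_consumer):
    """Split one shared sub-graph's metered cost across consumers by use count."""
    total_uses = sum(uses_by_consumer.values())
    if total_uses == 0:
        raise ValueError("a shared sub-graph needs at least one use")
    return {
        consumer: Fraction(shared_micro_usd * uses, total_uses)
        for consumer, uses in uses_by_consumer.items()
    }


# Illustrative figures only.
edges = [
    EdgeMeter("planner", "planner", 4100),  # ordinary planning
    EdgeMeter("adapter", "adapter", 200),   # the adapter's own call
    EdgeMeter("planner", "adapter", 900),   # re-planning forced by the adapter's failure
]
direct, induced = attribute(edges)
print(direct)   # {'planner': 4100, 'adapter': 200}
print(induced)  # {'adapter': 900}

# One context fetch metered once at 600 micro-dollars, used once by "a" and twice by "b".
shares = amortise(600, {"a": 1, "b": 2})
print(shares["a"], shares["b"])  # 200 400
```

Under this reading the planner's 900 micro-dollars of re-planning never touch the planner's own line, and a cache hit lowers each consumer's share because the fetch is still metered only once.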
meter run-84f2
node          direct    induced   total
plan          $0.0041   $0.0007   $0.0048
cal_adapter   $0.0002   $0.0119   $0.0121   ← retry storm
brief         $0.0089   —         $0.0089
charge(feature=morning-brief) = Σ = $0.0258
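The arithmetic in this meter output can be checked directly. The sketch below copies the three rows as given and recomputes each total and the feature charge; the dictionary layout is an assumption for illustration, and the dash in the induced column for `brief` is read as zero.

```python
from decimal import Decimal

# Rows copied from the meter output: node -> (direct, induced), in dollars.
rows = {
    "plan": (Decimal("0.0041"), Decimal("0.0007")),
    "cal_adapter": (Decimal("0.0002"), Decimal("0.0119")),
    "brief": (Decimal("0.0089"), Decimal("0")),  # dash read as zero (assumption)
}

# A node's total is its direct spend plus the spend it induced elsewhere.
totals = {node: direct + induced for node, (direct, induced) in rows.items()}

# The feature is charged the sum of its nodes' totals.
charge = sum(totals.values(), Decimal("0"))
induced_total = sum((induced for _, induced in rows.values()), Decimal("0"))

print(totals["plan"])         # 0.0048
print(totals["cal_adapter"])  # 0.0121
print(charge)                 # 0.0258
print(induced_total)          # 0.0126
```

Induced cost is $0.0126 of the $0.0258 charge, a little under half, and nearly all of it sits on `cal_adapter`, whose direct spend is the smallest in the run.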
What it changed
The first month of causal metering reordered our optimisation queue. The most expensive feature by invoice was the cheapest by intrinsic cost — its spend was induced by one adapter's timeout policy, fixed in an afternoon for a 38% reduction. Per-feature unit economics now appear in the same monthly review as retention, which is the point: an ecosystem sequenced on margin gates needs cost numbers as trustworthy as its usage numbers.
Open problem: attribution across speculative execution, where the runtime races two plans and discards one. Charging abandoned work to the feature is honest but discourages a useful optimisation; we currently book it to a runtime overhead account and are not satisfied with that answer.
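The two booking policies in tension here can be made concrete. The following is a hedged sketch only: the function `book_speculation`, the account name `runtime-overhead`, the `policy` values and the figures are all hypothetical, chosen to show where abandoned work lands under each choice.

```python
from collections import defaultdict

OVERHEAD_ACCOUNT = "runtime-overhead"  # hypothetical account name


def book_speculation(ledger, feature, branch_costs, winner, policy="overhead"):
    """Book the cost of raced plans, in micro-dollars.

    The winning branch is always charged to the feature. Abandoned branches go
    to the overhead account (policy="overhead") or to the feature itself
    (policy="feature").
    """
    if winner not in branch_costs:
        raise ValueError(f"unknown winner: {winner!r}")
    if policy not in ("overhead", "feature"):
        raise ValueError(f"unknown policy: {policy!r}")
    for branch, cost in branch_costs.items():
        abandoned = branch != winner
        account = OVERHEAD_ACCOUNT if abandoned and policy == "overhead" else feature
        ledger[account] += cost
    return ledger


# Illustrative figures only: two raced plans, plan_a wins.
costs = {"plan_a": 4800, "plan_b": 3100}

current = book_speculation(defaultdict(int), "morning-brief", costs, "plan_a")
print(dict(current))  # {'morning-brief': 4800, 'runtime-overhead': 3100}

honest = book_speculation(defaultdict(int), "morning-brief", costs, "plan_a", policy="feature")
print(dict(honest))   # {'morning-brief': 7900}
```

The total booked is the same under both policies; only its visibility to the feature owner changes, which is why neither choice fully resolves the problem.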
cite as: Mynd Labs Research Note R-003 (2026)