
AI Agent Cost Attribution: Tracking Spend Per Tool Call

A sub-agent in a customer support pipeline was retrying a geocoding tool endlessly. $1,800 per week, charged to the shared AI account, invisible in the monthly bill. Nobody knew until the finance team asked why the Anthropic invoice tripled. Per-tool cost attribution would have caught this on day one.

April 2, 2026·8 min read·LangSight Engineering

The cost visibility problem

Every team running AI agents in production has the same blind spot: they know their total monthly AI spend (the Anthropic/OpenAI/AWS Bedrock invoice), but they cannot attribute that cost to specific tools, agents, or user sessions.

The total spend is a single number: $4,200 this month. But which agent is responsible for $3,100 of that? Which tool calls are the most expensive? Which sessions ran up costs that should have been stopped by a budget limit? Without per-call cost attribution, you cannot answer any of these questions.

This matters because AI agent costs are fundamentally unpredictable. A traditional API endpoint has relatively stable per-request cost. An AI agent session can cost $0.02 or $47 depending on how many tool calls the LLM decides to make, how many tokens each tool response contains, and whether the agent gets stuck in a loop. The variance is enormous — and without attribution, you cannot identify or fix the outliers.

What to track: two types of cost

AI agent sessions have two distinct cost categories, and you need to track both separately.

Token-based costs (LLM calls)

Every time the agent calls the LLM — to decide which tool to call, to process a tool response, to generate a final answer — you incur token costs. These are priced per input token and per output token, with rates that vary by model.

# Token cost calculation per LLM call
def llm_call_cost(input_tokens, output_tokens,
                  input_price_per_token, output_price_per_token):
    return (input_tokens * input_price_per_token
            + output_tokens * output_price_per_token)

# Example: Claude 3.5 Sonnet pricing
# Input: $3.00 / 1M tokens = $0.000003 per token
# Output: $15.00 / 1M tokens = $0.000015 per token

# A typical agent turn with 2,000 input + 500 output tokens:
llm_call_cost(2000, 500, 0.000003, 0.000015)
# (2000 * 0.000003) + (500 * 0.000015) = $0.006 + $0.0075 = $0.0135

Token costs are predictable per call but unpredictable per session because the number of LLM calls depends on the agent's reasoning path. A simple query might take 2 LLM calls ($0.03). A complex query with multiple tool calls might take 15 LLM calls ($0.20). A looping agent can make 80+ LLM calls ($1.00+).
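
Those ballpark session costs follow directly from the worked per-turn cost of $0.0135. A quick sanity check (using that figure as a flat average per turn, which real sessions will not match exactly):

```python
# Session cost scales roughly linearly with the number of LLM calls.
avg_turn_cost = 0.0135  # the worked example's per-turn cost

session_costs = {n_calls: round(n_calls * avg_turn_cost, 2)
                 for n_calls in (2, 15, 80)}
# 2 calls is about $0.03, 15 about $0.20, 80 about $1.08
```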

Call-based costs (tool invocations)

Some tools have per-call costs independent of tokens. A geocoding API charges $0.005 per call. A database query has compute cost. An email send has per-message cost. These costs are incurred by the tool call itself, not by the LLM processing.

LangSight tracks both cost types. Token costs are calculated automatically from the model pricing table. Call-based costs are configured per tool in .langsight.yaml:

# .langsight.yaml — cost configuration
models:
  claude-3-5-sonnet:
    input_cost_per_1m: 3.00
    output_cost_per_1m: 15.00
  claude-3-opus:
    input_cost_per_1m: 15.00
    output_cost_per_1m: 75.00

tool_costs:
  geocoding-mcp/geocode:
    type: call_based
    cost_per_call: 0.005
  email-mcp/send_email:
    type: call_based
    cost_per_call: 0.001
  postgres-mcp/query:
    type: call_based
    cost_per_call: 0.0001  # estimate based on compute
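
Given a config like the one above, summing a session's call-based costs is straightforward. A minimal sketch (the dict mirrors the YAML `tool_costs` section; the `session_tool_cost` helper is illustrative, not the LangSight implementation):

```python
# Per-call tool costs, as configured in .langsight.yaml
tool_costs = {
    "geocoding-mcp/geocode": 0.005,
    "email-mcp/send_email": 0.001,
    "postgres-mcp/query": 0.0001,
}

def session_tool_cost(tool_calls):
    """Sum call-based costs for a list of (tool_name, call_count) pairs."""
    return sum(tool_costs.get(name, 0.0) * count
               for name, count in tool_calls)

# A session that geocoded 3 addresses and ran 10 queries:
session_tool_cost([("geocoding-mcp/geocode", 3), ("postgres-mcp/query", 10)])
# 3 * 0.005 + 10 * 0.0001 = $0.016
```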

Per-session cost tracking

The most useful view of cost data is per session. A session is one complete agent interaction — from the initial user request to the final response. The session cost includes all LLM calls and all tool calls that occurred during that interaction.

$ langsight costs --hours 24

Session Cost Summary (last 24 hours)
┌──────────────────┬──────────┬───────┬──────────┬───────────┬────────┐
│ Session          │ Agent    │ Steps │ LLM Cost │ Tool Cost │ Total  │
├──────────────────┼──────────┼───────┼──────────┼───────────┼────────┤
│ sess_a1b2c3d4    │ support  │ 4     │ $0.054   │ $0.002    │ $0.056 │
│ sess_e5f6g7h8    │ support  │ 7     │ $0.094   │ $0.015    │ $0.109 │
│ sess_i9j0k1l2    │ analyst  │ 12    │ $0.162   │ $0.045    │ $0.207 │
│ sess_m3n4o5p6    │ support  │ 47    │ $2.340   │ $44.650   │ $47.00 │
│ ...              │ ...      │ ...   │ ...      │ ...       │ ...    │
├──────────────────┼──────────┼───────┼──────────┼───────────┼────────┤
│ Total (847 sess) │ —        │ 3,421 │ $34.20   │ $67.80    │ $102.0 │
└──────────────────┴──────────┴───────┴──────────┴───────────┴────────┘

Anomalies detected:
  sess_m3n4o5p6  $47.00 — 8.4x median session cost
    → geocoding-mcp/geocode called 8,930 times (loop detected)
    → Recommendation: enable loop detection + budget guardrail

That $47 session jumps out immediately: 47 steps and 8,930 calls to the geocoding endpoint. Without per-session cost tracking, that cost stays buried in the aggregate monthly bill, invisible until someone manually investigates why it tripled.
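
The anomaly flag in that output can be approximated with a median-multiple check. A minimal sketch (the `flag_outliers` helper, the 5x threshold, and the use of `statistics.median` are assumptions, not the LangSight implementation):

```python
from statistics import median

def flag_outliers(session_costs, multiple=5.0):
    """Return (session_id, cost, ratio) for sessions whose total cost
    exceeds `multiple` times the median session cost."""
    m = median(cost for _, cost in session_costs)
    return [(sid, cost, round(cost / m, 1))
            for sid, cost in session_costs if cost > multiple * m]

# The four sessions from the table above:
sessions = [("sess_a1b2c3d4", 0.056), ("sess_e5f6g7h8", 0.109),
            ("sess_i9j0k1l2", 0.207), ("sess_m3n4o5p6", 47.00)]
outliers = flag_outliers(sessions)  # only the $47 session is flagged
```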

Budget guardrails

Cost tracking tells you what happened. Budget guardrails stop it while it is still happening. LangSight supports two levels of budget control:

Soft alert: warn before it gets expensive

A soft budget alert fires when a session reaches a configured cost threshold. The session continues, but the team gets a notification. This is useful for monitoring without disrupting active sessions.

Hard limit: stop the session

A hard budget limit terminates the session when the cost exceeds the configured maximum. The agent receives a termination signal and the session is marked as budget_exceeded.

from langsight.sdk import LangSightClient

client = LangSightClient(
    url="http://localhost:8000",
    api_key="ls_...",

    # Budget guardrails
    max_cost_usd=1.00,        # hard stop at $1 per session
    budget_soft_alert=0.50,   # alert at $0.50

    # Also set step limits as a backstop
    max_steps=25,             # hard stop at 25 tool calls
    max_wall_time_s=120,      # hard stop at 2 minutes
)

# Every session through this client enforces budget limits
traced = client.wrap(mcp_session, agent_name="support-agent")

Budget guardrails and loop detection work together. Loop detection catches the specific pattern of repeated identical calls. Budget guardrails catch any session that gets expensive for any reason — loops, complex reasoning chains, expensive tools, or unexpected agent behavior.
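
Conceptually, a per-session budget guard is just an accumulator with two thresholds. A minimal sketch of the idea (the `SessionBudget` class and `BudgetExceeded` exception are illustrative, not the SDK's internals):

```python
class BudgetExceeded(Exception):
    """Raised when a session crosses its hard cost limit."""

class SessionBudget:
    def __init__(self, max_cost_usd, soft_alert_usd):
        self.max_cost = max_cost_usd
        self.soft_alert = soft_alert_usd
        self.spent = 0.0
        self.alerted = False

    def record(self, cost):
        """Add one call's cost; alert at the soft threshold, stop at the hard one."""
        self.spent += cost
        if not self.alerted and self.spent >= self.soft_alert:
            self.alerted = True  # fire a notification; the session continues
        if self.spent > self.max_cost:
            raise BudgetExceeded(
                f"session cost ${self.spent:.2f} exceeds ${self.max_cost:.2f}")

budget = SessionBudget(max_cost_usd=1.00, soft_alert_usd=0.50)
budget.record(0.40)  # under both thresholds: nothing fires
budget.record(0.20)  # crosses $0.50: soft alert fires, session continues
```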

Finding the expensive outliers

The most actionable insight from cost attribution is identifying outlier sessions. In a typical agent deployment, 80% of sessions cost under $0.10. The top 5% of sessions account for 60% of the total spend. Finding and fixing those outliers is the highest-ROI optimization.

Common causes of expensive outlier sessions:

  • Loop without detection — the agent calls the same tool repeatedly. Fix: enable loop detection.
  • Expensive tool overuse — the agent calls a geocoding API 200 times when 5 would suffice. Fix: add a per-tool call limit or instruct the agent to batch requests.
  • Large context windows — tool responses are massive (database returns 10,000 rows) and each LLM call processes the entire context. Fix: paginate tool responses, summarize before returning to the LLM.
  • Wrong model selection — a simple classification task routed to Claude 3 Opus instead of Haiku. Fix: use model routing based on task complexity.
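
The concentration claim above is easy to measure once you have per-session costs: compute the share of total spend held by the most expensive few percent of sessions. A small sketch with synthetic numbers (the `top_share` helper is illustrative):

```python
def top_share(costs, fraction=0.05):
    """Fraction of total spend attributable to the top `fraction` of sessions."""
    ranked = sorted(costs, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

# One $47 outlier among twenty otherwise-cheap sessions dominates the bill:
top_share([47.00] + [0.10] * 19)  # well over 90% of spend in the top 5%
```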

Cost optimization strategies

Once you have attribution data, optimization becomes systematic:

  • Identify the most expensive tools. Sort tools by total cost. The top 3 tools usually account for 70% of tool costs. Optimize or rate-limit those first.
  • Set per-agent budgets. Different agents have different expected costs. A simple FAQ agent should cost $0.05 per session. A complex data analysis agent might legitimately cost $0.50. Set per-agent budgets that reflect expected behavior.
  • Review outlier sessions weekly. Look at the top 10 most expensive sessions from the past week. For each one, determine if the cost was justified (complex legitimate query) or waste (loop, wrong model, excessive tool calls).
  • Tune agent instructions. If an agent is making unnecessary tool calls, update its system prompt to be more efficient. "Check the database once for all needed fields" instead of making separate queries for each field.

Key takeaways

  • Cost attribution is essential for production agents. Without per-session, per-tool cost tracking, expensive outliers hide in aggregate bills until finance notices.
  • Track two cost types: token-based (LLM calls) and call-based (tool invocations). Both contribute to total session cost.
  • Budget guardrails prevent runaway costs. Soft alerts for visibility, hard limits for protection. Always use both together with loop detection.
  • Focus on outliers. 5% of sessions cause 60% of costs. Finding and fixing those outliers delivers the highest ROI.
  • One command to see costs: langsight costs --hours 24 shows per-session attribution with anomaly detection built in.


Track AI agent costs per tool call

LangSight attributes costs to specific tools, agents, and sessions. Budget guardrails stop runaway spend before it hits your invoice. Self-host free, Apache 2.0.

Get started →