v0.14.18 · Self-host free · Apache 2.0

See everything your
agents do.
Stop what they shouldn't.

The runtime reliability layer for AI agents. Prevent loops, enforce budgets, circuit-break failing tools, monitor MCP health. Instrument once — protection is automatic.

🔄 Agent stuck in a loop? Killed after 3 repeat calls.
💸 Session burning budget? Auto-stopped at your dollar limit.
💥 MCP server went down? Circuit breaker opens, agents reroute.
🔍 Schema drifted overnight? Detected before your agents fail.
$ pip install langsight && langsight init
Anthropic · CrewAI · Claude Agent SDK
Postgres + ClickHouse
langsight · session trace
$ langsight sessions --id sess-f2a9b1
 
Trace: sess-f2a9b1 (support-agent)
5 tool calls · 1 failed · 2,134ms · $0.023
 
sess-f2a9b1
├── jira-mcp/get_issue 89ms ✓
├── postgres-mcp/query 42ms ✓
├── → billing-agent handoff
│ ├── crm-mcp/update 120ms ✓
│ └── slack-mcp/notify — ✗ timeout
 
Root cause: slack-mcp timed out at 14:32 UTC
└── Fix: check SLACK_TIMEOUT (currently 500ms)

Why LangSight exists

Observability tools watch.
LangSight prevents.

Every platform in the market traces what happened after the fact. Nobody stops loops, enforces budgets, or circuit-breaks failing tools at runtime. That's the gap LangSight fills.

8
Unique capabilities
no competitor has
0
Competitors with
runtime prevention
66x
More spans captured
vs LangSmith (April '26)
0
Vendor lock-in
Apache 2.0, self-hosted
Capability · LangSight · LangSmith · Langfuse · Opik
PREVENT Loop detection (pattern-based) · Yes
PREVENT Budget enforcement (auto-kill) · Yes
PREVENT Circuit breakers (tool-level) · Yes
DETECT MCP health monitoring · Yes
DETECT Schema drift detection · Yes
DETECT Security scanning (CVE + OWASP) · Yes
MONITOR Anomaly detection (z-score) · Yes
MONITOR Blast radius mapping · Yes
OBSERVE Agent tracing · Yes · Yes · Yes · Yes
OBSERVE Cost tracking · Yes · Yes · Yes · Yes
OBSERVE LLM evals · Yes · Yes · Yes

The bottom 3 rows are shared territory — every platform traces and tracks costs. The top 8 rows are empty for everyone except LangSight. That's the moat: runtime prevention at the tool layer.

As of April 2026, LangSight also captures 66x more spans than LangSmith in head-to-head benchmarks on Claude Agent SDK — but that's observability. The real difference is prevention.

The product

Built for the engineer
who gets paged at 2 AM.

Every page answers a question you'd ask during an incident. No dashboards for dashboards' sake.

MAP

The clearest way to see what your agents did.

A visual DAG of your entire agent session — coordinator → sub-agents → MCP servers — with call counts, latency, and full detail on click. Understand any session in seconds.

LangSight session graph showing coordinator delegating to sql_analyst, data_quality, and reporter agents connected to MCP servers
  • Visual agent topology: see who called whom at a glance
  • Click any node for full input/output JSON, latency, tokens
  • Handoff arrows between agents with timing
  • Per-agent and per-server call counts + avg latency
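
The per-server call counts and average latency on the graph boil down to a simple aggregation over spans. A minimal sketch, assuming a hypothetical `Span` record; the field names are illustrative and not LangSight's actual schema:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span record; field names are illustrative only."""
    agent: str
    server: str
    tool: str
    latency_ms: float


def summarize_by_server(spans):
    """Return {server: (call_count, avg_latency_ms)} for a list of spans."""
    totals = defaultdict(lambda: [0, 0.0])  # server -> [count, latency sum]
    for span in spans:
        totals[span.server][0] += 1
        totals[span.server][1] += span.latency_ms
    return {server: (count, total / count) for server, (count, total) in totals.items()}


spans = [
    Span("support-agent", "jira-mcp", "get_issue", 89),
    Span("support-agent", "postgres-mcp", "query", 42),
    Span("billing-agent", "crm-mcp", "update", 120),
    Span("billing-agent", "crm-mcp", "update", 80),
]
summary = summarize_by_server(spans)
```

Grouping by `span.agent` instead of `span.server` would give the per-agent view in the same way.
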
PREVENT

Per-agent guardrails. From the dashboard.

Loop detection, budget limits, max steps: all configured per agent without code changes. Set thresholds, choose warn or terminate, and control costs in real time.

  • Loop detection: fire after N consecutive repeat calls
  • Action: Warn or Terminate — your choice per agent
  • Budget controls: max cost per session in USD
  • Soft alert threshold + hard kill limit
  • Max wall time to prevent runaway sessions
LangSight agent detail page showing loop detection settings, budget controls, and max steps configuration
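
To make the loop rule concrete: "fire after N consecutive repeat calls" can be sketched as a small streak counter keyed on tool name plus arguments. This is an illustration under assumptions, not LangSight's implementation; `LoopDetector` and its parameters are hypothetical names:

```python
import json


class LoopDetector:
    """Illustrative guardrail: flags N consecutive identical tool calls."""

    def __init__(self, max_repeats=3, action="terminate"):
        if action not in ("warn", "terminate"):
            raise ValueError("action must be 'warn' or 'terminate'")
        self.max_repeats = max_repeats
        self.action = action
        self._last_key = None
        self._streak = 0

    def record(self, tool, args):
        """Record one call; return the configured action once the streak
        reaches the limit, otherwise None."""
        # Same tool + same args (order-insensitive) counts as a repeat.
        key = (tool, json.dumps(args, sort_keys=True))
        if key == self._last_key:
            self._streak += 1
        else:
            self._last_key, self._streak = key, 1
        return self.action if self._streak >= self.max_repeats else None


detector = LoopDetector(max_repeats=3)
results = [detector.record("jira-mcp/get_issue", {"id": "OPS-1"}) for _ in range(3)]
after_change = detector.record("postgres-mcp/query", {"sql": "SELECT 1"})
```

Any different call resets the streak, so only back-to-back repeats trigger the action.
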
MONITOR

Agent runtime health at a glance.

Sessions, tool calls, error rate, P99 latency, token usage: all in real time. Overview, Models, and Tools tabs. Time ranges from 1h to 7d.

  • 4 KPI cards: sessions, tool calls, error rate, avg latency
  • Agent sessions + error rate trend charts
  • P99 latency tracking across all agents
  • Token usage breakdown (input, output, cache)
LangSight dashboard overview showing 124 sessions, 1032 tool calls, 6.6% error rate
MONITOR

MCP infrastructure monitoring.

Dedicated MCP section: tool call volume, error rates, P99 latency per server. Error breakdown by type. Fleet health at a glance.

  • MCP tool calls, error rate, P99 latency per server
  • Error breakdown: API unavailable, agent crash, auth errors
  • Server fleet health: green dots = all healthy
  • Correlate MCP failures with agent errors
LangSight MCP infrastructure dashboard showing tool calls, error rate, P99 latency, and 11 healthy servers
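
For readers who want the definitions behind these numbers, here is one way to compute per-server error rate and P99 latency from raw call records. It is a sketch that assumes the nearest-rank percentile method and a hypothetical `(server, latency_ms, ok)` tuple layout; the dashboard may calculate these differently:

```python
from collections import defaultdict


def p99(values):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n), 1-based
    return ordered[rank - 1]


def server_stats(calls):
    """calls: iterable of (server, latency_ms, ok). Returns per-server stats."""
    by_server = defaultdict(list)
    for server, latency_ms, ok in calls:
        by_server[server].append((latency_ms, ok))
    stats = {}
    for server, rows in by_server.items():
        errors = sum(1 for _, ok in rows if not ok)
        stats[server] = {
            "calls": len(rows),
            "error_rate": errors / len(rows),
            "p99_ms": p99([latency for latency, _ in rows]),
        }
    return stats


# 100 calls with latencies 1..100 ms; only the slowest one fails.
calls = [("slack-mcp", float(ms), ms != 100) for ms in range(1, 101)]
stats = server_stats(calls)
```
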
DETECT

Every session. Filterable. Searchable.

Session list with health tags, agent name, call count, duration, tokens, cost. Filter by status, agent, or health tag. Click to drill into full trace.

  • Health tags: success, failure, loop, budget exceeded
  • Filter by agent, status, health tag
  • Sort by duration, cost, token count
  • Click to drill into full session trace + graph
LangSight sessions list showing 124 sessions with filters for status, agents, health tags
MONITOR

Cost attribution. Per-tool. Per-agent. Per-model.

See exactly where your money goes. Total cost, LLM cost, tool call cost — broken down by service, tool, model, and cost type.

  • $1.22 total → $0.13 LLM + $1.09 tool calls
  • Cost per call: $0.001 for tools, $0.135 for Gemini
  • Filter by service, agent, model, cost type
  • 2.1M input tokens, 130K output tokens breakdown
LangSight cost attribution page showing $1.22 total cost broken down by tool
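
The breakdown above is additive: $0.13 LLM + $1.09 tool calls = $1.22 total. A minimal sketch of that roll-up, assuming hypothetical line items; the per-tool split below is made up for illustration and only the category totals come from the figures above:

```python
from collections import defaultdict
from decimal import Decimal


def cost_breakdown(line_items):
    """line_items: iterable of (cost_type, name, usd_string).
    Returns (total, {cost_type: subtotal}) using exact decimal arithmetic."""
    by_type = defaultdict(Decimal)  # Decimal() starts each bucket at 0
    for cost_type, _name, usd in line_items:
        by_type[cost_type] += Decimal(usd)
    return sum(by_type.values(), Decimal("0")), dict(by_type)


line_items = [
    ("llm", "gemini", "0.13"),
    ("tool", "postgres-mcp/query", "0.64"),   # illustrative split
    ("tool", "jira-mcp/get_issue", "0.45"),   # illustrative split
]
total, by_type = cost_breakdown(line_items)
```

`Decimal` is used here so that cents add up exactly; summing floats can drift by a fraction of a cent.
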
DETECT

MCP health + blast radius + AI root cause.

Per-server health panel: if this server went down, how many agents and sessions are affected? AI-powered root cause investigation built in.

  • Blast radius: agents, sessions, and calls at risk
  • Health, Tools, Consumers, Drift, Schema, Logs tabs
  • AI root cause investigation (Anthropic, 1h-24h lookback)
  • Click "Run Investigation" for automated RCA report
LangSight MCP servers page showing blast radius analysis and AI root cause investigation
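
Blast radius answers "who depends on this server?" Conceptually it is a filter over recent calls. A rough sketch, assuming a hypothetical `(session_id, agent, server)` record layout rather than LangSight's real data model:

```python
def blast_radius(server, calls):
    """calls: iterable of (session_id, agent, server).
    Returns the agents, sessions, and call count that depend on `server`."""
    hits = [(session, agent) for session, agent, srv in calls if srv == server]
    return {
        "agents": sorted({agent for _, agent in hits}),
        "sessions": sorted({session for session, _ in hits}),
        "calls": len(hits),
    }


calls = [
    ("sess-1", "support-agent", "jira-mcp"),
    ("sess-1", "billing-agent", "slack-mcp"),
    ("sess-2", "billing-agent", "slack-mcp"),
    ("sess-2", "billing-agent", "crm-mcp"),
    ("sess-3", "reporter", "slack-mcp"),
]
radius = blast_radius("slack-mcp", calls)
```
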
DETECT

8 alert types. Slack + webhooks.

Agent failure, SLO breached, anomaly (critical + warning), security findings, MCP server down/recovered. Toggle each independently.

  • Agent Failure: session with failed tool calls
  • SLO Breached: service level objective violated
  • Anomaly: z-score >=3 (critical) or >=2 (warning)
  • MCP Down/Recovered: server health state changes
  • Incomplete session tracking and tagging
LangSight alerts page showing 8 alert rule types with toggles and Slack notifications
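
The anomaly thresholds are easy to reason about with a worked example. The sketch below classifies a new observation against a baseline using a one-sided z-score with the >=3 and >=2 cutoffs listed above. The choice of population standard deviation and the function name are assumptions, not details confirmed by LangSight:

```python
from statistics import mean, pstdev


def classify_anomaly(baseline, observed):
    """Return 'critical', 'warning', or None based on the z-score of `observed`."""
    sigma = pstdev(baseline)
    if sigma == 0:
        return None  # flat baseline: z-score is undefined
    z = (observed - mean(baseline)) / sigma
    if z >= 3:
        return "critical"
    if z >= 2:
        return "warning"
    return None


# Baseline with mean 5 and population standard deviation 2.
baseline = [2, 4, 4, 4, 5, 5, 7, 9]
critical = classify_anomaly(baseline, 11)  # z = 3.0
warning = classify_anomaly(baseline, 9)    # z = 2.0
normal = classify_anomaly(baseline, 6)     # z = 0.5
```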

Integrations

Drop into any framework.

One line of code. Full tracing, prevention, and cost attribution. Zero-code for Claude Agent SDK and CrewAI.

Verified

Anthropic SDK

Messages API + Streaming

LLM tracing · Token capture · Cost tracking
Verified

Claude Agent SDK

Multi-agent orchestration

Zero-code auto_patch() · Subagent tracing · 66 spans captured
Verified

CrewAI

Event bus + 19 handlers

Native event bus · Agent attribution · A2A handoffs
Beta

OpenAI SDK

Chat completions + Agents

LLM tracing · Token capture · Function calls
Beta

Google Gemini

Generative AI SDK

LLM tracing · Token capture · generate_content
Verified

OTLP / OpenTelemetry

Any OTEL-compatible framework

OTLP ingest · gen_ai conventions · Any language
Coming soon

LangChain / LangGraph

Chains + Graph agents

Callback handler · Tool tracing · Graph state
Coming soon

Pydantic AI

Type-safe agents

Agent tracing · Tool capture · Structured output

Langfuse watches the brain. LangSight watches the hands.

Use alongside Langfuse, LangWatch, or LangSmith. They trace model reasoning. LangSight guards the tool layer — loops, budgets, health, security, blast radius.

What LangSight captures

Prevention + detection + monitoring.

🔄

Loop Detection

Pattern-based: same tool + same args = kill it

💰

Budget Enforcement

Per-session cost limits with auto-kill
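
A soft alert threshold plus a hard kill limit can be pictured as a running total with two tripwires. This is a sketch under assumptions: `SessionBudget` and `BudgetExceeded` are hypothetical names, and the real enforcement path may differ:

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when a session crosses its hard limit."""


class SessionBudget:
    """Illustrative per-session budget with a soft alert and a hard kill limit."""

    def __init__(self, soft_usd, hard_usd):
        self.soft_usd = Decimal(soft_usd)
        self.hard_usd = Decimal(hard_usd)
        self.spent_usd = Decimal("0")
        self._warned = False

    def charge(self, usd):
        """Add a cost; return 'ok' or 'warn', or raise at the hard limit."""
        self.spent_usd += Decimal(usd)
        if self.spent_usd >= self.hard_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd} of ${self.hard_usd}")
        if self.spent_usd >= self.soft_usd and not self._warned:
            self._warned = True  # alert only once per session
            return "warn"
        return "ok"


budget = SessionBudget(soft_usd="0.50", hard_usd="1.00")
statuses = [budget.charge("0.30"), budget.charge("0.30")]
try:
    budget.charge("0.50")
    killed = False
except BudgetExceeded:
    killed = True
```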

Circuit Breakers

Tool-level, stateful, auto-recovery
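
"Stateful" and "auto-recovery" describe the classic circuit breaker pattern: open after repeated failures, skip calls during a cooldown, then allow a trial call. The sketch below shows that pattern in general terms. Class names, thresholds, and the injectable clock are illustrative assumptions, not LangSight's API:

```python
import time


class CircuitOpen(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    """Illustrative tool-level breaker with cooldown-based recovery."""

    def __init__(self, failure_threshold=3, recovery_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.recovery_seconds:
                raise CircuitOpen("circuit open; call skipped")
            # Cooldown elapsed: fall through and allow one trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In use, each tool would get its own breaker instance, so one failing MCP server does not block calls to healthy ones.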

🏥

MCP Health Checks

5 transports, latency, status, schema drift
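
One common way to detect schema drift is to fingerprint each tool's schema and compare snapshots over time. A minimal sketch, assuming schemas are plain JSON-compatible dicts keyed by tool name; the function names are hypothetical and this is not necessarily how LangSight does it:

```python
import hashlib
import json


def schema_fingerprint(tool_schema):
    """Stable SHA-256 hash of a schema, independent of key order."""
    canonical = json.dumps(tool_schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(previous, current):
    """previous/current: {tool_name: schema dict}. Returns what changed."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(
            name for name in shared
            if schema_fingerprint(previous[name]) != schema_fingerprint(current[name])
        ),
    }


before = {
    "get_issue": {"type": "object", "properties": {"id": {"type": "string"}}},
    "notify": {"type": "object", "properties": {"channel": {"type": "string"}}},
}
after = {
    "get_issue": {"type": "object", "properties": {"id": {"type": "integer"}}},
    "search": {"type": "object", "properties": {"q": {"type": "string"}}},
}
drift = detect_drift(before, after)
```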

🛡️

Security Scanning

CVE + OWASP MCP Top 10 + poisoning detection

🌳

Multi-Agent Trees

Parent → child span linking across agents

📊

Cost Attribution

Per-agent, per-tool, cache token breakdown

🚨

Anomaly Detection

Z-score vs 7-day baseline, auto-alerts

Get started

Zero to traced
in 5 minutes.

No account needed. No vendor dependency. Self-hosted on your infra. Apache 2.0 — fork it, modify it, ship it.

1

Install the SDK

10 seconds

One pip install. No Docker needed for the SDK — it works standalone with any Python agent.

Terminal · bash
pip install langsight
2

Add two lines to your agent

30 seconds

auto_patch() instruments Claude Agent SDK, CrewAI, OpenAI, and Gemini automatically. Zero wrappers, zero config.

your_agent.py · python
import langsight

langsight.auto_patch()

# That's it. Every tool call, handoff,
# and LLM interaction is now traced.
# Loop detection + budget enforcement
# are active automatically.

# Your existing agent code, unchanged.
# query() returns an async iterator of messages:
from claude_agent_sdk import query
async for message in query(prompt="...", options=options):
    print(message)
3

Start the dashboard

5 minutes

One script generates secrets, starts Postgres + ClickHouse + API + Dashboard. You're looking at traces before your coffee is ready.

Terminal · bash
# Clone and start the full stack
git clone https://github.com/LangSight/langsight
cd langsight

# Auto-generates secrets, starts 5 containers,
# seeds demo data
./scripts/quickstart.sh

# Dashboard: http://localhost:3002
# API:       http://localhost:8000
# Docs:      https://docs.langsight.dev

Ready to see what your agents are really doing?

Self-host on your own infrastructure. No data ever leaves your network. No paid tiers. No gated features. No usage limits.

Apache 2.0, self-host free forever · No account needed · docker compose up: full stack in 5 min