Not another LLM eval platform.
Agent runtime reliability, built for MCP.
Langfuse and LangWatch are excellent tools for LLM evaluation and prompt quality. LangSight solves a different problem: monitoring and securing the tools your agents call at runtime, including the MCP servers, HTTP APIs, and functions that can fail silently.
LangSight
MCP-native · runtime
Agent runtime reliability for AI toolchains. Purpose-built for MCP health monitoring, security scanning, circuit breakers, loop detection, and tool call tracing.
Langfuse
LLM eval & tracing
Strong platform for LLM prompt engineering, evals, and cost tracking. Not designed for MCP server health or security; complementary to LangSight.
LangWatch
LLM quality & guardrails
Focuses on LLM output quality, guardrails, and safety evaluations. Does not cover MCP infrastructure health, CVE scanning, or tool-level security.
LangTrace
OTEL-native tracing
OpenTelemetry-native tracing for LLMs. Good for span capture and latency visibility. Does not do runtime guardrails, MCP health checks, or security scanning.
Arize Phoenix
LLM eval & RAG
Strong for RAG pipeline evaluation, retrieval quality, and LLM tracing. Does not cover MCP server health, circuit breakers, or agent runtime security.
LangSmith
LangChain-native
LangChain's observability platform for prompt debugging, dataset management, and evals. Tightly coupled to LangChain/LangGraph. No MCP health monitoring or security scanning.
Feature comparison
What each platform covers
Rows with a dash in every other column are unique to LangSight: capabilities no other platform in this comparison offers today.
| Feature | LangSight | Langfuse | LangWatch | LangTrace | Arize Phoenix |
|---|---|---|---|---|---|
| MCP server health monitoring: LangSight is the only platform with native MCP health checks. | Yes | — | — | — | — |
| MCP security scanning (CVE + OWASP): CVE detection and 5 of 10 OWASP MCP checks, built in. | Yes | — | — | — | — |
| Tool poisoning detection: injection, unicode, and base64-encoded payload detection. | Yes | — | — | — | — |
| Schema drift detection: alerts when a tool's schema changes unexpectedly between scans. | Yes | — | — | — | — |
| Loop detection + auto-kill: argument-hash and sliding-window detection, configurable terminate action. | Yes | — | Partial | — | — |
| Budget guardrails (cost limits) | Yes | — | — | — | — |
| Tool-level circuit breakers | Yes | — | — | — | — |
| Agent tool call tracing | Yes | Yes | Yes | Yes | Yes |
| LLM input / output capture | Yes | Yes | Yes | Yes | Yes |
| Multi-agent call tree | Yes | Partial | Partial | Partial | Partial |
| Cost attribution per tool call | Yes | Yes | Partial | Partial | — |
| Anomaly detection | Yes | — | Partial | — | Partial |
| SLO tracking | Yes | — | Partial | — | — |
| CI/CD security gate (--ci flag) | Yes | — | — | — | — |
| Self-hosted (free forever) | Yes | Yes | Yes | Yes | Yes |
| License | Apache 2.0 | MIT / ELv2 | Apache 2.0 | Apache 2.0 | ELv2 |
| Primary focus | Agent runtime reliability | LLM evals + tracing | LLM quality + guardrails | OTEL-native tracing | LLM eval + RAG quality |
Comparison based on publicly available documentation as of March 2026. Features may change — check each project's docs for the latest.
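To make the "argument-hash and sliding-window" loop detection row more concrete, here is a minimal Python sketch of the general technique. It is not LangSight's implementation: the `LoopDetector` class, its `window_size` and `max_repeats` parameters, and the sample tool names are all hypothetical, chosen only for illustration.

```python
import hashlib
import json
from collections import deque


class LoopDetector:
    """Hypothetical sketch: flag a tool call repeated too often in a recent window."""

    def __init__(self, window_size=10, max_repeats=3):
        self.max_repeats = max_repeats
        # deque(maxlen=...) discards the oldest fingerprint automatically,
        # which gives us the sliding window.
        self.recent = deque(maxlen=window_size)

    @staticmethod
    def fingerprint(tool_name, arguments):
        # Canonical JSON with sorted keys, so argument order does not change the hash.
        payload = json.dumps([tool_name, arguments], sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def record(self, tool_name, arguments):
        """Record one call; return True once it repeats max_repeats times in the window."""
        fp = self.fingerprint(tool_name, arguments)
        self.recent.append(fp)
        return self.recent.count(fp) >= self.max_repeats


detector = LoopDetector(window_size=5, max_repeats=3)
calls = [
    ("search", {"q": "mcp", "limit": 5}),
    ("fetch", {"url": "https://example.com"}),
    ("search", {"limit": 5, "q": "mcp"}),  # same call, different key order
    ("search", {"q": "mcp", "limit": 5}),
]
flags = [detector.record(name, args) for name, args in calls]
print(flags)  # [False, False, False, True]
```

In a real runtime, a `True` result would trigger whatever terminate action is configured. The sketch only reports the detection and leaves that policy to the caller.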
LangSight + Langfuse work great together.
Use Langfuse for prompt evaluation and LLM quality. Use LangSight for the runtime layer — MCP health, security, and tool call tracing. They solve different problems at different layers of the stack. No overlap, no conflict.