How to Monitor MCP Servers in Production
Your agents depend on MCP servers. If one goes down, your agents fail silently — no error message, no alert, just a confused user wondering why nothing happened. Here is how to set up proactive health monitoring, latency tracking, and uptime alerting for your entire MCP fleet.

It is 2 AM and your agent just stopped working
You get paged at 2 AM. The customer support agent that handles overnight tickets has not responded to anything in 90 minutes. Thirty-seven tickets are stuck in the queue. The agent is not erroring out — it is just not doing anything.
After 45 minutes of digging through logs, you find the root cause: the slack-mcp server that the agent uses to post responses ran out of memory and crashed at 12:14 AM. No alert fired. No health check caught it. The agent tried to call the send_message tool, got a timeout, retried three times, hit its retry budget, and silently gave up.
This is not a hypothetical. This is the most common failure pattern for teams running MCP-based agents in production. The agent is fine. The LLM is fine. The infrastructure that the agent depends on — the MCP servers — is the blind spot.
Langfuse and similar tools will show you what the LLM decided. They will not show you that the tool it tried to call has been down for two hours. You need a different layer of observability: one that watches the hands, not the brain.
Why MCP servers need their own monitoring
MCP servers are not regular APIs. They run in three different transport modes, each with fundamentally different failure characteristics. You cannot treat them like a REST endpoint behind an ALB.
An MCP server might be a local subprocess communicating over stdio, an SSE endpoint streaming events over a persistent HTTP connection, or a standard HTTP server accepting JSON-RPC requests. The protocol is the same — the operational reality is completely different.
Traditional monitoring tools (Datadog, Prometheus, Grafana) can monitor HTTP endpoints. They cannot monitor a subprocess that communicates over stdin/stdout. They cannot detect that an SSE connection has silently stalled — still technically open, but no longer sending events. They do not understand MCP's tool schema or know when a server has changed its tool definitions out from under your agents.
MCP monitoring requires protocol awareness. You need something that speaks MCP natively, can negotiate all three transports, and understands the difference between "the server is up" and "the server is healthy."
The three transport types and how they fail
stdio: process crashes, OOM, and hanging calls
stdio-based MCP servers run as child processes. The MCP client spawns them with a command like python server.py or npx @mcp/postgres, then communicates over stdin/stdout using JSON-RPC.
Failure modes are process-level: the server process gets OOM-killed by the OS, crashes with an unhandled exception, or hangs on a blocking call (a common issue with synchronous database drivers in an async context). When a stdio process dies, the only signal is that the pipe closes. If the client is not watching for broken pipes, it may not notice for minutes.
# Common stdio failure: server hangs on blocking I/O
# The process is alive but not responding to stdin
$ ps aux | grep mcp-server
user  12847  98.2  4.1  python server.py   # alive, but stuck

# LangSight detects this via synthetic health probes
$ langsight mcp-health --server postgres-mcp
postgres-mcp  DOWN  timeout after 5000ms  no response to tools/list
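Conceptually, a stdio probe spawns the server process and sends a JSON-RPC request under a deadline; a hung process is one that is alive but never answers. Here is a minimal Python sketch of that idea (illustrative only — the function name is hypothetical, and it skips the MCP initialize handshake a real client performs before tools/list):

```python
import json
import subprocess
import sys

def probe_stdio_server(command: list, timeout: float = 5.0) -> dict:
    """Spawn the server, send a JSON-RPC tools/list request over stdin,
    and wait for a reply. A timeout means the process is alive but stuck."""
    proc = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    try:
        out, _ = proc.communicate(json.dumps(request) + "\n", timeout=timeout)
        return {"status": "up", "response": json.loads(out.splitlines()[0])}
    except subprocess.TimeoutExpired:
        proc.kill()  # the pipe is open but the server never answered
        return {"status": "down", "reason": f"no response within {timeout}s"}

# Stand-in server for illustration: reads one line, replies with an
# empty tool list. A real probe would target your actual MCP command.
fake_server = [sys.executable, "-c",
    "import sys,json; sys.stdin.readline(); "
    "print(json.dumps({'jsonrpc':'2.0','id':1,'result':{'tools':[]}}))"]
print(probe_stdio_server(fake_server))
```

The timeout is what turns "process exists" into a real health signal: `ps` cannot distinguish a busy server from a deadlocked one, but a missed deadline on a trivial request can.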
SSE: connection drops and event stream stalls
SSE (Server-Sent Events) transport uses a long-lived HTTP connection. The client connects once and receives a stream of events. This is the most common transport for remote MCP servers.
SSE connections fail in ways that are hard to detect. The TCP connection can remain open even after the server process stops writing events — a half-open connection that looks healthy from the client side. Load balancers can silently close idle connections without sending a FIN. The server can fall behind on event processing, causing the event buffer to grow until it triggers backpressure or OOM.
The worst failure mode is a reconnection storm: the client detects a dropped connection, reconnects, the server accepts the connection, then immediately drops it due to resource exhaustion. The client retries in a tight loop, making the server's resource situation worse.
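Stall detection for SSE reduces to tracking the time since the last received event, independent of whether the TCP connection is open. A minimal sketch (hypothetical class, using the same 120-second threshold as the example that follows):

```python
import time

class SseStallDetector:
    """Marks an SSE connection STALE when no events arrive within
    `stall_after` seconds, even though the socket is still open."""

    def __init__(self, stall_after: float = 120.0):
        self.stall_after = stall_after
        self.last_event_at = time.monotonic()

    def on_event(self):
        # Call this from the SSE read loop on every received event.
        self.last_event_at = time.monotonic()

    def status(self, now=None) -> str:
        now = time.monotonic() if now is None else now
        return "STALE" if now - self.last_event_at > self.stall_after else "UP"

det = SseStallDetector(stall_after=120)
print(det.status())                         # fresh connection: UP
print(det.status(det.last_event_at + 121))  # 121s of silence: STALE
```

Using a monotonic clock matters here: wall-clock time can jump backwards (NTP corrections), which would silently reset the stall timer.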
# SSE stall detection in LangSight # Server is "connected" but hasn't sent events in 120 seconds $ langsight mcp-health ┌──────────────┬──────────┬──────────┬─────────────────────────────┐ │ Server │ Status │ Latency │ Notes │ ├──────────────┼──────────┼──────────┼─────────────────────────────┤ │ jira-mcp │ STALE │ — │ No events in 120s (SSE) │ │ github-mcp │ UP │ 89ms │ 14 tools, schema unchanged │ │ slack-mcp │ DOWN │ — │ Connection refused │ └──────────────┴──────────┴──────────┴─────────────────────────────┘
StreamableHTTP: session management and standard HTTP failures
StreamableHTTP is the newest MCP transport, introduced in the 2025-03-26 revision of the MCP specification. It combines standard HTTP request/response with optional server-initiated streaming via SSE upgrades.
This transport inherits all the standard HTTP failure modes — 5xx errors, TLS certificate expiry, DNS resolution failures, connection pool exhaustion — plus MCP-specific issues around session management. StreamableHTTP servers maintain session state; if the server restarts, all active sessions are invalidated and clients must re-initialize. A rolling deployment of the MCP server can cause a wave of session re-initializations across all connected agents.
Because StreamableHTTP is stateful, monitoring needs to track not just "is the server responding" but "is my session still valid." A 200 OK on the health endpoint does not mean existing sessions are functional.
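That distinction can be encoded directly in the health classifier: combine the endpoint's status with a probe made under the existing session ID. A conceptual sketch (assuming, as the spec allows, that a server signals a terminated session with a 404 on requests carrying the old session ID):

```python
def streamable_http_health(endpoint_status: int, session_status: int) -> str:
    """Classify a StreamableHTTP server from two probes: one against the
    base endpoint, one using the client's current session ID.
    A 200 on the endpoint alone does not prove the session survived."""
    if endpoint_status >= 500:
        return "DOWN"
    if session_status == 404:
        # Server restarted or expired the session; client must re-initialize.
        return "SESSION_INVALID"
    if endpoint_status == 200 and session_status == 200:
        return "UP"
    return "DEGRADED"

print(streamable_http_health(200, 200))  # UP
print(streamable_http_health(200, 404))  # SESSION_INVALID
print(streamable_http_health(503, 200))  # DOWN
```

The `SESSION_INVALID` state is what a naive HTTP health check misses: the server answers 200 on its health path while every agent holding a stale session fails on its next tool call.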
What to monitor: the five signals
Effective MCP monitoring requires tracking five signals. Miss any one and you have a blind spot.
- Latency distribution (p50, p95, p99) — Not just average latency, which hides tail latency spikes. A server with 50ms p50 but 8-second p99 will cause intermittent agent timeouts that are nearly impossible to debug without percentile tracking.
- Error rate by category — Distinguish between transient errors (timeouts, connection resets) and permanent errors (tool not found, invalid arguments). A 2% timeout rate might be acceptable. A 0.1% "tool not found" rate means the server's schema has drifted.
- Uptime and availability — Track availability over rolling windows: 1 hour, 24 hours, 7 days. An MCP server that goes down for 30 seconds every hour has 99.2% uptime but causes agent failures 24 times a day.
- Schema consistency — Monitor tool count and tool schemas over time. If a server previously exposed 14 tools and now exposes 12, something changed. If a tool's input schema changed, agents tested against the old schema may start producing errors or incorrect results.
- Tool call success rate — The most important signal: what percentage of actual tool invocations complete successfully? This is different from server uptime — the server can be "up" while specific tools fail due to backend issues (database connection pool exhausted, third-party API rate limited).
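The first signal above is worth a concrete illustration of why averages lie. A nearest-rank percentile over raw latency samples (a common, simple estimator — adequate for health dashboards):

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

# 98 fast calls and 2 pathological ones: the p50 looks perfectly
# healthy while the p99 explains the intermittent agent timeouts.
latencies = [50.0] * 98 + [8000.0, 8200.0]
print(percentile(latencies, 50))  # 50.0
print(percentile(latencies, 99))  # 8000.0
```

The mean of that same sample set is about 211 ms — a number that describes neither the typical call nor the failures, which is exactly why percentile tracking is listed first.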
Proactive vs passive monitoring
Most agent observability platforms do passive monitoring: they record tool calls that agents make during real interactions and report on them after the fact. This is useful for understanding agent behavior, but it is useless for detecting failures before users are affected.
Passive monitoring records traces of real agent sessions. You see that at 2:14 AM, the agent called send_message on slack-mcp and got a timeout. You discover this when you review traces the next morning — or when a user complains.
Proactive monitoring sends synthetic health probes to every MCP server on a schedule, regardless of whether any agent is currently using it. Every 30 seconds, a probe calls tools/list on each server, measures latency, verifies the response schema, and compares tool definitions against the last known snapshot. If a server goes down at 12:14 AM, you get an alert at 12:14 AM — not when the first agent tries to use it 90 minutes later.
LangSight does both. Proactive health probes run continuously and catch infrastructure-level failures. Passive trace collection captures real tool call patterns and identifies issues that only appear under real workloads (specific tool arguments that trigger slow queries, for example).
# Proactive: synthetic health probes every 30 seconds
$ langsight monitor --interval 30

# Passive: collect traces from OTEL-instrumented agents
# (configure OTEL collector to forward to LangSight)
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
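Stripped to its essentials, the proactive side is just a scheduler that probes every configured server on a fixed interval, whether or not any agent is using it. A minimal asyncio sketch with a placeholder probe (the real probe would issue tools/list over the server's transport and time the round trip):

```python
import asyncio

async def probe(server: str) -> str:
    # Placeholder: a real probe would call tools/list and measure latency.
    return "UP"

async def monitor(servers: list, interval: float, cycles: int) -> list:
    """Probe all servers concurrently on a fixed schedule,
    independent of agent traffic."""
    history = []
    for _ in range(cycles):
        statuses = await asyncio.gather(*(probe(s) for s in servers))
        history.append(dict(zip(servers, statuses)))
        await asyncio.sleep(interval)
    return history

history = asyncio.run(
    monitor(["postgres-mcp", "slack-mcp"], interval=0.01, cycles=2)
)
print(history)
```

Probing concurrently with `asyncio.gather` keeps the cycle time bounded by the slowest server rather than the sum of all probes, which matters once the fleet grows past a handful of servers.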
Setting up MCP health monitoring with LangSight
LangSight can go from install to monitoring your entire MCP fleet in under two minutes. Here is the setup flow.
Step 1: Install and auto-discover
$ pip install langsight
$ langsight init
Discovering MCP servers...
  Found Claude Desktop config: 7 servers
  Found Cursor config: 3 servers
  Found VS Code config: 2 servers
  Deduplicated: 9 unique servers
Written to .langsight.yaml
  9 servers configured
  Default health check interval: 30s
  Default alert channel: stdout
langsight init reads your Claude Desktop, Cursor, and VS Code MCP configurations automatically. It deduplicates servers that appear in multiple configs (same command or same URL) and writes a .langsight.yaml with all discovered servers.
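The deduplication step can be sketched in a few lines: two config entries describe the same server if they share a command (stdio) or a URL (remote), and the first occurrence wins. This is an illustrative sketch of that logic, not LangSight's actual implementation:

```python
def dedup_servers(configs: list) -> list:
    """Collapse servers found in multiple client configs into one entry,
    keyed on the stdio command or the remote URL."""
    seen, unique = set(), []
    for server in configs:
        key = server.get("command") or server.get("url")
        if key not in seen:
            seen.add(key)
            unique.append(server)
    return unique

configs = [
    {"name": "postgres-mcp", "command": "python server.py"},    # Claude Desktop
    {"name": "postgres-mcp", "command": "python server.py"},    # Cursor (dup)
    {"name": "slack-mcp", "url": "https://mcp.example.com/slack"},
]
print(len(dedup_servers(configs)))  # 2
```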
Step 2: First health check
$ langsight mcp-health

MCP Server Health
┌──────────────────┬──────────┬──────────┬────────┬─────────────────────────────┐
│ Server           │ Status   │ Latency  │ Tools  │ Notes                       │
├──────────────────┼──────────┼──────────┼────────┼─────────────────────────────┤
│ postgres-mcp     │ UP       │ 42ms     │ 5      │ Schema unchanged            │
│ slack-mcp        │ UP       │ 156ms    │ 8      │ Schema unchanged            │
│ github-mcp       │ UP       │ 89ms     │ 14     │ Schema unchanged            │
│ jira-mcp         │ UP       │ 203ms    │ 11     │ Schema unchanged            │
│ s3-mcp           │ UP       │ 67ms     │ 4      │ Schema unchanged            │
│ filesystem-mcp   │ UP       │ 12ms     │ 6      │ Schema unchanged            │
│ redis-mcp        │ DOWN     │ —        │ —      │ Connection refused          │
│ elasticsearch-mcp│ DEGRADED │ 4,821ms  │ 7      │ Latency > 3000ms threshold  │
│ notion-mcp       │ UP       │ 312ms    │ 9      │ Schema unchanged            │
└──────────────────┴──────────┴──────────┴────────┴─────────────────────────────┘

2 issues detected:
  CRITICAL  redis-mcp DOWN — Connection refused (port 6379)
  WARNING   elasticsearch-mcp DEGRADED — p99 latency 4,821ms (threshold: 3,000ms)
One command, full fleet status. Each server gets a synthetic tools/list call that verifies the server is reachable, responding, and returning a valid tool schema. The latency column shows the round-trip time for that probe.
Step 3: Continuous monitoring
$ langsight monitor --interval 30 --daemon
LangSight monitor started
  Checking 9 servers every 30 seconds
  Alert channels: slack (#mcp-alerts), stdout
  Schema snapshots: enabled
  PID file: /var/run/langsight-monitor.pid
Press Ctrl+C to stop (or run as systemd service)
The monitor command runs health probes on a loop. Every 30 seconds (configurable), each server gets a health check. State transitions (UP to DOWN, UP to DEGRADED, DOWN to UP) trigger alerts. The monitor tracks schema snapshots so it can detect when a server's tool definitions change between checks.
For production deployments, run the monitor as a systemd service or Docker container. It uses minimal resources — a single Python process with async I/O, typically under 50MB of memory even with 100+ servers.
Step 4: Configuration
# .langsight.yaml
monitoring:
  interval_seconds: 30
  timeout_seconds: 5
  consecutive_failures_before_down: 3

servers:
  - name: postgres-mcp
    transport: stdio
    command: "python /opt/mcp/postgres/server.py"
    tags: ["database", "critical"]
    thresholds:
      latency_warning_ms: 500
      latency_critical_ms: 2000
  - name: slack-mcp
    transport: sse
    url: "https://mcp.internal.company.com/slack"
    tags: ["communication", "critical"]
  - name: analytics-mcp
    transport: streamable_http
    url: "https://mcp.internal.company.com/analytics"
    tags: ["analytics", "non-critical"]
    thresholds:
      latency_warning_ms: 2000
      latency_critical_ms: 10000

alerts:
  channels:
    - type: slack
      webhook_url: "LANGSIGHT_SLACK_WEBHOOK"
      channel: "#mcp-alerts"
    - type: webhook
      url: "https://pagerduty.com/integrate/..."
      events: ["down", "schema_drift"]

Alerting on degradation
Health checks without alerts are just dashboards you forget to look at. LangSight supports three alert channels out of the box: Slack (with Block Kit formatting), generic webhooks (PagerDuty, OpsGenie, custom), and stdout (for CI/CD and local use).
Alert rules are state-machine based, not threshold-based. A single slow response does not trigger an alert. Three consecutive failures transition the server from UP to DOWN, which fires the alert. When the server recovers, a recovery alert fires. This prevents alert fatigue from transient network blips.
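That state machine is small enough to show in full. A sketch of the transition logic (hypothetical class name; the consecutive-failure threshold matches the default of 3 from the config above):

```python
class HealthStateMachine:
    """UP -> DOWN only after `threshold` consecutive failed checks, so a
    single blip never alerts. Emits an event string on state transitions."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.state = "UP"
        self.failures = 0

    def record(self, check_ok: bool):
        if check_ok:
            self.failures = 0
            if self.state == "DOWN":
                self.state = "UP"
                return "recovery"      # fire recovery alert
            return None
        self.failures += 1
        if self.failures >= self.threshold and self.state == "UP":
            self.state = "DOWN"
            return "down"              # fire DOWN alert, once
        return None

sm = HealthStateMachine()
checks = [False, True, False, False, False, True]
print([sm.record(ok) for ok in checks])
# → [None, None, None, None, 'down', 'recovery']
```

Note that the isolated failure followed by a success produces no alert at all, and the DOWN alert fires exactly once rather than on every failed check afterward.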
Alert types include:
- Server DOWN — Server failed consecutive health checks. Includes last known status, duration of outage, and which agents are affected.
- Latency spike — p95 latency exceeded the configured threshold for the past N checks. Includes latency trend and historical comparison.
- Schema drift — A tool was added, removed, or had its schema changed. Includes a diff of the old and new schemas so you can assess impact before updating agents.
- Recovery — Server transitioned from DOWN or DEGRADED back to UP. Includes total downtime duration.
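A schema drift diff like the one alert payloads carry can be computed by comparing two tools/list snapshots keyed by tool name. A minimal sketch (the snapshot shapes here are simplified for illustration):

```python
def diff_schemas(previous: dict, current: dict) -> dict:
    """Diff two tool-schema snapshots: {tool_name: schema_dict}."""
    prev_names, curr_names = set(previous), set(current)
    return {
        "tools_added": sorted(curr_names - prev_names),
        "tools_removed": sorted(prev_names - curr_names),
        "tools_modified": sorted(
            name for name in prev_names & curr_names
            if previous[name] != current[name]
        ),
    }

previous = {"create_pull_request": {"required": []}, "delete_branch": {}}
current = {"create_pull_request": {"required": ["draft"]}}
print(diff_schemas(previous, current))
# → removed: delete_branch, modified: create_pull_request
```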
# Example Slack alert for a schema drift detection
{
  "server": "github-mcp",
  "event": "schema_drift",
  "severity": "warning",
  "message": "Tool schema changed on github-mcp",
  "details": {
    "tools_added": [],
    "tools_removed": ["delete_branch"],
    "tools_modified": [{
      "name": "create_pull_request",
      "changes": "Added required field: draft (boolean)"
    }],
    "previous_snapshot": "2026-03-30T14:22:00Z",
    "current_snapshot": "2026-04-02T08:15:00Z"
  }
}

The MCP server scorecard
Beyond point-in-time health checks, LangSight generates a scorecard for each MCP server. The scorecard grades servers A through F across four dimensions:
- Availability — Uptime percentage over the past 7 days. A: 99.9%+, B: 99.5%+, C: 99%+, D: 95%+, F: below 95%.
- Performance — p95 latency relative to configured thresholds. A: consistently under warning threshold, F: regularly exceeding critical threshold.
- Security — Authentication configured, TLS enabled, no known CVEs, OWASP MCP Top 10 compliance. Each missing item drops the grade.
- Reliability — Schema stability, error rate, mean time between failures. Servers that change schemas frequently or have high error rates get lower grades.
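The availability dimension maps directly to grade boundaries. A sketch of that mapping, using the thresholds stated above:

```python
def availability_grade(uptime_pct: float) -> str:
    """Map a 7-day uptime percentage to a letter grade.
    Boundaries: A >= 99.9, B >= 99.5, C >= 99.0, D >= 95.0, else F."""
    for grade, floor in [("A", 99.9), ("B", 99.5), ("C", 99.0), ("D", 95.0)]:
        if uptime_pct >= floor:
            return grade
    return "F"

print(availability_grade(99.95))  # A
print(availability_grade(99.2))   # C
print(availability_grade(90.0))   # F
```

A 99.2% figure grading as C is the "30 seconds of downtime every hour" server from the five-signals section: technically mostly up, practically failing agents around the clock.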
$ langsight mcp-health --scorecard

MCP Server Scorecard (7-day rolling)
┌──────────────────┬───────┬───────┬───────┬───────┬─────────┐
│ Server           │ Avail │ Perf  │ Sec   │ Rel   │ Overall │
├──────────────────┼───────┼───────┼───────┼───────┼─────────┤
│ postgres-mcp     │ A     │ A     │ B     │ A     │ A       │
│ github-mcp       │ A     │ B     │ A     │ A     │ A       │
│ slack-mcp        │ B     │ C     │ B     │ B     │ B       │
│ jira-mcp         │ A     │ B     │ C     │ A     │ B       │
│ elasticsearch-mcp│ C     │ D     │ B     │ C     │ C       │
│ redis-mcp        │ F     │ —     │ D     │ F     │ F       │
└──────────────────┴───────┴───────┴───────┴───────┴─────────┘
Monitoring at scale: 35+ servers
Small teams run 3 to 5 MCP servers. Enterprise teams deploying agents across departments often run 35 or more. At that scale, a CLI table with 35 rows is not sufficient — you need filtering, grouping, and historical trends.
LangSight handles this in two ways. For CLI users, the --filter and --tag flags let you focus on what matters:
# Show only servers that are not healthy
$ langsight mcp-health --filter status!=up

# Show only critical-tagged servers
$ langsight mcp-health --tag critical

# Show servers grouped by team
$ langsight mcp-health --group-by tag:team

# JSON output for custom dashboards or scripts
$ langsight mcp-health --json | jq '.servers[] | select(.status == "down")'
For teams that need historical trends, the LangSight dashboard (self-hosted, ships as a Docker container) shows latency heatmaps, uptime timelines, schema change history, and alert logs. Filter by project, tag, or status. The dashboard reads from the same data store the CLI writes to — no separate configuration required.
Integrating with existing observability
LangSight does not replace your existing monitoring stack. It integrates with it. Health check results are exported as Prometheus metrics, so you can add MCP server panels to your existing Grafana dashboards:
# Prometheus metrics exposed on :9090/metrics
langsight_mcp_health_status{server="postgres-mcp"} 1 # 1=up, 0=down
langsight_mcp_latency_p95{server="postgres-mcp"} 42.3
langsight_mcp_tool_count{server="postgres-mcp"} 5
langsight_mcp_schema_changes_total{server="postgres-mcp"} 0
langsight_mcp_errors_total{server="postgres-mcp",type="timeout"} 2

Alerts can forward to any webhook — PagerDuty, OpsGenie, custom Slack bots, or a Lambda function that auto-restarts crashed stdio processes. The alert payload is structured JSON, so integration with your incident management workflow is straightforward.
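For illustration, here is how health data maps onto the Prometheus text exposition format by hand. This is a conceptual sketch only; a real exporter would use the official prometheus_client library and serve the result over HTTP:

```python
def render_metrics(servers: dict) -> str:
    """Render per-server health data as Prometheus exposition-format
    lines: metric_name{label="value"} sample."""
    lines = []
    for name, data in servers.items():
        lines.append(
            f'langsight_mcp_health_status{{server="{name}"}} '
            f'{1 if data["up"] else 0}'
        )
        lines.append(f'langsight_mcp_latency_p95{{server="{name}"}} {data["p95"]}')
    return "\n".join(lines)

print(render_metrics({"postgres-mcp": {"up": True, "p95": 42.3}}))
```

Because the status metric is a plain 0/1 gauge, an `avg_over_time` query over it in Grafana gives you the rolling availability percentage for free.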
Key takeaways
- MCP servers are the hidden dependency. Your agents are only as reliable as the tools they call. If you monitor the LLM but not the MCP servers, you are monitoring half the system.
- Each transport fails differently. stdio processes crash silently. SSE connections stall. StreamableHTTP sessions expire. Your monitoring must understand all three.
- Proactive beats passive. Recording traces of real tool calls tells you what happened. Proactive health probes tell you what is about to happen. You need both, but proactive monitoring is what prevents 2 AM pages.
- Schema drift is a production risk. When an MCP server changes its tool definitions, agents tested against the old schema will silently break. Monitor tool schemas, not just availability.
- One command to start. pip install langsight && langsight init && langsight monitor gets you from zero to full fleet monitoring in under two minutes. No Docker required for local mode.
Related articles
- Schema Drift in MCP: The Silent Failure Your Agents Cannot Detect — When MCP server schemas change, agents fail silently. How to detect drift before your users notice.
- Circuit Breakers for AI Agents — When an MCP server goes down, circuit breakers prevent cascading failures across your agent fleet.
- Blast Radius Mapping — Understand which agents and sessions are affected when a specific MCP server goes down.
- Setting SLOs for AI Agents — Define measurable reliability targets for your agent fleet including uptime, latency, and success rate.
Monitor your MCP fleet
LangSight adds health monitoring to your entire MCP server fleet in one command. Proactive health probes, latency tracking, schema drift detection, and alerting. Self-host free, Apache 2.0.
Get started →