Self-Hosting AI Observability: Why Your Data Should Never Leave
If you use a cloud observability platform, every tool call your agent makes (the tool name, the arguments, the response) flows to a third-party SaaS. Those arguments include customer names, database queries, internal API endpoints, and sometimes API keys. Your agent traces are the most sensitive data in your stack, and they are leaving your network.

What agent traces actually contain
An agent trace from a typical customer support session includes:
- The user's original message (potentially containing PII: "My name is Sarah Johnson, my account email is [email protected]")
- Every LLM prompt and response (including the system prompt with your proprietary agent logic)
- Every tool call with full arguments (crm-mcp/get_customer(email="[email protected]"))
- Every tool response (full customer record: name, email, billing address, plan details, payment method)
- Internal API endpoints referenced in tool configurations
- Database queries with table names and column names (reveals your schema)
When you send this data to a cloud observability SaaS, you are sending your customer PII, your proprietary business logic, your internal infrastructure details, and your data schema to a third party. Even if that third party has excellent security practices, this creates risks that many organizations cannot accept.
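To make the list above concrete, here is a minimal Python sketch of one recorded tool call and a naive scan for email addresses inside it. The ToolCallSpan fields, the payload, and the customer data are all hypothetical and are not LangSight's (or any vendor's) actual trace schema.

```python
import re
from dataclasses import dataclass
from typing import Any


# Hypothetical shape of one recorded tool call; field names are illustrative only.
@dataclass
class ToolCallSpan:
    server: str
    tool: str
    arguments: dict[str, Any]
    response: dict[str, Any]


# Deliberately simple email pattern; real PII detection needs far more than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def find_emails(value: Any) -> list[str]:
    """Recursively collect email-like strings from nested dicts, lists, and strings."""
    if isinstance(value, str):
        return EMAIL_RE.findall(value)
    if isinstance(value, dict):
        return [hit for v in value.values() for hit in find_emails(v)]
    if isinstance(value, (list, tuple)):
        return [hit for v in value for hit in find_emails(v)]
    return []


# Made-up data mirroring the support-session example above.
span = ToolCallSpan(
    server="crm-mcp",
    tool="get_customer",
    arguments={"email": "sarah.johnson@example.com"},
    response={
        "name": "Sarah Johnson",
        "email": "sarah.johnson@example.com",
        "billing_address": "1 Example Street",
        "plan": "pro",
    },
)

print(find_emails(span.arguments))  # ['sarah.johnson@example.com']
print(find_emails(span.response))   # ['sarah.johnson@example.com']
```

Note what the scan misses: the customer's name and billing address pass straight through. That gap is a small illustration of why scrubbing traces before they leave your network is hard to get right.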
The four risks of cloud-hosted agent observability
1. Data residency and compliance
GDPR restricts transfers of EU personal data outside the European Economic Area unless a legal mechanism such as an adequacy decision or Standard Contractual Clauses is in place. SOC 2 Type II audits expect you to demonstrate control over where customer data is stored and who can access it. HIPAA requires a business associate agreement with any vendor that creates, receives, maintains, or transmits protected health information on your behalf.
If your agent processes customer data (and most agents do), the traces contain that customer data. Sending traces to a US-based SaaS creates a data residency issue for EU customers. Sending traces to any third party creates a compliance documentation burden.
Self-hosting eliminates this entirely. The data stays in your VPC, in your region, under your access controls. Compliance audits are straightforward: "All observability data is stored in our eu-west-1 PostgreSQL instance with encryption at rest."
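One way to keep that promise honest is a startup guard that checks where traces are actually being exported. The Python sketch below is an assumption, not a LangSight feature: the function name is hypothetical, and a private address only proves the endpoint is off the public internet, not that it is inside your specific VPC. It does, however, catch the common mistake of pointing an exporter at a public SaaS URL.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def endpoint_is_private(url: str) -> bool:
    """Return True if every address the URL's host maps to is private or loopback.

    Hypothetical guardrail: run it on your trace-export endpoint at startup
    and refuse to start if it returns False.
    """
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    try:
        # IP literal: no DNS lookup needed.
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        # Hostname: resolve it and check every address it maps to.
        addresses = [
            ipaddress.ip_address(info[4][0])
            for info in socket.getaddrinfo(host, None)
        ]
    # is_private covers RFC 1918, loopback, and IPv6 unique-local ranges.
    return all(addr.is_private for addr in addresses)
```

For example, endpoint_is_private("http://10.0.3.17:4318/v1/traces") returns True, while a public address such as "https://1.1.1.1/v1/traces" returns False.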
2. Vendor lock-in
Cloud observability platforms store your historical trace data. If you decide to switch vendors, that historical data is either inaccessible (proprietary format) or requires a costly migration. The longer you use the platform, the harder it is to leave.
With self-hosted, your data is in standard PostgreSQL and ClickHouse databases. You own the data, the format, and the access. Migrate, fork, or build custom tooling on top — no vendor permission required.
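Because the data sits in ordinary databases, leaving is a query away. The Python sketch below streams any query result to JSON Lines through a DB-API 2.0 connection (for example psycopg for PostgreSQL). The demo uses an in-memory SQLite table as a stand-in, and the tool_calls table and its columns are hypothetical.

```python
import io
import json
import sqlite3
from typing import Any, TextIO


def export_jsonl(conn: Any, query: str, out: TextIO, batch_size: int = 1000) -> int:
    """Write every row returned by `query` to `out` as one JSON object per line.

    `conn` can be any DB-API 2.0 connection. Returns the number of rows written.
    """
    cur = conn.cursor()
    try:
        cur.execute(query)
        columns = [col[0] for col in cur.description]
        count = 0
        # Fetch in batches so large trace tables never sit fully in memory.
        while rows := cur.fetchmany(batch_size):
            for row in rows:
                # default=str turns timestamps, UUIDs, Decimals, etc. into strings.
                out.write(json.dumps(dict(zip(columns, row)), default=str) + "\n")
                count += 1
        return count
    finally:
        cur.close()


# Demo: an in-memory SQLite table stands in for your real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tool_calls (server TEXT, tool TEXT, latency_ms REAL)")
conn.executemany(
    "INSERT INTO tool_calls VALUES (?, ?, ?)",
    [("crm-mcp", "get_customer", 120.5), ("crm-mcp", "update_plan", 98.0)],
)
buf = io.StringIO()
n = export_jsonl(conn, "SELECT server, tool, latency_ms FROM tool_calls ORDER BY tool", buf)
print(n)  # 2
print(buf.getvalue().splitlines()[0])
# {"server": "crm-mcp", "tool": "get_customer", "latency_ms": 120.5}
```

JSON Lines is used here only because almost every tool can read it; the same loop could write CSV or Parquet instead.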
3. Cost scaling
Cloud observability pricing scales with data volume. As your agent fleet grows — more agents, more sessions, more tool calls — the observability bill grows proportionally. At scale, observability costs can rival the LLM costs themselves.
A rough comparison for a team running 10 agents with 1,000 sessions per day:
| Platform | Approx. Monthly Cost | Data Residency |
|---|---|---|
| Datadog LLM Observability | ~$2,400/mo | US/EU (their infra) |
| LangSmith (cloud) | ~$400/mo | US (their infra) |
| LangSight (self-hosted) | $0 license + ~$50-100/mo infra | Your VPC |
Self-hosting is not free — you pay for the compute and storage. But a PostgreSQL and ClickHouse instance for 10 agents costs roughly $50-100/month on AWS, compared to $2,400+/month for enterprise cloud observability.
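The same figures are easier to compare per session. This Python sketch only reworks the numbers above; it assumes "1,000 sessions per day" is the fleet-wide total, a 30-day month, and the upper $100/month estimate for self-hosted infrastructure.

```python
def cost_per_session(monthly_usd: float, sessions_per_day: int, days: int = 30) -> float:
    """Monthly spend divided by monthly session volume."""
    return monthly_usd / (sessions_per_day * days)


# Monthly figures from the comparison table; self-hosted uses the $100 upper estimate.
MONTHLY_USD = {
    "Datadog LLM Observability": 2_400,
    "LangSmith (cloud)": 400,
    "LangSight (self-hosted infra)": 100,
}

for platform, monthly in MONTHLY_USD.items():
    per_session = cost_per_session(monthly, sessions_per_day=1_000)
    print(f"{platform}: ${per_session:.4f}/session, ${monthly * 12:,}/year")

# Datadog LLM Observability: $0.0800/session, $28,800/year
# LangSmith (cloud): $0.0133/session, $4,800/year
# LangSight (self-hosted infra): $0.0033/session, $1,200/year
```

If a vendor bill scales linearly with volume, as this section assumes, its per-session cost stays flat while the monthly total grows with traffic. A fixed-size self-hosted instance moves the other way until it needs resizing.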
4. Attack surface
Every third-party service with access to your data is part of your attack surface. A breach at your observability vendor exposes your agent traces — which contain customer PII, internal APIs, and database schemas. The observability vendor becomes the most data-rich target in your supply chain.
Self-hosting contains the attack surface within your existing security perimeter. The same security controls, access policies, and monitoring that protect your production databases also protect your observability data.
Why self-hosting matters for AI specifically
Traditional infrastructure observability (CPU metrics, request latency, error rates) contains minimal sensitive data. A Prometheus metric like http_requests_total does not contain PII.
AI agent observability is fundamentally different. Agent traces contain the full content of every interaction — user messages, LLM reasoning, tool arguments, and tool responses. This data is among the most sensitive in your entire system. It is the complete record of every action your AI took on behalf of every user.
The sensitivity of agent trace data is why self-hosting is not just a nice-to-have — it is a security and compliance requirement for teams processing customer data through AI agents.
LangSight's self-hosted architecture
LangSight is designed to be self-hosted from the ground up. There is no cloud service to phone home to, no telemetry sent to our servers, and no feature gating behind a paid cloud tier.
```bash
# Full stack in one command
$ docker compose up -d
Creating langsight-postgres ... done
Creating langsight-clickhouse ... done
Creating langsight-api ... done
Creating langsight-dashboard ... done

# That's it. Full observability stack running in your VPC.
# PostgreSQL: app state, configs, alerts
# ClickHouse: time-series health data, traces
# API: FastAPI, async, ~50MB memory
# Dashboard: Next.js, self-contained
```
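After bringing the stack up, a quick reachability check confirms that each service is listening. This Python sketch uses only the standard library; the port map is an assumption (5432 and 8123 are the PostgreSQL and ClickHouse HTTP defaults, 8000 and 3000 are common FastAPI and Next.js defaults), so replace it with whatever ports your docker-compose.yml actually publishes.

```python
import socket

# Assumed host ports; check the ports your compose file actually publishes.
SERVICES = {
    "postgres": 5432,
    "clickhouse": 8123,
    "api": 8000,
    "dashboard": 3000,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


def check_stack(host: str = "127.0.0.1") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_stack().items():
        print(f"{name:<10} {'up' if ok else 'DOWN'}")
```

A TCP check only shows that something is listening. A stricter variant would hit each service's own health endpoint.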
The CLI mode requires even less: no Docker at all. Running langsight init && langsight monitor stores data in a local SQLite database, which suits individual developers and small teams who want monitoring without infrastructure overhead.
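Local-only storage is easy to reason about because everything lives in one file on disk. The Python sketch below shows the general pattern with the standard sqlite3 module; the health_checks schema and the function names are hypothetical and are not LangSight's actual storage layout.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical schema for illustration; LangSight's actual SQLite layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS health_checks (
    checked_at REAL NOT NULL,    -- Unix timestamp of the check
    server     TEXT NOT NULL,    -- monitored server name
    ok         INTEGER NOT NULL, -- 1 = healthy, 0 = failed
    latency_ms REAL              -- NULL when no latency was measured
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local database; nothing here touches the network."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_check(conn: sqlite3.Connection, server: str, ok: bool,
                 latency_ms: Optional[float] = None) -> None:
    with conn:  # commits the insert, or rolls back on error
        conn.execute(
            "INSERT INTO health_checks VALUES (?, ?, ?, ?)",
            (time.time(), server, int(ok), latency_ms),
        )


def failure_rate(conn: sqlite3.Connection, server: str) -> Optional[float]:
    """Fraction of failed checks for `server`, or None if it has no checks."""
    (rate,) = conn.execute(
        "SELECT AVG(1 - ok) FROM health_checks WHERE server = ?", (server,)
    ).fetchone()
    return rate
```

Passing a file name such as open_store("monitor.db") persists the data between runs; the file name here is arbitrary.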
The Apache 2.0 advantage
LangSight uses the Apache 2.0 license, a permissive open-source license widely used for infrastructure software. This means:
- No open-core restrictions. Every feature is available in the self-hosted version. There is no "enterprise edition" with features locked behind a commercial license.
- Fork and modify freely. If you need to customize LangSight for your organization — add a custom alert channel, change the data model, integrate with an internal tool — you can fork and modify without restriction.
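- No contributor license agreement trap. Some open-source projects use CLAs that allow the company to relicense contributions. Apache 2.0 includes an explicit patent grant, and its Section 5 treats contributions as submitted under the same license terms by default, so the license itself does not require a CLA.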
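- Sell it if you want. Apache 2.0 explicitly allows commercial use. You can build a managed service on top of LangSight or bundle it with your product, subject only to the license's attribution conditions: keep the license, copyright, and NOTICE files, and mark the files you modify.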
What about offline environments?
Some organizations operate in air-gapped environments — government, defense, highly regulated finance. These environments cannot connect to any external service, including for CVE lookups.
LangSight supports fully offline operation. The CVE database can be bundled locally (updated via a manual process). Health monitoring, schema drift detection, loop detection, and budget enforcement all work without any external network access. The only feature that requires external connectivity is the optional CVE database update from OSV — and even this can be disabled or routed through a controlled proxy.
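For a sense of what a bundled lookup involves, here is a Python sketch that matches a package version against OSV-format records stored on local disk, for example a per-ecosystem export from OSV copied into the air-gapped network. The function names are hypothetical, and the matcher is a simplification: it checks only the enumerated affected[].versions list, whereas a complete matcher must also evaluate affected[].ranges and normalize package names per ecosystem.

```python
import json
from pathlib import Path
from typing import Iterable


def load_osv_records(directory: Path) -> list[dict]:
    """Load every OSV-format JSON file from a local directory (no network access)."""
    return [json.loads(p.read_text()) for p in sorted(directory.glob("*.json"))]


def find_vulns(records: Iterable[dict], ecosystem: str, name: str, version: str) -> list[str]:
    """Return IDs of records whose affected[].versions list contains `version`.

    Simplification: affected[].ranges is ignored, and names are compared exactly.
    """
    hits = []
    for rec in records:
        for affected in rec.get("affected", []):
            pkg = affected.get("package", {})
            if (pkg.get("ecosystem") == ecosystem
                    and pkg.get("name") == name
                    and version in affected.get("versions", [])):
                hits.append(rec["id"])
                break  # one match per record is enough
    return hits
```

Refreshing the bundle then becomes a file-copy step that fits the manual update process described above.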
Key takeaways
- Agent traces are the most sensitive data in your stack. They contain user PII, internal APIs, database schemas, and proprietary business logic. Treat them accordingly.
- Self-hosting eliminates data residency and compliance risks. Your data stays in your VPC, your region, under your access controls.
- ~$50-100 vs ~$2,400/month. Self-hosted infrastructure costs a fraction of cloud observability pricing at scale. The cost difference grows as your agent fleet grows.
- Apache 2.0 means no feature restrictions. No open-core, no enterprise edition, no feature gating. Fork, modify, sell: your choice.
- Five minutes to deploy: docker compose up -d gives you the full stack; langsight init gives you the CLI with zero infrastructure.
Related articles
- LangSight vs Langfuse — Both tools are self-hostable. See how they complement each other in a production observability stack.
- MCP Server Security — Security scanning and audit data should also stay in your network. How to run security audits self-hosted.
- How to Monitor MCP Servers in Production — Self-hosted MCP monitoring: zero external dependencies, full fleet visibility.
- AI Agent Cost Attribution — Self-hosted cost tracking saves you the $2,400/month cloud observability bill.
Keep your agent data in your network
LangSight is self-hosted, Apache 2.0, and free forever. Full observability for your agent fleet without sending a single trace to a third party.