TL;DR
- Open protocol published by Anthropic in November 2024 under an MIT licence and now governed as a community project at modelcontextprotocol.io. Standardises how LLM applications connect to external context — tools, data, and prompts — so one server can serve every host.
- Built on JSON-RPC 2.0 with two standard transports and one legacy option: stdio (local subprocess, default for desktop hosts), Streamable HTTP (single endpoint with optional SSE streaming, the 2025-03-26 replacement for the older HTTP+SSE pair), and the deprecated HTTP+SSE transport, kept only for backwards compatibility.
- Three core server primitives: Tools (functions the model can call), Resources (read-only context addressable by URI) and Prompts (parameterised templates surfaced as slash commands). On the client side, Sampling lets servers request LLM completions back through the host, and Roots lets the client declare which filesystem locations a server may use.
- Official SDKs for Python, TypeScript, Kotlin, C#, Java, Swift, Rust, Ruby and Go. Reference servers maintained by the spec project cover filesystem, git, GitHub, Postgres, SQLite, Slack, Google Drive, Brave Search, Puppeteer, Sentry, and more. Major vendors (Stripe, Linear, Notion, Cloudflare, AWS, Atlassian) ship signed first-party servers.
- Adopted by Claude Desktop, Claude Code, the Anthropic and OpenAI SDKs, Cursor, Zed, JetBrains AI Assistant, VS Code's GitHub Copilot Chat, and the OpenAI Agents SDK. By mid-2026 it is the de-facto wiring layer between models and the rest of an organisation's software estate.
Overview
The Model Context Protocol (MCP) is an open JSON-RPC 2.0 protocol that standardises how LLM applications discover and invoke external context — tools, data resources, and prompt templates. Anthropic published the first draft and reference SDKs on 25 November 2024 alongside Claude Desktop's MCP client. The specification, the SDKs and the reference servers are all MIT-licensed and developed in the open at github.com/modelcontextprotocol; the project is now a community effort with contributors from Anthropic, OpenAI, Microsoft, Google, JetBrains, Block, Replit, Sourcegraph and Yobitel.
Before MCP, every LLM host re-implemented its own tool surface: LangChain `Tool` classes, OpenAI function-calling JSON, Claude `tool_use` blocks, AutoGen tools, custom HTTP-plus-JSON-schema endpoints. The same GitHub integration had to be written once per host, then maintained independently as each host evolved. MCP introduces a host-agnostic layer in between: write a GitHub MCP server once, and every MCP-capable host — Claude Desktop, Cursor, Zed, your own Python agent — gets the same capabilities with a few lines of configuration. It is the USB-C connector for the LLM stack.
The protocol is intentionally small. The 2025-06-18 spec revision (the version this entry tracks) defines around twenty method names, three transports, and a capability-negotiation handshake. Everything else — auth flows, schema validation, retry policy, observability — is delegated to host-side infrastructure or to standard internet plumbing (OAuth 2.1, JSON Schema, JSON-RPC error codes). The protocol does the integration glue and gets out of the way.
This entry covers the runtime surface a platform engineer needs to operate MCP at scale: how to install the SDKs and run a first server, how the wire format and capability negotiation work, the reference table of methods, the deployment patterns (local stdio, remote Streamable HTTP, gateway), sizing and latency considerations, limits and quotas, observability and tracing, cost, the OAuth 2.1 security model, how MCP compares to raw function-calling and legacy plugin systems, and the failure modes you will see in production. Yobibyte ships an MCP server template so customers can expose their workspace data to MCP-compatible clients (Claude Desktop, Cursor, Claude Code) without writing custom adapters, which is a concrete reason to understand the protocol if you are already on the Yobitel platform. The goal is to help you connect an LLM client to your internal context sources using MCP and to recognise where Yobitel exposes MCP-compatible endpoints.
Quick start
Below are three minimal walkthroughs: install the Python SDK and run an MCP server that exposes a single tool over stdio, install the TypeScript SDK and run an equivalent server, and register the Python server with Claude Desktop. Every snippet below is copy-and-runnable on a current Python 3.11+ or Node 20+ host.
Pattern A: Python server with a single tool. Pattern B: TypeScript server with the same tool. Pattern C: register the Python server with Claude Desktop via the JSON config file.
```bash
# A. Python MCP server using the official SDK (FastMCP high-level API)
pip install "mcp[cli]>=1.6.0"
cat > weather_server.py <<'PY'
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return current weather for a city. Use only when the user names a city."""
    return f"It is 18 degrees and overcast in {city}."

if __name__ == "__main__":
    mcp.run()  # stdio by default
PY

# Smoke test the server with the MCP CLI inspector
mcp dev weather_server.py

# B. Equivalent TypeScript server
npm init -y && npm pkg set type=module   # ESM is required for the top-level await below
npm install @modelcontextprotocol/sdk zod
cat > weather_server.ts <<'TS'
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "weather", version: "0.1.0" });

server.tool(
  "get_weather",
  { city: z.string().describe("City name") },
  async ({ city }) => ({
    content: [{ type: "text", text: `It is 18C and overcast in ${city}.` }],
  }),
);

await server.connect(new StdioServerTransport());
TS
npx tsx weather_server.ts

# C. Register the Python server with Claude Desktop
# macOS:   ~/Library/Application Support/Claude/claude_desktop_config.json
# Linux:   ~/.config/Claude/claude_desktop_config.json
# Windows: %APPDATA%\Claude\claude_desktop_config.json
# The Linux path is used here; substitute the path for your OS.
# Note that this overwrites any existing config file.
cat > ~/.config/Claude/claude_desktop_config.json <<'JSON'
{
  "mcpServers": {
    "weather": {
      "command": "python",
      "args": ["/abs/path/to/weather_server.py"]
    }
  }
}
JSON
# Restart Claude Desktop; the hammer icon should now list "get_weather".
```

The MCP CLI inspector (`mcp dev`) is the single best feedback loop for new server authors. It launches a local web UI that lists every method the server advertises and lets you invoke them with arbitrary arguments before you wire the server into a real host.
How it works
MCP is a client-server protocol in the JSON-RPC 2.0 sense. A host process (Claude Desktop, an IDE, your application) instantiates one MCP client per server it wants to talk to. The client launches the server as a subprocess (stdio) or connects to it over HTTP, performs an initialisation handshake, and then dispatches typed requests and notifications over the chosen transport.
Initialisation is the only handshake. The client sends an `initialize` request declaring its protocol version and capabilities (sampling support, root-listing support, experimental features). The server responds with the version it speaks and its own capabilities — which of tools, resources, and prompts it exposes, and whether each supports change notifications. After the client sends the `notifications/initialized` notification, the connection is open for normal traffic.
Capability negotiation is one-shot. Capabilities cannot change at runtime; what was advertised at `initialize` is what the connection will support for its lifetime. The set of individual tools, resources, and prompts may change (servers emit `notifications/tools/list_changed` when that happens), but the protocol-level capability flags are fixed. This keeps client implementations simple — feature detection happens once.
Transports are independent of the protocol itself. The 2025-03-26 spec consolidated the three transports into: stdio (newline-delimited JSON over the subprocess's stdin/stdout — default for desktop hosts), Streamable HTTP (a single POST endpoint that the server may optionally upgrade to SSE for streaming — the recommended transport for remote servers), and the legacy HTTP+SSE pair (separate POST and SSE endpoints, kept only for backwards compatibility). The same JSON-RPC payloads travel over all three.
- Server primitives: Tools (model-invoked functions with JSON-schema-typed inputs and outputs), Resources (addressable read-only content identified by URI), and Prompts (parameterised templates the user invokes, typically as slash commands in the host UI).
- Client primitives: Sampling (the server can ask the host to run an LLM completion on its behalf, with user approval) and Roots (the client can advertise filesystem roots the server is allowed to traverse).
- Notifications: fire-and-forget messages in both directions, including `list_changed` for tools, resources and prompts, `progress` for long-running calls, `cancelled` for cancelled requests, and `notifications/message` for server-emitted log lines.
- Schemas: tools carry a JSON Schema (draft 2020-12) for their inputs and, since 2025-06-18, optionally publish an output schema. Prompts declare a simpler list of named arguments, each with a description and a required flag. Hosts should validate model-generated arguments against the input schema before dispatching the call to the server.
Capabilities are negotiated, not assumed. Sampling and roots are both client capabilities, so a server that needs either must check that the client advertised `sampling` or `roots` before sending the request. Likewise, a client should only call `tools/*`, `resources/*` or `prompts/*` methods the server advertised. Mismatched assumptions cause silent feature loss, not loud errors.
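The handshake and capability check described above can be sketched with plain dictionaries. This is an illustrative sketch, not SDK code: the helper names (`initialize_request`, `initialize_result`, `frame`, `supports`) and the client and server names are hypothetical, and the capability sets are example values only.

```python
import json

PROTOCOL_VERSION = "2025-06-18"

def initialize_request(request_id: int) -> dict:
    """Client -> server: declare protocol version and client capabilities."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {"sampling": {}, "roots": {"listChanged": True}},
            "clientInfo": {"name": "example-host", "version": "0.1.0"},
        },
    }

def initialize_result(request_id: int) -> dict:
    """Server -> client: reply with the version spoken and server capabilities."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {"tools": {"listChanged": True}, "resources": {}},
            "serverInfo": {"name": "weather", "version": "0.1.0"},
        },
    }

# Notifications have no "id" and therefore receive no response.
INITIALIZED = {"jsonrpc": "2.0", "method": "notifications/initialized"}

def frame(message: dict) -> bytes:
    """stdio framing: one compact JSON object per line."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")

def supports(capabilities: dict, dotted: str) -> bool:
    """True if a capability such as 'tools' or 'tools.listChanged' was advertised."""
    node = capabilities
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return node is not False
```

A server would call `supports(client_caps, "sampling")` before sending `sampling/createMessage`, and a client would call `supports(server_caps, "resources.subscribe")` before subscribing; with the example values above the first check passes and the second does not.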
Reference and specifications
The table below is the reference for the core methods and notifications in the 2025-06-18 spec revision. The Direction column reads `C->S` for client-to-server, `S->C` for server-to-client, and `Either` for messages that both sides may send. Notifications carry no response. All methods use the standard JSON-RPC 2.0 envelope and error codes: `-32700` and `-32600` to `-32603` for parse, request, method, parameter and internal errors, with `-32000` to `-32099` reserved for implementation-defined errors.
| Method | Direction | Purpose | Capability gated by |
|---|---|---|---|
| initialize | C->S | Negotiate protocol version and capability sets at session start. | Always |
| notifications/initialized | C->S | Client signals handshake complete; server may begin sending notifications. | Always |
| ping | Either | Keepalive; expects empty result. | Always |
| tools/list | C->S | Enumerate the tools the server exposes. | tools |
| tools/call | C->S | Invoke a named tool with JSON arguments. | tools |
| notifications/tools/list_changed | S->C | Tool catalogue changed; client should refresh. | tools.listChanged |
| resources/list | C->S | Enumerate addressable resources. | resources |
| resources/read | C->S | Fetch the content of a resource by URI. | resources |
| resources/subscribe | C->S | Watch a resource for updates. | resources.subscribe |
| resources/unsubscribe | C->S | Stop watching a resource. | resources.subscribe |
| resources/templates/list | C->S | Enumerate URI templates (RFC 6570) for parameterised resources. | resources |
| notifications/resources/updated | S->C | Subscribed resource content changed. | resources.subscribe |
| notifications/resources/list_changed | S->C | Resource catalogue changed. | resources.listChanged |
| prompts/list | C->S | Enumerate prompt templates the server publishes. | prompts |
| prompts/get | C->S | Render a named prompt with arguments. | prompts |
| notifications/prompts/list_changed | S->C | Prompt catalogue changed. | prompts.listChanged |
| sampling/createMessage | S->C | Server asks the host to run an LLM completion (with user approval). | client.sampling |
| roots/list | S->C | Server asks for the filesystem roots it is allowed to traverse. | client.roots |
| notifications/roots/list_changed | C->S | Client's root list changed. | client.roots.listChanged |
| completion/complete | C->S | Argument auto-complete for prompts and resources (IDE-style). | completions |
| logging/setLevel | C->S | Set the minimum log level the server emits. | logging |
| notifications/message | S->C | Server-emitted log line. | logging |
| notifications/progress | Either | Progress update for a long-running operation, keyed by progressToken. | Always (opt-in via _meta) |
| notifications/cancelled | Either | Cancellation of an in-flight request. | Always |
The protocol version sent in the `initialize` request is a date string. As of the 2025-06-18 revision, send `"2025-06-18"`. If the server supports that version it echoes it back; otherwise it replies with another version it supports, normally its latest. If the client cannot speak the version the server returns, it must close the connection.
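Version negotiation is small enough to sketch in a few lines. The function names and the supported-version lists below are illustrative assumptions, not part of any SDK; the sketch relies only on the fact that date-string versions sort chronologically as plain strings.

```python
# Hypothetical list of revisions a server implementation understands.
SERVER_SUPPORTED = ["2024-11-05", "2025-03-26", "2025-06-18"]

def server_choose_version(requested: str, supported: list[str] = SERVER_SUPPORTED) -> str:
    """Echo the requested version when supported, otherwise offer the latest one."""
    # ISO-style date strings compare correctly as strings, so max() is the newest.
    return requested if requested in supported else max(supported)

def client_accepts(offered: str, client_supported: list[str]) -> bool:
    """The client must close the connection when this returns False."""
    return offered in client_supported
```

For example, a client that only speaks `"2024-11-05"` and requests an unknown version would be offered `"2025-06-18"`, fail `client_accepts`, and disconnect.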
Workload patterns
Three deployment shapes cover the bulk of production MCP usage: (A) Local subprocess servers consumed by a desktop host, (B) Remote HTTP servers consumed by a hosted LLM application or a cloud-deployed agent, (C) Gateway-aggregated multi-server topologies where one fan-out endpoint exposes tools from many backends. Each has its own preferred transport, auth, and scaling shape.
- Pattern A (local stdio): a desktop IDE or Claude Desktop spawns a Python or Node subprocess per server. Auth piggybacks on the operating-system user; protocol overhead is a millisecond or two per call; scaling is per-user. Best for filesystem, git, local databases, and integrations that wrap the user's own credentials.
- Pattern B (remote Streamable HTTP): a hosted application connects to a single HTTP endpoint per server; the server uses OAuth 2.1 with PKCE for user-delegated access. Best for SaaS integrations (GitHub, Linear, Stripe) and multi-tenant deployments.
- Pattern C (gateway): a control-plane MCP server fronts many backends, presenting a single tool catalogue to the host while routing calls to the right backend. Best for enterprises that want centralised auth, rate limiting, and audit logging across dozens of integrations.
```python
# Pattern B: remote Streamable HTTP server built with FastMCP
# Requires a recent SDK release with Streamable HTTP support
# (newer than the 1.6.0 floor used in the quick start).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("issues", host="0.0.0.0", port=8080)

# github_client is a placeholder for your own GitHub API wrapper;
# it is not defined in this snippet.

@mcp.tool()
def search_issues(query: str, repo: str) -> list[dict]:
    """Search GitHub issues. Use when the user asks about open work in a repo."""
    return github_client.search_issues(repo, query)

@mcp.resource("issue://{repo}/{number}")
def get_issue(repo: str, number: int) -> str:
    """Fetch a single issue by number."""
    return github_client.get_issue(repo, number).body

if __name__ == "__main__":
    mcp.run(transport="streamable-http")

# Serve behind a reverse proxy terminating TLS; require OAuth 2.1 bearer
# tokens at the proxy before forwarding to the MCP endpoint.
```

```python
# Pattern C: gateway that fans out to multiple backend MCP servers
# (sketch using the official SDK's client + server composition primitives)
from mcp import ClientSession
from mcp.client.stdio import stdio_client, StdioServerParameters
from mcp.server.fastmcp import FastMCP

gateway = FastMCP("ml-platform-gateway")

BACKENDS = {
    "git": StdioServerParameters(command="mcp-server-git"),
    "fs": StdioServerParameters(command="mcp-server-filesystem", args=["/workspace"]),
    "pg": StdioServerParameters(command="mcp-server-postgres",
                                env={"DATABASE_URL": "postgres://..."}),
}

@gateway.tool()
async def call(backend: str, tool: str, args: dict) -> dict:
    """Dispatch a tool call to a named backend MCP server."""
    # For brevity this spawns the backend on every call; a production gateway
    # should keep one long-lived session per backend instead.
    async with stdio_client(BACKENDS[backend]) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(tool, args)
            return {"content": [c.model_dump() for c in result.content]}

if __name__ == "__main__":
    gateway.run(transport="streamable-http")
```

Pattern C (gateway) is the production shape for enterprises. It lets the security team enforce auth, audit and rate limiting centrally while letting individual product teams ship their own backend MCP servers without each one re-solving the auth problem.
Sizing and capacity planning
MCP itself is cheap: JSON-RPC over stdio or a single HTTP POST per call, payloads measured in kilobytes. The cost lives in the work the server performs — a database query, a GitHub API call, a vector search. Size for the underlying backend, not for MCP. The protocol overhead is negligible on every plausible workload.
The numbers that do matter are: (1) call latency budget, (2) concurrent sessions per server process, (3) memory per session for SSE-streaming connections, (4) connection-establishment overhead. The table below captures realistic ranges from production MCP deployments running on commodity hardware (4 vCPU / 8 GB containers fronted by a TLS-terminating proxy).
| Dimension | stdio (local) | Streamable HTTP (remote) | Notes |
|---|---|---|---|
| Tool-call round-trip overhead | 0.5-2 ms | 5-30 ms LAN, 30-150 ms WAN | Excludes backend work. |
| Connection setup | 50-200 ms (subprocess spawn) | 10-50 ms (TLS + initialize) | Pool HTTP sessions; never re-spawn stdio per call. |
| Concurrent sessions per process | 1 (one client) | 200-2000 | Async I/O bound, not CPU bound. |
| RAM per idle session | 5-20 MB (Python proc) | 20-100 KB | stdio servers are one process per client. |
| RAM per streaming session | n/a | 100-500 KB | SSE keeps the response open. |
| Payload size (typical) | 1-10 KB request, 1-50 KB response | Same | Larger if tools return file content. |
| Payload size (worst case) | Limited by JSON parser only | Cap at proxy (e.g. 10 MB) | Use resources for big blobs. |
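As a rough worked example of how the table's ranges combine, the sketch below estimates resident memory for streaming sessions on one server process. The function name is hypothetical and the inputs are simply the upper and lower bounds quoted in the table, not measurements.

```python
def streaming_ram_mb(sessions: int, kb_per_session: float) -> float:
    """Approximate RAM (in MiB) held by open streaming sessions on one process."""
    return sessions * kb_per_session / 1024

# Lower bound from the table: 200 sessions at 100 KB each.
low = streaming_ram_mb(200, 100)
# Upper bound from the table: 2000 sessions at 500 KB each.
high = streaming_ram_mb(2000, 500)
print(f"{low:.1f} MiB to {high:.1f} MiB")
```

Even the upper bound stays just under 1 GiB, which is why the 8 GB containers assumed above are normally limited by backend work rather than by MCP session state.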
Limits and quotas
MCP places no protocol-level limits on payload size, request rate, or session count. Every limit you hit will be one you configured at the host, the proxy, or the backend. The list below covers the limits you must plan for in production.
- Tool catalogue size — every host imposes a soft limit on how many tools it will surface to the model at once. Claude Desktop and Claude Code currently warn above 100 tools per session; model accuracy degrades materially beyond around 30-50 tools regardless of host.
- Tool name uniqueness — names must be globally unique within a host session. Servers should namespace (`github_search_issues`, not `search_issues`) when they will coexist with other servers in the same client.
- Schema depth — most hosts truncate deeply nested JSON Schemas (more than 6-8 levels) when serialising to the model. Keep input schemas flat.
- Resource URI length — practical cap of 8 KB to fit common HTTP header limits when resources are surfaced through proxies; far below the JSON-RPC theoretical maximum.
- Streamable HTTP keepalive — proxies (Cloudflare, AWS ALB, Nginx default) close idle connections at 30-300 s. Configure server-sent pings every 15-20 s or accept reconnects.
- Subprocess fan-out — stdio hosts spawn one process per server per user session. A workstation with 50 MCP servers configured and a 500 MB-resident model on each is unworkable. Keep stdio servers lightweight; promote heavy services to HTTP.
- JSON-RPC request ID space — practically unbounded (string or number), but high-throughput clients should use monotonically increasing integers and not assume server-side uniqueness across reconnections.
If a server exposes more than 30 tools you have a design problem. Split it: one server per coherent feature area, namespaced names, and let the host's tool-filtering UI (or a gateway router) present only the relevant subset per session.
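The namespacing and catalogue-size advice above can be sketched as a small check that a gateway or host might run when merging tool lists. The function names, the prefixing convention and the default limit of 30 are illustrative assumptions drawn from the guidance in this section, not protocol rules.

```python
from collections import Counter

def namespaced(server: str, tool: str) -> str:
    """Prefix a tool name with its server name unless it already carries it."""
    prefix = server.replace("-", "_")
    return tool if tool.startswith(prefix + "_") else f"{prefix}_{tool}"

def merge_catalogues(catalogues: dict[str, list[str]], soft_limit: int = 30):
    """Merge per-server tool lists; return (names, over_limit).

    Raises ValueError if two tools still collide after namespacing.
    """
    names = [
        namespaced(server, tool)
        for server, tools in catalogues.items()
        for tool in tools
    ]
    duplicates = sorted(name for name, count in Counter(names).items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate tool names: {duplicates}")
    return names, len(names) > soft_limit
```

With two servers that both expose `search_issues`, the merged catalogue contains `github_search_issues` and `linear_search_issues`, so the host never sees an ambiguous name.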
Observability
MCP gives you two observability hooks at the protocol level: the `logging` capability lets the server emit structured log lines that the client can route to its own logging pipeline, and `notifications/progress` lets long-running tool calls stream progress updates keyed by a `progressToken`. Everything else — distributed tracing, metrics, request-level audit — sits at the host or the gateway layer.
For production deployments at scale, the workable pattern is: instrument the server's JSON-RPC dispatch layer with OpenTelemetry, propagate trace context through the `_meta` field of incoming requests, emit a span per `tools/call` annotated with the tool name and the user identifier from the OAuth token, and ship the resulting traces to whatever observability backend the host platform standardises on. The MCP SDKs do not ship OpenTelemetry middleware out of the box; the few-dozen-line wrapper is straightforward.
- Per-call attributes worth recording: tool name, server name, JSON-RPC method, request id, user id from OAuth, latency, payload size in/out, error class, retry count.
- Per-session attributes: protocol version, client name and version, server name and version, capabilities negotiated, transport.
- `notifications/progress` is the right place to stream long-running work: model UIs (Claude Desktop, Cursor) render the progress messages directly to the user, which is dramatically better UX than a spinner with no signal.
- Server-side `notifications/message` log notifications should carry structured fields, not pre-formatted strings, so hosts that route them to JSON-log pipelines can filter and aggregate properly.
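To make the structured-logging point concrete, here is a minimal sketch of a log notification payload and the level filter a server would apply after `logging/setLevel`. The helper names are hypothetical; the level names follow the syslog-style severities the logging capability uses.

```python
# Severity order, lowest to highest.
LEVELS = ["debug", "info", "notice", "warning", "error", "critical", "alert", "emergency"]

def log_notification(level: str, logger: str, **fields) -> dict:
    """Build a notifications/message payload with structured data."""
    if level not in LEVELS:
        raise ValueError(f"unknown log level: {level}")
    return {
        "jsonrpc": "2.0",
        "method": "notifications/message",
        "params": {"level": level, "logger": logger, "data": fields},
    }

def should_emit(level: str, minimum: str) -> bool:
    """Apply the minimum level the client requested via logging/setLevel."""
    return LEVELS.index(level) >= LEVELS.index(minimum)
```

A call such as `log_notification("warning", "weather", tool="get_weather", latency_ms=412)` keeps `tool` and `latency_ms` as separate fields, so a JSON-log pipeline can filter on them without parsing a message string.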
```python
# OpenTelemetry middleware for FastMCP tool calls
import functools
from contextvars import ContextVar

from opentelemetry import trace
from mcp.server.fastmcp import FastMCP

tracer = trace.get_tracer("mcp.server.weather")
mcp = FastMCP("weather")

# Expected to be set per request by your auth layer (not shown here).
current_user: ContextVar[str | None] = ContextVar("current_user", default=None)

def traced_tool(name: str):
    """Register an async function as an MCP tool and wrap each call in a span."""
    def wrap(fn):
        # functools.wraps preserves fn's name and signature for introspection,
        # so the tool's input schema should be derived from fn, not from **kwargs.
        @functools.wraps(fn)
        async def inner(**kwargs):
            with tracer.start_as_current_span(f"mcp.tool.{name}") as span:
                span.set_attribute("mcp.tool.name", name)
                span.set_attribute("mcp.user", current_user.get() or "anon")
                span.set_attribute("mcp.args.size", len(str(kwargs)))
                try:
                    return await fn(**kwargs)
                except Exception as e:
                    span.record_exception(e)
                    span.set_status(trace.Status(trace.StatusCode.ERROR))
                    raise
        return mcp.tool(name=name)(inner)
    return wrap

@traced_tool("get_weather")
async def get_weather(city: str) -> str:
    return f"It is 18C and overcast in {city}."
```

Cost and FinOps
MCP as a protocol is free (MIT-licensed, no vendor surcharge). The cost lives in three places: the LLM tokens consumed by tool definitions and tool results, the compute and API quota consumed by the backend that the server fronts, and the operational cost of running the server itself.
Token cost is the one most teams underestimate. Every active MCP session pays for the tool catalogue on every model turn, because names, descriptions and JSON Schemas are all part of the request. A 30-tool server with verbose descriptions can add 4,000-8,000 input tokens per turn. At Claude Sonnet 4.5 list prices (around $3 per million input tokens, $15 per million output), that is roughly $0.012-$0.024 of input per turn, so a session of only a few turns already costs $0.02-$0.05 in catalogue overhead alone, and tool-heavy sessions cost proportionally more. Prompt caching (where supported by the host) recovers most of that: cached catalogues bill at roughly 10 percent of standard input cost.
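The catalogue overhead is simple arithmetic, sketched below. The function name and defaults are illustrative: the price and the 10 percent cached rate are the approximate figures quoted above, and the sketch ignores any surcharge a provider may apply when a cache entry is first written.

```python
def catalogue_cost_usd(
    tokens_per_turn: int,
    turns: int,
    usd_per_million_input: float = 3.0,
    cached_fraction: float = 0.0,
    cache_discount: float = 0.10,
) -> float:
    """Input-token cost of resending the tool catalogue on every turn."""
    total_tokens = tokens_per_turn * turns
    uncached = total_tokens * (1.0 - cached_fraction)
    cached = total_tokens * cached_fraction
    billable = uncached + cached * cache_discount
    return billable * usd_per_million_input / 1_000_000

per_turn_low = catalogue_cost_usd(4_000, turns=1)    # about $0.012
per_turn_high = catalogue_cost_usd(8_000, turns=1)   # about $0.024
ten_turns_cached = catalogue_cost_usd(8_000, turns=10, cached_fraction=1.0)
```

Under these assumptions, ten fully cached turns of an 8,000-token catalogue cost about the same as one uncached turn, which is why caching matters more than trimming a few descriptions.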
Operational cost depends on the deployment shape. Local stdio servers cost nothing beyond the workstation. Remote HTTP servers on commodity infrastructure cost $20-200/month for a small organisation (two replicas, t3.medium-equivalent, Cloudflare or AWS ALB in front). Heavy servers — those that maintain background workers, large in-process indexes, or stateful long-running connections — are the same cost as any other backend service: size to the underlying workload, not to MCP.
| Cost component | Typical USD range | Driver |
|---|---|---|
| Protocol licence | $0 | MIT, no surcharge. |
| Tool catalogue overhead | $0.01-0.05 per chat session | Token cost of definitions sent every turn. |
| Cached catalogue | ~10% of uncached | Prompt caching where the host supports it. |
| Remote server compute (small) | $20-200/month | 2x small containers + TLS proxy. |
| Remote server compute (busy) | $500-5,000/month | Scales with backend workload, not MCP itself. |
| Gateway tier (enterprise) | $2,000-15,000/month | Centralised auth, audit, rate limiting across many backends. |
Security and compliance
MCP servers run with the user's authority and touch the user's data. The protocol explicitly delegates auth, consent and audit to the host and the deployment environment, which means insecure defaults are easy to ship if you do not apply discipline. The list below is the working security checklist that matches the 2025-06-18 spec guidance plus what production deployments have learned the hard way.
- Remote transports MUST use TLS. Streamable HTTP without TLS is a credentials-in-the-clear leak; refuse to deploy.
- Remote transports SHOULD use OAuth 2.1 with PKCE for user-delegated access. The 2025-03-26 spec revision standardised the discovery flow; vendor-issued tokens are no longer acceptable for new servers.
- Stdio servers inherit the user's OS authority. Treat them like browser extensions — audit before installing, prefer signed or vendor-published servers, sandbox if possible.
- Never accept tool input that names a filesystem path without rooting it in a declared root. Use the Roots capability and refuse paths outside the advertised roots.
- Hosts MUST surface consent UI for tool invocation. Auto-approval for destructive operations (write, delete, send) is a security incident waiting to happen — require per-tool, per-invocation approval for anything with side effects.
- Servers that call back to the model via `sampling/createMessage` must declare it; users must be able to refuse. This is the protocol surface most prone to abuse — a malicious server can otherwise exhaust the user's model spend.
- Log every tool invocation server-side with user, tool, arguments hash, result hash, latency, error. This is the audit trail your security team will ask for the first time something goes wrong.
- For UK Sovereign deployments, terminate Streamable HTTP inside the regulated boundary, do not let server logs egress to overseas observability platforms without contractual cover, and align consent UI guidance with NCSC Cloud Security Principle 5 (operational security) and Principle 9 (secure user management).
MCP shifts the trust model. A user who installs five MCP servers has granted five new pieces of software access to their data and credentials. Treat server installation as a security event — review what each server does, audit its source if it is community-published, and prefer signed first-party servers from the vendors you already trust.
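The path-rooting rule from the checklist can be sketched with the standard library alone. The function name is hypothetical, and the sketch assumes the roots have already been obtained from the client (for example via `roots/list`) and converted from `file://` URIs to local paths.

```python
from pathlib import Path

def resolve_within_roots(candidate: str, roots: list[str]) -> Path:
    """Return the resolved path if it lies inside a declared root, else refuse.

    Resolving first collapses '..' segments and symlinks, so a request such as
    '<root>/../etc/passwd' is judged by where it really points.
    """
    resolved = Path(candidate).resolve()
    for root in roots:
        if resolved.is_relative_to(Path(root).resolve()):
            return resolved
    raise PermissionError(f"{candidate!r} is outside the advertised roots")
```

A filesystem tool would call this before opening any user-supplied path and surface the `PermissionError` as a tool error rather than touching the file.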
Migration and alternatives
MCP is not the first attempt at standardising LLM-tool integration. Compared with raw function-calling per provider, with the OpenAPI-based ChatGPT plugin manifest of 2023-2024, with LangChain's `Tool` class, and with bespoke HTTP+JSON wiring, MCP is the only option that gets adopted across hosts. The practical migration paths come first; the table after them captures the comparison.
- From per-provider function-calling — swap the tool dispatch layer for an MCP client; tool definitions move from inline JSON in your request to MCP server `tools/list` responses. The provider-side function-calling JSON the model emits is unchanged.
- From LangChain Tools — most LangChain tools can be hosted behind an MCP server in a few lines using FastMCP; the agent itself can then use the LangChain MCP adapter (`langchain-mcp-adapters`) to consume the same tools from an MCP server. This decouples tool implementation from the agent framework choice.
- From the OpenAPI plugin manifest — the conversion is mostly mechanical: each OpenAPI operation becomes an MCP tool, parameters become the tool input schema, response schemas become the output schema. The auth pivot from API-key headers to OAuth 2.1 is the substantive piece of work.
- From bespoke REST — wrap each REST endpoint as an MCP tool, keep the underlying backend unchanged, and gain discovery, streaming and capability negotiation for free.
| Approach | Surface | Strengths | Weaknesses |
|---|---|---|---|
| Raw function-calling per provider | Per-vendor JSON in request body | No extra dependency; minimal latency. | Re-implement per provider; per-host wiring; no discovery. |
| OpenAPI plugin manifest (ChatGPT 2023-24) | OpenAPI 3 spec + ai-plugin.json | Reused existing OpenAPI tooling. | Vendor-specific; deprecated by OpenAI in 2024; no notifications or streaming. |
| LangChain Tool wrapper | Python class with `_run`/`_arun` | Huge ecosystem; tight LCEL integration. | Locked to LangChain; no cross-host reuse; per-process only. |
| Bespoke HTTP + JSON-schema | Custom REST endpoints | Total control. | Reinvents discovery, auth, schema validation in every project. |
| Model Context Protocol (MCP) | JSON-RPC 2.0 + capability negotiation | Cross-host; standard auth; streaming and notifications; growing SDK and server library. | Newer (Nov 2024); requires capability discipline; auth still implementer-built. |
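The "mostly mechanical" OpenAPI conversion mentioned in the migration paths can be sketched as follows. This is a simplified illustration: the function name is hypothetical, it maps only `parameters` (request bodies, response schemas and auth are ignored), and it produces the `name`, `description` and `inputSchema` fields that a `tools/list` entry carries.

```python
import re

def openapi_operation_to_tool(path: str, method: str, operation: dict) -> dict:
    """Map one OpenAPI operation object to an MCP tool definition."""
    properties: dict[str, dict] = {}
    required: list[str] = []
    for param in operation.get("parameters", []):
        # Copy the parameter schema and carry the description along with it.
        schema = dict(param.get("schema", {"type": "string"}))
        if "description" in param:
            schema["description"] = param["description"]
        properties[param["name"]] = schema
        if param.get("required"):
            required.append(param["name"])
    # Fall back to a name derived from method and path when operationId is absent.
    fallback = re.sub(r"[^a-zA-Z0-9]+", "_", f"{method}_{path}").strip("_")
    return {
        "name": operation.get("operationId") or fallback,
        "description": operation.get("summary", ""),
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }
```

Each generated definition still needs a handler that performs the underlying HTTP call, and the descriptions usually deserve a rewrite aimed at the model rather than at API consumers.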
Troubleshooting
The failure modes below are the ones MCP server authors and operators hit repeatedly. Each row of the table gives the symptom, the most likely cause, and the fix.
| Symptom | Likely cause | Fix |
|---|---|---|
| Server starts but client sees no tools | Client expected JSON-RPC over stdio but server printed log lines to stdout, corrupting the framing. | Route all logs to stderr; keep stdout exclusively for JSON-RPC frames. |
| initialize fails with version error | Protocol version mismatch — client sent a version the server does not support. | Server should reply with its highest supported version; client must accept that or close. |
| Tool call returns InvalidParams (-32602) | Argument JSON does not validate against the tool's input schema. | Tighten descriptions, add enums and patterns, run the host's inspector to see the model's actual arguments. |
| Streamable HTTP disconnects every 30-60 s | Reverse proxy closing idle connections. | Server-side keepalive ping every 15-20 s, or accept SSE reconnects with `Last-Event-ID`. |
| High first-call latency from desktop host | Subprocess spawn cost on stdio servers. | Slim the server's import graph; defer heavy dependencies until first use; consider switching to Streamable HTTP if the server is heavy. |
| OAuth flow loops forever | Client redirected to authorize URL; callback never reaches the server. | Verify the PKCE code_challenge_method is `S256`, confirm the redirect_uri matches the registered client exactly, check for proxy-stripped headers. |
| Sampling requests fail silently | Client did not advertise `sampling` capability; server's request is being dropped. | Server must check the client capability set before calling `sampling/createMessage`; degrade gracefully when absent. |
| `notifications/tools/list_changed` never fires | Server did not declare `tools.listChanged: true` in its capabilities. | Set the capability flag at initialise time; otherwise hosts will not listen for the notification. |
| Tool descriptions look truncated to the model | Host truncates very long schemas; deeply nested objects collapsed. | Keep schemas flat (max 4-5 levels); move long help text into the `description` field, not into property names. |
| Server runs out of memory under load | stdio host spawned one server process per user; resident set scales linearly. | Migrate the server to Streamable HTTP transport and front with a connection-pooling proxy. |
When in doubt, attach the MCP CLI inspector (`mcp dev`) — its trace pane shows the raw JSON-RPC frames in both directions, which is the fastest way to spot stdio framing bugs, schema-validation rejections, and capability-negotiation mistakes.
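The first row of the table, logs corrupting stdout framing, is usually fixed with a few lines of logging setup. The sketch below uses only the standard library; the function and logger names are arbitrary choices for illustration.

```python
import logging
import sys

def configure_stdio_safe_logging(level: int = logging.INFO) -> logging.Logger:
    """Send all server logs to stderr so stdout carries only JSON-RPC frames."""
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    logger = logging.getLogger("mcp_server")
    logger.handlers[:] = [handler]   # replace any handlers configured earlier
    logger.setLevel(level)
    logger.propagate = False         # keep root-logger handlers out of the picture
    return logger
```

Call this once at startup and replace stray `print()` calls with `logger.info(...)`; third-party libraries that print to stdout need the same treatment or a redirect.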
Where this fits in the Yobitel stack
Yobitel treats MCP as the default integration surface between Yobibyte-deployed models and the rest of a customer's software estate. Internally, our gateway pattern (a control-plane MCP server fronting backend integrations) is how we expose private data sources to inference workloads without each model needing a bespoke connector. Externally, every model deployed on Yobibyte or Omniscient Compute that supports tool use can be wired to MCP servers through standard Anthropic or OpenAI SDK client configuration — no Yobitel-specific code path.
For UK Sovereign customers, our recommended pattern is a per-tenant gateway terminating Streamable HTTP inside the regulated boundary, with OAuth 2.1 bound to the customer's identity provider and all server-side audit logs routed to the tenant's own SIEM. This aligns cleanly with NCSC Cloud Security Principles 5 and 9 and keeps the integration surface auditable without locking the customer into Yobitel-specific tooling.
References
- Model Context Protocol Specification · modelcontextprotocol.io
- MCP Specification Source · GitHub
- Reference Servers · GitHub
- Python SDK · GitHub
- TypeScript SDK · GitHub
- Anthropic announcement (Nov 2024) · Anthropic