TL;DR
What you will learn: The internal architecture of MCP servers, from transport layers to capability registration to the JSON-RPC message protocol that ties it all together.
Why it matters: Understanding server internals is the difference between a demo that works and a production system that scales. Transport choice, capability negotiation, and error handling all affect reliability.
Exo Edge: We have built production MCP servers across multiple domains and transports. The architecture patterns and pitfalls in this post come from real deployments, not documentation.
Your AI Can Only Do What Its Connections Allow
Model Context Protocol (MCP) development starts with understanding how servers actually work under the hood. If you have read the introductory material and are ready to build, this is where the architecture becomes concrete. MCP servers are the backbone of the protocol: they expose tools, resources, and prompts to AI clients through a well-defined lifecycle of initialization, capability negotiation, and message exchange.
Problem / pain: Most MCP tutorials stop at "hello world." Teams get stuck when they need to handle transport selection, capability negotiation, error flows, and production architecture patterns.
Opportunity: A complete understanding of server architecture lets you make the right decisions before writing code: which transport, how to register capabilities, how to handle errors, and which architecture pattern fits your deployment.
Credibility: Exo builds production MCP servers for technical teams across AI, blockchain, and enterprise infrastructure. The patterns here come from real deployments.
Section 1: What Is MCP Server Architecture? Why Does It Matter?
Definition: An MCP server is the component that exposes tools, resources, and prompts to AI clients through a standardized lifecycle. It handles transport setup, initialization handshake, capability declaration, and request processing over JSON-RPC 2.0.
Problem it solves: Without understanding the lifecycle and message protocol, teams build servers that work in demos but fail in production. Transport choice, capability registration, and error handling are architectural decisions that compound.
Exo Insight: The server lifecycle is deceptively simple (four phases) but the details matter. Most production issues trace back to initialization handling, transport configuration, or tool description quality.
Section 2: How Does MCP Server Architecture Work?
The Server Lifecycle
Every MCP server follows the same four-phase lifecycle, regardless of what it connects to:
Phase 1: Transport Setup. The server starts listening on its configured transport. For stdio, this means reading from stdin and writing to stdout. For Streamable HTTP, this means binding to a port and waiting for connections. The transport is just the pipe; the protocol runs on top of it.
Phase 2: Initialization. When a client connects, it sends an initialize request containing its protocol version and capabilities. The server responds with its own protocol version, server info, and a capabilities object that declares what the server supports: tools, resources, prompts, logging, or any combination.
Phase 3: Initialized Notification. After receiving the server's initialize response, the client sends an initialized notification. This signals that the handshake is complete and the server can begin accepting requests. No work happens before this notification.
Phase 4: Active Session. The server processes requests, sends responses, and can push notifications to the client. This continues until the session ends. MCP defines no dedicated shutdown request: for stdio, the client closes the server's stdin (and terminates the process if needed); for HTTP, the session ends when the client or server terminates the connection.
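The handshake in Phases 2 and 3 reduces to three JSON-RPC messages. A sketch of their shapes, with illustrative version strings and names (the field names follow the MCP specification):

```typescript
// Phase 2: the client's initialize request (version string is illustrative).
const initializeRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "initialize",
  params: {
    protocolVersion: "2025-03-26",
    capabilities: {},
    clientInfo: { name: "example-client", version: "1.0.0" },
  },
};

// The server's response declares what it supports.
const initializeResponse = {
  jsonrpc: "2.0",
  id: 1, // must match the request id
  result: {
    protocolVersion: "2025-03-26",
    capabilities: { tools: {}, resources: {} },
    serverInfo: { name: "example-server", version: "1.0.0" },
  },
};

// Phase 3: the client's initialized notification -- no id, no response expected.
const initializedNotification = {
  jsonrpc: "2.0",
  method: "notifications/initialized",
};
```

Only after the initialized notification arrives may the server begin processing regular requests.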
The Message Protocol: JSON-RPC 2.0
MCP uses JSON-RPC 2.0 as its wire format. Every message between client and server is one of three types:
Requests carry an id, a method name, and optional params. The client sends requests like tools/call or resources/read. The server must respond with a result or error matching the same id.
Responses carry the matching id and either a result object (on success) or an error object (on failure) with a code, message, and optional data field.
Notifications are one-way messages with no id and no expected response. Both client and server can send notifications. Because this is standard JSON-RPC, existing tooling for debugging, logging, and testing works out of the box.
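The three message types can be distinguished purely by their fields. A minimal TypeScript classifier illustrating the rules above (the type and function names are ours):

```typescript
type JsonRpcMessage = {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  result?: unknown;
  error?: unknown;
};

function classify(msg: JsonRpcMessage): "request" | "response" | "notification" {
  // A method plus an id means the sender expects a matching response.
  if (msg.method !== undefined && msg.id !== undefined) return "request";
  // A method with no id is fire-and-forget.
  if (msg.method !== undefined) return "notification";
  // Otherwise it carries a result or error tied to an earlier request id.
  return "response";
}
```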
Transport Layer Deep Dive
stdio Transport
The stdio transport runs the MCP server as a child process of the host application. Communication flows over stdin (client to server) and stdout (server to client). Messages are newline-delimited JSON. No network stack, no TLS, no port management. The host spawns the server process, pipes JSON messages in, and reads JSON messages out.
Use stdio when: the server runs on the same machine as the client, you want minimal communication overhead, or you are building CLI tools and local development integrations. Important constraint: anything the server writes to stdout is treated as a protocol message. Always redirect logging to stderr.
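A sketch of the newline-delimited framing described above, in TypeScript (the function names are ours; the SDK handles this for you in practice):

```typescript
// Frame an outgoing message: one JSON object per line on stdout.
function frame(msg: object): string {
  return JSON.stringify(msg) + "\n";
}

// Split an incoming chunk into complete JSON messages, returning any
// trailing partial line as leftover to prepend to the next chunk.
function deframe(buffer: string): { messages: object[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? "";
  const messages = lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l));
  return { messages, rest };
}

// Logging must go to stderr -- console.error does, in Node.js.
const log = (msg: string) => console.error(msg);
```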
Streamable HTTP Transport
The Streamable HTTP transport (which supersedes the earlier SSE-only transport) runs the server as a standalone HTTP service. Clients send JSON-RPC requests as HTTP POST requests to a single endpoint (typically /mcp). The server responds with either a direct JSON response or opens a Server-Sent Events stream for long-running operations. This transport enables production patterns that stdio cannot support:
Multiple clients connecting to a single server instance.
Server deployment on separate infrastructure (containers, VMs, serverless).
Network-level security controls (TLS, load balancers, API gateways, firewalls).
Session management with stateful sessions across multiple HTTP requests.
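On the client side, a Streamable HTTP exchange is an ordinary HTTP POST. A sketch of the request shape, assuming the spec's Accept and Mcp-Session-Id headers (the endpoint path and session id value are illustrative):

```typescript
// Build fetch options for one JSON-RPC request over Streamable HTTP.
function buildPost(
  body: object,
  sessionId?: string
): { method: string; headers: Record<string, string>; body: string } {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // The server may answer with plain JSON or open an SSE stream,
    // so the client advertises that it accepts both.
    Accept: "application/json, text/event-stream",
  };
  // Stateful sessions are carried across requests via this header.
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return { method: "POST", headers, body: JSON.stringify(body) };
}
```

Usage would look like `fetch("https://example.com/mcp", buildPost(request, sessionId))`, with the response either parsed as JSON or consumed as an event stream depending on its Content-Type.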
Capability Registration
The capabilities object in the initialize response is the contract between server and client. Tool registration requires a name, a description (which the AI model reads to decide when to use the tool), and an input schema in JSON Schema format. Resource registration requires a URI, name, description, and MIME type. Prompt registration requires a name, description, and optional argument list.
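What these declarations look like on the wire. The tool and resource below are hypothetical examples; the field names match the MCP specification:

```typescript
// A tool declaration as it appears in a tools/list response.
// The description is what the model reads when deciding whether to call it.
const queryTool = {
  name: "query_orders", // hypothetical tool name
  description:
    "Queries the orders table and returns rows matching a SQL WHERE clause",
  inputSchema: {
    type: "object",
    properties: {
      where: { type: "string", description: "SQL WHERE clause" },
      limit: { type: "number", description: "Maximum rows to return" },
    },
    required: ["where"],
  },
};

// A resource declaration: URI, name, description, MIME type.
const schemaResource = {
  uri: "db://orders/schema", // hypothetical URI
  name: "Orders schema",
  description: "Column names and types for the orders table",
  mimeType: "application/json",
};
```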
Request Handling: How Tool Calls Flow
When the AI model invokes a tool:
1. The model generates a tool call with name and arguments.
2. The host routes it to the appropriate MCP client.
3. The client sends a tools/call JSON-RPC request.
4. The server validates input against the JSON Schema.
5. The server executes the handler.
6. The server returns content blocks (text, images, or embedded resources).
7. The client passes results back to the model.
For long-running tools, the server streams progress notifications. Error handling follows JSON-RPC conventions, with tool-level errors (isError: true) distinct from protocol-level errors.
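The server-side portion of this flow can be sketched as follows. The handler registry and tool names are hypothetical; the content-block and isError shapes follow the spec:

```typescript
type ContentBlock = { type: "text"; text: string };
type ToolResult = { content: ContentBlock[]; isError?: boolean };

// Hypothetical handler registry keyed by tool name.
const handlers: Record<string, (args: Record<string, unknown>) => ToolResult> = {
  echo: (args) => ({ content: [{ type: "text", text: String(args.message ?? "") }] }),
};

function handleToolCall(name: string, args: Record<string, unknown>): ToolResult {
  const handler = handlers[name];
  if (!handler) {
    // Unknown tool: in a real server this becomes a protocol-level
    // JSON-RPC error response, not a tool result.
    throw new Error(`Unknown tool: ${name}`);
  }
  try {
    return handler(args);
  } catch (e) {
    // Tool-level failure: reported inside the result so the model can react.
    return { content: [{ type: "text", text: String(e) }], isError: true };
  }
}
```

The distinction in the two error paths matters: a protocol error tells the client something is wrong with the request itself, while an isError result gives the model a chance to retry or adjust.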
Section 3: Comparisons and Tradeoffs
Single-Purpose Servers
What is it? Each server wraps one system or domain: a database server, a deployment server, a monitoring server. Unix philosophy: do one thing well.
Benefits: Small, testable, independently deployable. The host connects to multiple servers, each handling its own domain.
Gateway Servers
What is it? A single server acts as a proxy to multiple backend services with a unified tool interface.
Tradeoff: Reduces connection count but creates a broader failure domain. Good for teams with many small services.
Layered Servers
What is it? Base server provides auth, logging, rate limiting. Plugin servers add domain-specific tools. Works well for enterprise where infrastructure teams manage the base and application teams manage plugins.
Most teams start with single-purpose servers and consolidate as they understand their access patterns.
Section 4: Playbook (Avoiding Common Pitfalls)
This is where we stand out. After building MCP servers across multiple domains, these are the mistakes we see most often.
Poor tool descriptions. The AI model reads the description to decide whether to call a tool. "Fetches data" is useless. "Queries the PostgreSQL database and returns rows matching the provided SQL WHERE clause" tells the model exactly when and how to use it.
Too many tools on one server. If your server exposes 50 tools, the AI evaluates all 50 descriptions for every request. This increases latency and reduces accuracy. Keep servers focused: 5-15 tools per server.
Missing input validation. JSON Schema on tool inputs is your first line of defense. Validate types, ranges, and required fields before execution. The AI will sometimes generate invalid inputs.
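A minimal sketch of pre-execution validation. A production server would use a real JSON Schema validator; this hand-rolled check (our own helper, not an SDK API) only covers required fields and primitive types:

```typescript
type Schema = { required: string[]; properties: Record<string, { type: string }> };

// Returns a list of validation errors; empty means the args are acceptable.
function validateArgs(schema: Schema, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const field of schema.required) {
    if (!(field in args)) errors.push(`missing required field: ${field}`);
  }
  for (const [key, value] of Object.entries(args)) {
    const expected = schema.properties[key]?.type;
    // typeof covers the primitive JSON Schema types: string, number, boolean.
    if (expected && typeof value !== expected) {
      errors.push(`field ${key}: expected ${expected}, got ${typeof value}`);
    }
  }
  return errors;
}
```

Rejecting bad input before the handler runs turns a confusing downstream failure into an actionable error the model can correct.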
Blocking the event loop. MCP servers are typically async. A synchronous database query or HTTP call in a tool handler blocks all other requests. Use async operations throughout.
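A sketch of the difference, using a simulated slow backend call: because the handler awaits instead of blocking, two calls complete concurrently rather than back to back:

```typescript
// Simulated slow backend query (timeout stands in for a real I/O call).
async function slowQuery(id: number): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return `result-${id}`;
}

// Two tool calls in flight at once; a synchronous version would
// serialize them and stall every other request in the meantime.
async function handleConcurrently(): Promise<string[]> {
  return Promise.all([slowQuery(1), slowQuery(2)]);
}
```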
Logging to stdout. In stdio transport, stdout is reserved for protocol messages. Use stderr for logs, or use a structured logging library that writes to files.
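A minimal structured logger that keeps stdout clean, assuming a Node.js runtime where console.error writes to stderr (the function name and line format are ours):

```typescript
// Format a structured log line and emit it on stderr, never stdout.
function logLine(level: string, message: string): string {
  const line = JSON.stringify({ level, message, ts: new Date().toISOString() });
  console.error(line); // stderr in Node.js; stdout stays protocol-only
  return line;
}
```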
Questions to ask before building: Which transport do we need (local vs remote)? How many tools per server? What error handling pattern fits our domain? Do we need progress notifications for long-running tools?
Conclusion + Action
Recap: MCP server architecture follows a clean lifecycle (transport, init, initialized, active) with JSON-RPC 2.0 as the wire format. Transport choice (stdio vs HTTP) determines deployment model. Capability registration is the contract between server and client. Architecture patterns (single-purpose, gateway, layered) scale differently.
Next step for reader: Read Blog #4 (Build a Custom MCP Server in TypeScript) for a hands-on implementation of everything covered here. Or jump to Blog #6 (MCP Security) if you are focused on production hardening.
Resources:
MCP Specification:
JSON-RPC 2.0 Spec:
Exo Technologies:
Exo builds production MCP servers for technical teams across AI, blockchain, and enterprise infrastructure. We handle the architecture decisions, transport configuration, and production hardening so you can focus on what your server connects to. Ready to build? Reach out at founders@exotechnologies.xyz
