Diagnose EKS Node Issues Faster with AWS DevOps Agent and Custom MCP
When your Kubernetes cluster starts throwing CrashLoopBackOff errors at 3 AM, you don’t want to SSH into nodes, grep through logs, and cross-reference timestamps with CloudWatch metrics by hand. AWS DevOps Agent automates exactly this kind of troubleshooting by investigating production incidents autonomously. It can diagnose pod failures, trace configuration changes through AWS CloudTrail audit logs, and correlate metrics with cluster events, so your on-call engineer starts from findings instead of raw logs. But here’s the catch: it works best when your troubleshooting data lives in AWS services it already knows how to query. When critical diagnostics are scattered across custom monitoring tools, proprietary observability platforms, or internal systems, DevOps Agent can’t see them.
That’s where custom Model Context Protocol (MCP) servers come in. MCP is an open standard for connecting AI agents and applications to external data sources and tools. By building a custom MCP server for your infrastructure, you can give AWS DevOps Agent access to information it doesn’t natively know about. Imagine your team uses Grafana for custom metrics, runs internal diagnostic scripts, or stores cluster metadata in a homegrown database. A custom MCP server acts as a translator: when DevOps Agent needs to investigate a node issue, it can query these sources directly through the MCP interface. The agent receives structured data back, interprets it in context, and continues its diagnosis. Technically, this means writing a server, typically reachable over HTTP, that exposes your data sources as callable tools, and then connecting that server to DevOps Agent.
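To make the “callable tools” idea concrete, here is a minimal, stdlib-only Python sketch of the tool-facing half of an MCP server: a dispatcher for the JSON-RPC 2.0 tools/list and tools/call requests. It deliberately omits the initialize handshake, the HTTP transport, and authentication, so it is not a complete MCP server; in practice you would more likely build on one of the official MCP SDKs. The get_node_metadata tool, the node name, and the metadata fields are hypothetical stand-ins for your own homegrown database.

```python
import json
from typing import Any, Callable

# Hypothetical stand-in for a homegrown cluster-metadata database.
NODE_METADATA = {
    "ip-10-0-1-23.ec2.internal": {"team": "payments", "disk_gb": 80},
}


def get_node_metadata(node_name: str) -> dict:
    """Look up internal metadata for an EKS node; raises KeyError if unknown."""
    return NODE_METADATA[node_name]


# Tool registry: name -> (description, JSON Schema for the arguments, handler).
TOOLS: dict[str, tuple[str, dict, Callable[..., Any]]] = {
    "get_node_metadata": (
        "Return internal metadata (owning team, disk size) for an EKS node.",
        {
            "type": "object",
            "properties": {"node_name": {"type": "string"}},
            "required": ["node_name"],
        },
        get_node_metadata,
    ),
}


def _reply(req_id: Any, **payload: Any) -> str:
    """Serialize a JSON-RPC 2.0 response carrying either `result` or `error`."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id, **payload})


def handle_request(raw: str) -> str:
    """Handle one JSON-RPC request for the MCP tools/list or tools/call method."""
    req = json.loads(raw)
    method, req_id = req.get("method"), req.get("id")

    if method == "tools/list":
        # Advertise each tool's name, description, and argument schema.
        tools = [
            {"name": name, "description": desc, "inputSchema": schema}
            for name, (desc, schema, _handler) in TOOLS.items()
        ]
        return _reply(req_id, result={"tools": tools})

    if method == "tools/call":
        params = req.get("params", {})
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return _reply(req_id, error={"code": -32602, "message": "Unknown tool"})
        handler = entry[2]
        try:
            data = handler(**params.get("arguments", {}))
        except TypeError as exc:  # missing or unexpected arguments
            return _reply(req_id, error={"code": -32602, "message": str(exc)})
        except KeyError as exc:  # lookup failed: report it as a tool-level error
            text = f"No metadata for node {exc}"
            return _reply(req_id, result={"content": [{"type": "text", "text": text}], "isError": True})
        text = json.dumps(data)
        return _reply(req_id, result={"content": [{"type": "text", "text": text}], "isError": False})

    return _reply(req_id, error={"code": -32601, "message": "Method not found"})
```

A tools/list request returns each tool’s name, description, and inputSchema, which is what an agent uses to decide when and how to call it. An unknown node is reported inside the result with isError set to true, rather than as a protocol error, so the agent can see the failure and adjust its investigation.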
In practice, this matters because production diagnostics rarely live in a single service. Consider a realistic scenario: a node runs low on disk space, and the kubelet evicts several pods. On its own, DevOps Agent can see the eviction events in EKS and spot the low-disk metric in CloudWatch. But the root cause might be a runaway log file from a custom application that writes to a local volume, something visible only to your internal log analysis system or through a kubectl exec command. With a custom MCP server, you can wrap these diagnostic commands in callable tools that DevOps Agent invokes as part of its investigation. The agent can then correlate the eviction event with the log growth, narrow the problem to a specific deployment, and suggest remediation steps. Diagnosis shifts from “we need access to three different tools” to “ask the agent.”
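A tool for the disk-space scenario can be quite small. The following Python sketch (stdlib only) finds the largest files under a directory and wraps the answer in the content/isError result shape that MCP uses for tool calls. The function names, the path and top_n arguments, and the assumption that the server process can read the node’s filesystem (for example, because it runs on the node or has the relevant host path mounted) are all illustrative, not part of DevOps Agent or any AWS API.

```python
import json
import os


def find_largest_files(root: str, top_n: int = 5) -> list[dict]:
    """Walk `root` and return the `top_n` largest regular files, biggest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so a file is not reported twice
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file was rotated or deleted mid-scan; ignore it
    sizes.sort(reverse=True)
    return [{"path": path, "bytes": size} for size, path in sizes[:top_n]]


def largest_files_tool(arguments: dict) -> dict:
    """Build an MCP-style tools/call result for a hypothetical disk-usage tool."""
    root = arguments.get("path")
    top_n = int(arguments.get("top_n", 5))
    if not root or not os.path.isdir(root):
        message = f"Not a directory: {root!r}"
        return {"content": [{"type": "text", "text": message}], "isError": True}
    files = find_largest_files(root, top_n)
    return {"content": [{"type": "text", "text": json.dumps(files)}], "isError": False}
```

The tool only reports sizes and never deletes anything. Keeping diagnostic tools read-only lets the agent gather evidence freely while leaving remediation, such as truncating a log or fixing a deployment’s logging configuration, to a human decision.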
The practical benefits are speed and consistency. Your team spends less time context-switching between dashboards during incidents. DevOps Agent gathers evidence systematically, helps rule out false leads, and includes richer context in its findings. For teams running complex, heterogeneous infrastructure, especially those with custom tooling or legacy systems that can’t easily be migrated, custom MCP servers extend DevOps Agent’s autonomous investigation to the data sources that matter most. The goal isn’t to replace human expertise; it’s to automate the tedious data-gathering phase so your engineers can focus on the decisions that actually require judgment.