News
Proving application resilience on Azure with Chaos Studio
Production outages are inevitable. Network latency spikes, database connections fail, entire availability zones go down—and when they do, your application either handles it gracefully or it doesn’t. Most teams don’t know which until it’s too late. Azure Chaos Studio addresses this by letting you deliberately break things in a controlled way. It’s essentially a testing framework that simulates infrastructure failures before they happen for real, giving you confidence that your application can actually recover from the disasters you’ve designed it to handle.
Read more →Claude in Microsoft Foundry is now generally available
Microsoft has made Claude, Anthropic’s AI model, generally available through Azure’s AI Foundry platform. This means teams can now access Claude at production scale without building custom infrastructure, with the models running on NVIDIA’s latest Blackwell Ultra GPUs. If you’ve been experimenting with Claude through Anthropic’s API or other providers, this announcement matters because it gives you another deployment path—one that’s tightly integrated with Azure’s broader AI stack and governance tools.
Read more →How GitHub used secret scanning to reach inbox zero
GitHub recently shared how they tackled a sprawling security problem: 20,000+ secret scanning alerts across 15,000 repositories. The challenge wasn’t just the volume—it was figuring out which alerts actually mattered versus false positives that waste engineering time. Their solution offers a practical playbook for any organization drowning in security noise, and the approach reveals something important about how modern development security actually works at scale.
Secret scanning automatically detects credentials, API keys, and tokens that accidentally get committed to repositories. GitHub’s scanner runs on every push, looking for patterns that match known secret formats—AWS access keys, GitHub tokens, Slack webhooks, and hundreds of other credential types. When the system flags something, it generates an alert. This is powerful: you catch leaks before they leave your codebase. But here’s the problem GitHub faced: when you have thousands of repositories and years of accumulated commits, you also get thousands of alerts, many from old repositories, inactive projects, or credentials that were already rotated months ago. Engineers start ignoring alerts because reviewing them feels like busy work. That’s when your security tool stops being useful.
Read more →Upgrade Amazon EKS clusters with confidence using Kubernetes version rollbacks
Kubernetes upgrades have traditionally been a nail-biting experience. You schedule maintenance, cross your fingers, and hope nothing breaks in production. AWS just made that process significantly less stressful by introducing Kubernetes version rollbacks for Amazon EKS—a feature that lets you undo a cluster upgrade within seven days if things go wrong. This transforms upgrades from a one-way door into a reversible operation, which is a meaningful shift in how teams approach cluster maintenance.
Read more →Ship infrastructure faster with CloudFormation and CDK pre-deployment validation on every stack operation
Infrastructure as Code (IaC) has become standard practice, but the feedback loop between writing a template and discovering errors can still be painfully slow. AWS CloudFormation lets you define cloud resources as code using JSON or YAML, while CDK takes it further by letting you write infrastructure in Python, TypeScript, or other languages. The problem? A single syntax error, missing property, or invalid parameter type can derail your deployment—whether you’re deploying directly, using change sets for previews, or running automated deployments through CI/CD pipelines or AI agents. AWS has now extended CloudFormation’s validation capabilities to catch these issues earlier in the development process, before they consume deployment time and developer attention.
Read more →Accelerate your infrastructure deployments by up to 4x with AWS CloudFormation Express mode
AWS CloudFormation just got faster. The new Express mode can cut deployment times down to seconds instead of minutes, which might not sound like much until you’re iterating on infrastructure changes dozens of times a day. Whether you’re building AI applications that need rapid experimentation or managing DevOps workflows that demand quick feedback loops, this feature addresses a real pain point: waiting for CloudFormation stacks to create or update before you can validate your changes.
Read more →Previewing GPT-5.6 Sol: a next-generation model
OpenAI has announced GPT-5.6 Sol, a new large language model that represents a meaningful step forward in AI capabilities, particularly for technical domains. If you’re working with AI in production environments, this preview gives us a window into what’s coming and what you should be thinking about now.
So what makes Sol different? The model shows substantial improvements in three areas that directly impact cloud and automation work: coding, scientific reasoning, and cybersecurity analysis. When OpenAI says “stronger capabilities in coding,” they’re not just talking about generating boilerplate—Sol appears to handle complex multi-step problems, debugging logic, and architectural decisions with better accuracy than its predecessors. For those of us writing infrastructure-as-code, Lambda functions, or automation scripts, this means better code generation assistance and fewer hallucinations when asking the model to help with tricky logic. The science and cybersecurity improvements matter too, especially if you’re working with security scanning, threat analysis, or using AI to help parse technical documentation and research papers.
Read more →How to Generate an SBOM for Container Workflows
If you’re deploying containers at any scale, you’ve probably encountered security audits asking for a complete list of what’s inside your images. That’s where SBOMs come in. A Software Bill of Materials (SBOM) is essentially an inventory of all dependencies, libraries, and packages bundled into your container image. Think of it like a nutrition label for your software—it tells you exactly what ingredients are present. As container security becomes a core requirement for compliance frameworks like SLSA and regulations such as the Executive Order on Cybersecurity, understanding how to generate and integrate SBOMs into your CI/CD pipeline is becoming a table-stakes skill for DevOps and platform engineers.
Read more →Evaluating performance and efficiency of the GitHub Copilot agentic harness across models and tasks
GitHub recently published findings on their Copilot agentic harness—a framework designed to run AI agents across different models while measuring performance and efficiency. If you’re building AI-assisted workflows or considering which models to use for your development tasks, this is worth understanding. The research essentially answers a practical question many teams face: which combination of model and task setup gives you the best results without burning through your token budget?
Read more →Spotlight on WG Device Management
Kubernetes has become the go-to orchestration platform for cloud workloads, but it was originally designed for stateless applications that only needed CPU and memory. Today, that’s changing rapidly. As AI models, edge computing, and telecommunications services move to Kubernetes, operators face a new challenge: how do you allocate and manage specialized hardware like GPUs, TPUs, and network interface cards (NICs)? This is where the Kubernetes Device Management working group steps in, developing standards for hardware resource allocation that go far beyond traditional CPU and memory constraints.
Read more →From insight to action: The next phase of agentic cloud operations
Cloud operations have traditionally worked in cycles: you monitor your environment, get alerts, analyze what’s wrong, and then manually decide what to do next. But what if that entire decision-making loop could happen automatically? Microsoft’s vision of agentic cloud operations moves beyond dashboards and alerts to create systems that don’t just tell you what’s broken—they fix it themselves. This represents a meaningful shift in how we approach cloud management, turning cloud platforms from reactive tools into proactive decision-makers.
Read more →I automated my job (and it made me a better leader)
There’s a persistent myth in tech leadership that automation is something you delegate to junior engineers. But what if the best person to automate your own workflow is you? A senior leader at GitHub recently shared how they implemented 40 automations across their daily tasks—and the outcome wasn’t burnout prevention, though that was a nice side effect. Instead, it fundamentally changed how they lead their team.
The technical foundation here is straightforward but powerful. These automations likely combine GitHub Actions workflows, API integrations, and cloud-based task runners to eliminate repetitive manual work: automatically summarizing pull requests, routing code reviews to the right people, aggregating metrics from multiple dashboards, and flagging blockers before they become problems. The magic isn’t in any single tool—it’s in the workflow orchestration. When a pull request lands, a workflow can trigger status checks, post summaries to Slack, update tracking systems, and notify stakeholders without human intervention. For Python-savvy engineers, this might mean writing Lambda functions on AWS that trigger on CloudWatch events, or using boto3 to manage resources at scale. The infrastructure is already there; the missing piece is usually just connecting the dots.
Read more →Run isolated sandboxes with full lifecycle control: AWS Lambda introduces MicroVMs
AWS just announced Lambda MicroVMs, a new compute primitive that shifts how you think about serverless isolation and state management. Instead of sharing kernel resources across functions, each MicroVM gives you a dedicated, lightweight virtual machine with full isolation. This sits somewhere between traditional Lambda containers and full EC2 instances—you get the isolation guarantees of a VM without managing infrastructure or waiting for boot times.
Here’s what makes this technically different: traditional Lambda functions run in a shared sandbox environment within a container, which means the kernel and some system resources are technically shared across invocations (though AWS handles security isolation at the application level). MicroVMs flip this model. Each function gets its own isolated kernel and resource namespace, similar to how you’d think about separate machines. They launch in milliseconds and can maintain state for up to 8 hours, meaning you can pause, resume, and reconnect without losing your session. There’s no need to rebuild state from scratch on every invocation. Practically, this matters because you have explicit control over the VM lifecycle—you decide when to pause, resume, or terminate, rather than relying on Lambda’s default timeout and cleanup model.
Read more →Accelerate Incident Resolution with PagerDuty and AWS DevOps Agent
Every ops engineer knows the scenario: your phone buzzes at 2 a.m. with a critical alert. Your heart sinks. The notification tells you that something is broken, but not why. You’re now scrambling through CloudWatch logs, SSH-ing into instances, and running diagnostics while your application hemorrhages traffic and your customers watch their requests timeout. This context gap—between detection and understanding—is where SRE teams waste the most time during incidents. AWS and PagerDuty have partnered to close that gap with the AWS DevOps Agent, a tool designed to automatically gather diagnostic data and surface it directly in PagerDuty incidents, cutting mean-time-to-resolution (MTTR) significantly.
Read more →Feature Flag Orchestration with AWS DevOps Agent and LaunchDarkly
When an outage hits at 2 AM, your team’s response speed determines whether customers experience a five-minute blip or a cascading disaster. Yet many organizations still manage feature flags and incident response separately—meaning engineers waste precious minutes hunting through dashboards, deciding which flags matter, and coordinating manual changes across teams. AWS DevOps Agent paired with LaunchDarkly bridges this gap by automating the connection between your incident response workflows and feature flag management, letting engineers respond to emergencies with a single action instead of a dozen.
Read more →Supercharge your cloud operations with the Kiro power for AWS DevOps Agent
The 2 AM alert is a rite of passage in cloud engineering. Your phone buzzes. Your service is down or degrading. You stumble out of bed and start the familiar ritual: SSH into the bastion host, grep through CloudWatch logs, check the deployment history, trace through your code to understand what changed. Meanwhile, crucial context is scattered across a dozen browser tabs—your monitoring dashboard, X-Ray traces, infrastructure diagrams, configuration files. By the time you’ve assembled the full picture, you’ve already lost 20 minutes you didn’t have to spare.
Read more →Announcing Amazon EC2 G7 instances accelerated by NVIDIA RTX PRO 4500 Blackwell Server Edition GPUs
AWS has released Amazon EC2 G7 instances into general availability, marking a significant upgrade for workloads that demand serious GPU power. These instances pack NVIDIA’s RTX PRO 4500 Blackwell Server Edition GPUs—a processor designed specifically for data centers rather than gaming or consumer applications. If you’ve been running inference models, rendering graphics pipelines, or processing large datasets, G7 instances represent a meaningful leap forward in performance-per-dollar compared to their predecessors.
Read more →Amazon ECS introduces new high-resolution metrics for faster service auto scaling
Container applications need to respond quickly to traffic spikes. If your ECS service waits minutes to scale up during a sudden surge, users experience slowdowns and timeouts. Amazon’s new high-resolution metrics feature addresses this timing challenge by allowing ECS auto scaling policies to react based on metrics collected at one-second intervals instead of the default one-minute intervals. This tighter feedback loop means your containers can scale up faster when demand increases and scale down more efficiently during quiet periods.
Read more →How we built an internal data analytics agent
GitHub recently shared how they built Qubot, an internal analytics agent that lets employees ask questions about company data using plain language instead of writing SQL queries. It’s a practical example of how AI can reduce friction in data workflows—something that applies far beyond GitHub’s walls.
At its core, Qubot solves a common problem: data exists in databases, but accessing it requires SQL expertise. Not everyone on a team has that skill, and even those who do spend time writing boilerplate queries. GitHub’s approach uses Claude (via Bedrock or similar) to translate natural language questions into SQL queries that run against their internal data warehouse. An employee can ask “How many pull requests were merged last quarter?” and get results without touching a database client. The agent handles schema understanding, query generation, and result formatting—essentially acting as a smart intermediary between human questions and structured data.
Read more →Production-Ready Autonomous Incident Resolution with AWS DevOps Agent (now GA) and Datadog MCP Server
The partnership between AWS and Datadog has matured into something genuinely useful: a system that can detect, diagnose, and fix infrastructure problems with minimal human intervention. AWS DevOps Agent, now generally available, works alongside Datadog’s Model Context Protocol (MCP) Server to turn monitoring alerts into actionable resolutions. Instead of waiting for on-call engineers to wake up, correlate logs, check configurations, and apply fixes, this integration handles the routine work automatically—and does it in minutes instead of hours.
Read more →Getting more from each token: How Copilot improves context handling and model routing
If you’ve been using GitHub Copilot, you’ve probably noticed that your credit usage can add up quickly. Every code suggestion, chat conversation, and model inference consumes tokens—the small units that AI models use to process and generate text. GitHub’s latest improvements focus on making those tokens work harder for you, reducing waste and ensuring your credits stretch further while actually improving code quality. This matters because in a world where AI assistance is becoming essential to development workflows, efficiency directly impacts both your wallet and your team’s productivity.
Read more →Announcing Web Search on Amazon Bedrock AgentCore: Ground your AI agents in current, accurate web knowledge
One of the biggest challenges when deploying AI agents in production is keeping them accurate. Language models have knowledge cutoffs—they don’t know about events after their training data ends. If your customer service agent is answering questions about your latest product launch or an agent needs real-time pricing information, it’s working with stale information. AWS is addressing this with Web Search on Amazon Bedrock AgentCore, a managed capability that lets your agents pull current information directly from the web without you having to build and maintain the infrastructure yourself.
Read more →Introducing Amazon Bedrock Managed Knowledge Base for faster, more accurate enterprise AI applications
Enterprise AI teams face a familiar pain point: building retrieval-augmented generation (RAG) systems is complex. You need to connect to multiple data sources, parse different file formats, orchestrate embeddings, manage vector databases, and chain everything together—all while keeping your application accurate and performant. AWS’s new Fully Managed Knowledge Bases for Amazon Bedrock aims to eliminate much of this infrastructure work, letting your team focus on what actually matters: delivering business value.
Read more →Amazon S3 annotations: attach rich, queryable context directly to your objects
Amazon S3 just added a feature called annotations that fundamentally changes how you can work with object metadata at scale. Instead of managing metadata in separate databases or systems, you can now attach up to 1 GB of rich, queryable context directly to S3 objects. For teams building AI agents and automation workflows, this is a practical shift that simplifies data discovery and context management in ways that single-key/value tag systems simply can’t match.
Read more →GitHub Copilot CLI for Beginners: Overview of common slash commands
GitHub Copilot has expanded beyond code editors into the terminal itself. Copilot CLI brings AI-assisted command suggestions directly to your shell, making it easier to construct complex commands without memorizing syntax or hunting through documentation. For developers working with AWS, building automation scripts, or managing cloud infrastructure from the command line, this tool can significantly reduce friction when dealing with unfamiliar tools or verbose command syntax.
Copilot CLI works by accepting natural language descriptions of what you want to accomplish, then translating them into actual shell commands. When you type a slash command like ?? or git!, you’re signaling to Copilot that you need help. The tool analyzes your input, considers the context of your current shell environment and recent commands, and suggests appropriate commands to run. It understands common cloud CLI tools like the AWS CLI, kubectl, Terraform, and others—meaning you can ask for help with complex flags and arguments without needing to reference man pages. The suggestions appear inline or in a prompt, letting you review before executing.
Now available: Amazon EC2 M9g and M9gd instances powered by new AWS Graviton5 processors
AWS has released new EC2 instance types—M9g and M9gd—built on the latest AWS Graviton5 processor. If you’ve been following the evolution of AWS-custom silicon, this is a meaningful step forward. Graviton5 delivers up to 25% better compute performance than its Graviton4 predecessor while maintaining AWS’s focus on energy efficiency. For teams running containerized workloads, microservices, or general-purpose applications, this means more capability per dollar and per watt consumed.
The technical improvement comes from a redesigned processor architecture. Graviton5 increases clock speeds, improves instruction throughput, and enhances memory bandwidth compared to Graviton4. The M-series instances are general-purpose machines—they balance compute, memory, and network resources—making them suitable for a wide range of workloads. The key distinction: M9g instances come with local NVMe SSD storage (the “d” in M9gd), which helps if your application needs fast, temporary storage without making extra API calls to EBS. For Python-based batch jobs, Node.js APIs, or containerized applications using Docker and Kubernetes, these instances fit naturally into existing architectures.
Read more →Claude Fable 5 available today in Microsoft Foundry: Powering the next era of autonomous agents
Anthropic has released Claude Fable 5, their latest frontier AI model, through Microsoft Foundry—marking a significant milestone in making advanced AI capabilities available to enterprise developers. This release represents a shift toward practical, production-grade autonomous agents that can handle complex workflows without constant human intervention. If you’ve been experimenting with Claude’s API or watching the AI space evolve, this is worth understanding because it affects the tools you’ll be building with over the next year.
Read more →How we made GitHub Copilot CLI more selective about delegation
GitHub recently shared insights into improving how GitHub Copilot CLI decides when to hand off tasks to other AI agents or tools. The core problem they were solving is surprisingly common: when an AI system can delegate work, it often does so too eagerly, creating unnecessary handoffs that slow things down and introduce failure points. By making Copilot CLI smarter about when to delegate, they reduced overhead while keeping the benefits of task specialization.
Read more →Making secret scanning more trustworthy: Reducing false positives at scale
Secret scanning is one of those security tools that sounds simple in theory but gets complicated fast. The idea is straightforward: scan your codebase for accidentally committed credentials like API keys, database passwords, or AWS access tokens before they reach production. But here’s the problem that GitHub tackled—these scanners generate tons of false positives. A developer commits a test string that looks like a secret, or includes a placeholder in documentation, and suddenly your security team is flooded with alerts that waste time and erode trust in the tool itself. When people ignore 90% of alerts because they’re noise, they’ll miss the real threats hiding in the remaining 10%.
Read more →Diagnose EKS Node Issues Faster with AWS DevOps Agent and Custom MCP
When your Kubernetes cluster starts throwing CrashLoopBackOff errors at 3 AM, you don’t want to manually SSH into nodes, grep through logs, and cross-reference timestamps with CloudWatch metrics. AWS DevOps Agent automates exactly this kind of troubleshooting by investigating production incidents autonomously. It can diagnose pod failures, trace configuration changes through AWS CloudTrail audit logs, and correlate metrics with cluster events—all without waking up your on-call engineer. But here’s the catch: it only works well when your troubleshooting data lives in AWS services it knows about. When critical diagnostics are scattered across custom monitoring tools, proprietary observability platforms, or internal systems, DevOps Agent hits a wall.
Read more →Give GitHub Copilot CLI real code intelligence with language servers
GitHub Copilot CLI has been a useful tool for developers working in terminal environments, offering AI-powered suggestions for commands and code snippets. However, its effectiveness has been limited by how it understands your codebase. Traditionally, Copilot CLI relied on grep searches and basic text parsing to gather context about your project—essentially pattern matching without true code comprehension. GitHub has now addressed this limitation by integrating Language Server Protocol (LSP) support, enabling Copilot CLI to tap into the same sophisticated code analysis that powers modern IDEs like VS Code.
Read more →From one-off prompts to workflows: How to use custom agents in GitHub Copilot CLI
GitHub Copilot CLI has evolved beyond answering random terminal questions. The latest addition—custom agents—lets you teach Copilot about your specific tech stack, infrastructure patterns, and team processes. Instead of explaining your deployment pipeline every time you ask for help, you can set it up once and have Copilot understand your context automatically. This shift from one-off prompts to repeatable workflows is particularly valuable for teams managing complex cloud environments or standardized deployment procedures.
Read more →Anthropic Claude Fable 5 on AWS: Mythos-class capabilities with built-in safeguards now available
AWS has quietly expanded what’s possible for enterprises building AI applications by making Claude Fable 5 available through Amazon Bedrock and the Claude Platform on AWS. This release democratizes what Anthropic calls “Mythos-class” AI capabilities—essentially the high-performance reasoning and generation you’d expect from their most advanced model—while maintaining the safety-focused architecture that’s become Anthropic’s calling card. If you’ve been hesitant about deploying sophisticated AI models in regulated environments, this development deserves your attention.
Read more →Microsoft Build 2026: Building agentic apps with Microsoft Fabric and Microsoft Databases
Microsoft’s latest announcements at Build 2026 center on making it easier to develop AI agents that can autonomously handle business tasks. The company is positioning Microsoft Fabric and Microsoft Databases as a unified foundation for these applications—essentially creating an integrated platform where your data infrastructure, AI models, and application logic live together rather than scattered across separate services. This matters because building effective AI agents requires tight coupling between data access, model intelligence, and real-time decision-making. When these components are fragmented, you’re fighting latency issues, consistency problems, and operational complexity.
Read more →Announcing Microsoft Discovery general availability and Microsoft Discovery app preview
Microsoft has released Microsoft Discovery as a generally available platform, marking a significant shift in how organizations can build and manage agentic AI workflows. If you’ve been following the AI space, you know that autonomous agents—AI systems that can plan, execute, and adapt without constant human intervention—are becoming increasingly central to enterprise automation. Microsoft Discovery is designed to solve one of the biggest challenges teams face: how to actually build these agents responsibly and at scale, not just experiment with them in isolated prototypes.
Read more →Try the new console experience in Amazon Bedrock, optimized for Anthropic- and OpenAI-compatible APIs
Amazon Bedrock just rolled out a redesigned console that makes it easier to explore, test, and deploy foundation models without leaving the AWS interface. If you’ve found yourself juggling multiple browser tabs to compare models, copy-paste API documentation, or remember which code snippets work with which services, this update addresses those friction points directly. The new experience is specifically optimized for Anthropic and OpenAI-compatible APIs, meaning whether you’re using Claude, GPT models, or others in the compatible ecosystem, you’ll find a more cohesive workflow.
Read more →Debug deployment failures faster with the Deployments tab in AWS Elastic Beanstalk
Deployment failures are frustrating. One moment your application is ready to ship, the next you’re hunting through logs trying to figure out what went wrong. Traditionally, when an Elastic Beanstalk deployment fails, you’d wait for it to finish, request a log bundle, download it locally, and then manually search through files like eb-engine.log, cfn-init.log, and platform.log hoping to spot the error. If you’re new to Beanstalk’s logging structure, this process can feel like finding a needle in a haystack. AWS has streamlined this workflow with the Deployments tab in the Elastic Beanstalk console, which surfaces error messages directly without requiring you to dig through bundled logs.
Read more →Claude Opus 4.8 is now available in Microsoft Foundry
Microsoft has made Claude Opus 4.8, Anthropic’s most advanced reasoning model, available through Azure AI Foundry. This marks another important step in multi-model AI accessibility, giving teams working in the Microsoft ecosystem direct access to a frontier-class LLM without leaving their familiar Azure environment. For organizations already invested in Azure infrastructure, this eliminates friction in model selection and deployment.
From a technical perspective, Azure AI Foundry handles the integration through its managed API endpoints. Rather than managing separate connections to Anthropic’s systems, you authenticate through Azure’s identity layer and make standard REST API calls—the same pattern you’d use for other Azure AI services. This means your existing error handling, rate limiting logic, and monitoring dashboards work seamlessly. The model supports both synchronous requests for real-time applications and batch processing APIs for large-scale workloads, giving you flexibility in how you structure applications.
Read more →GitHub Copilot app: The agent-native desktop experience
At Microsoft Build 2026, GitHub announced a significant shift in how AI agents integrate into developer workflows. The new GitHub Copilot app represents a move away from browser-based AI assistants toward native desktop experiences designed specifically for autonomous agents. Rather than forcing agents into existing chat interfaces, this approach builds tools that let agents interact with your development environment the way they naturally need to—running commands, accessing files, and integrating with your existing tools without friction.
Read more →Get started with OpenAI GPT-5.5, GPT-5.4 models, and Codex on Amazon Bedrock
Amazon Bedrock just made OpenAI’s latest frontier models available to everyone. GPT-5.5 and GPT-5.4 are now generally available alongside Codex, OpenAI’s specialized coding agent. If you’ve been waiting to integrate cutting-edge language models into your applications without managing infrastructure yourself, this is worth your attention. Bedrock handles the heavy lifting—you focus on building.
Here’s what’s actually happening under the hood. Bedrock is a managed service that abstracts away model infrastructure. Instead of running your own API calls to OpenAI’s servers, you’re making requests through AWS’s infrastructure with their “high performance inference engine.” This matters because it means lower latency, tighter integration with your AWS environment, and unified billing. You pay per token consumed, not monthly subscriptions. If you’re building a document analysis pipeline in Python, for example, you can now invoke GPT-5.5 via a simple boto3 call, get the response back, and immediately pass it to other AWS services like S3 or Lambda for post-processing—all within a single VPC and with CloudTrail logging every request.
Read more →Automate root cause analysis across Datadog and Elasticsearch with AWS DevOps Agent
When your microservices architecture spans dozens of applications and infrastructure components, a single failed transaction becomes a needle-in-a-haystack debugging problem. A payment might fail because of a timeout in Service A, a queue overflow in Service B, a network misconfiguration in AWS, or degraded database performance—and the clues are scattered across Elasticsearch logs, Datadog metrics, and CloudTrail events. Manually correlating these signals is slow, error-prone, and exactly the kind of repetitive work that makes on-call rotations exhausting. AWS DevOps Agent addresses this by automating the collection and correlation of observability data across your entire stack, turning fragmented signals into coherent root cause analysis.
Read more →Introducing the next generation of AWS Resilience Hub for generative AI-based SRE resilience journey
AWS has released a significantly enhanced version of Resilience Hub that fundamentally changes how Site Reliability Engineers (SREs) approach application resilience. The new generation combines automated dependency discovery, AI-powered failure analysis, and organizational-scale reporting into a unified platform. For teams managing complex distributed systems on AWS, this represents a meaningful shift from manual resilience assessment to data-driven, AI-assisted resilience planning.
The technical foundation centers on four key capabilities working together. First, the improved application model gives you finer-grained control over how you define application components and their interconnections. Second, dependency discovery automatically maps relationships between your AWS resources—think EC2 instances, RDS databases, load balancers, and Lambda functions—without requiring manual configuration. Third, generative AI analyzes potential failure modes across your architecture and suggests specific resilience improvements. Finally, modular resilience policies let you define and enforce standards across your organization rather than managing resilience individually per application. Practically speaking, when you add an application to Resilience Hub, the system automatically discovers your AWS infrastructure, generates a dependency graph, and uses AI to identify weaknesses like single points of failure or missing redundancy.
Read more →Introducing the next generation of Amazon OpenSearch Serverless for building your agentic AI applications
AWS has rebuilt Amazon OpenSearch Serverless from the ground up to handle the demands of agentic AI workloads and dynamic applications. This matters because AI agents—systems that autonomously make decisions and take actions—have fundamentally different resource needs than traditional applications. They spike unpredictably, require sub-second latency for decision-making, and need vector search capabilities to understand context from documents and data. The previous generation of OpenSearch Serverless wasn’t designed with these patterns in mind. The new version changes that with instant autoscaling that responds in milliseconds rather than minutes, meaning your AI agents never wait for infrastructure to catch up.
Read more →How AWS DevOps Agent uses multi-agent reasoning to find root causes
Confirmation bias is the silent killer of incident response. You get paged at 2 AM, spot elevated CPU on your API server, assume that’s the problem, find one log line that seems to confirm it, and spend the next three hours chasing a dead end. Meanwhile, the real culprit—a memory leak in a dependency, a poorly optimized database query in a different service, or a configuration drift somewhere else entirely—keeps causing damage. AWS DevOps Agent tackles this exact problem by using multiple independent AI agents that reason through incidents together, each approaching the problem from a different angle before reaching a consensus on root cause.
Read more →Powering multi-cluster workloads with seamless cross-cluster networking for Azure Kubernetes Fleet Manager
Running Kubernetes at scale often means distributing workloads across multiple clusters—whether for high availability, disaster recovery, or geographic distribution. But managing networking across clusters has traditionally been complex and fragmented. Microsoft’s latest announcement brings Cilium-based cross-cluster networking to Azure Kubernetes Fleet Manager, creating a unified network fabric that treats multiple clusters as a single, logical system. This addresses a real pain point: teams previously had to cobble together networking solutions or accept the operational burden of managing cluster-to-cluster communication manually.
Read more →GitHub recognized as a Leader in the Gartner® Magic Quadrant™ for Enterprise AI Coding Agents for the third year in a row
GitHub’s third consecutive recognition as a Leader in Gartner’s Magic Quadrant for Enterprise AI Coding Agents reflects the maturity of AI-assisted development tools in production environments. This isn’t just industry validation—it signals that AI coding agents have moved beyond experimental features into tools that enterprises genuinely depend on for shipping code. The recognition acknowledges both GitHub’s technical capabilities and its ability to execute at scale, serving millions of developers across organizations of all sizes.
Read more →AWS Weekly Roundup: AWS Transform at 1 year, Claude Platform on AWS, EC2 M3 Ultra Mac instances, and more (May 18, 2026)
AWS Transform has quietly become one of the most practical tools for enterprises sitting on aging codebases. A year after its launch, AWS is celebrating the service’s growth by introducing AWS Transform custom—a feature that lets you define your own transformation rules alongside AWS-managed ones. If you’ve worked with legacy systems, you know the pain: upgrading a .NET application from version 4.7 to 8.0, migrating a mainframe workload to cloud-native architecture, or refactoring VMware-dependent code feels like an endless manual process. AWS Transform uses agentic AI to automate this at scale, analyzing your codebase, identifying patterns, and executing transformations across thousands of files simultaneously.
Read more →Meet Gordon: Docker's AI Agent For Your Entire Container Workflow
Developers have gotten used to AI handling the tedious parts of their work. GitHub Copilot finishes your functions, automation tools merge pull requests, and CI/CD pipelines catch most errors before they hit production. But there’s still a stubborn gap: when something breaks in your container environment, you’re often back to manual troubleshooting, digging through logs, and waiting for someone with deep Docker expertise. Docker’s new Gordon agent aims to close that gap by bringing AI-powered intelligence directly into your container workflow, handling everything from problem diagnosis to automated fixes.
Read more →Modernizing Excel VBA to Python at Scale with AWS Transform Custom
Legacy Excel VBA applications are a silent IT burden at many organizations. These macro-heavy spreadsheets often contain critical business logic—financial calculations, data pipelines, reporting workflows—but they’re difficult to maintain, hard to test, and increasingly risky as security vulnerabilities emerge. The problem gets worse at scale: a company might have dozens or hundreds of these applications spread across departments, each one a potential technical debt landmine. Rewriting them manually would take months or years of developer time. AWS Transform Custom offers a practical path forward by using AI to automatically convert VBA code to modern Python while preserving the original functionality—a process that typically takes hours instead of weeks.
Read more →Announcing AWS CDK Mixins: Composable Abstractions for AWS Resources
AWS just released CDK Mixins, and if you’ve ever found yourself copying boilerplate code across multiple CDK constructs, this feature deserves your attention. Mixins are a programming pattern that let you apply reusable behaviors to any construct—whether it’s a low-level L1, high-level L2, or your own custom resource—without forcing you into a rigid class hierarchy. Think of them as a way to bolt on functionality like monitoring, encryption, or compliance tagging to existing resources without rewriting them from scratch.
Read more →Building Self-Extending CLI Tools with Strands Agent
Imagine describing a new command to your CLI tool in plain English and having it instantly available—no code commits, no deployments, no waiting for your team to implement it. That’s the premise behind self-extending CLI tools built with Amazon Bedrock and the Strands Agents SDK. Instead of treating command-line interfaces as static artifacts that require development cycles to modify, this pattern enables them to dynamically generate, refine, and version new capabilities at runtime through conversational interaction. It’s a shift from “tools that do things” to “tools that learn and grow.”
Read more →Kubernetes v1.36: Mixed Version Proxy Graduates to Beta
Kubernetes just reached a significant reliability milestone. The Mixed Version Proxy (MVP) feature has graduated from Alpha to Beta in version 1.36, which means it’s moving closer to becoming a stable production feature. If you’ve ever sweated through a cluster upgrade wondering if your applications would survive the transition, this feature is designed to be your safety net. MVP addresses a real pain point in Kubernetes operations: ensuring backward compatibility when your control plane and nodes are running different versions during an upgrade.
Read more →Simplify cross-account and cross-Region stack output references with AWS CloudFormation and CDK's new Fn::GetStackOutput
If you’ve managed infrastructure across multiple AWS accounts or regions, you know the pain: CloudFormation stack outputs live in isolated silos. Want to reference a VPC ID from a stack in a different account? You’re stuck manually copying values, storing them in Parameter Store, or building custom Lambda functions to bridge the gap. AWS just made this significantly easier with Fn::GetStackOutput, a new CloudFormation function that lets you directly reference stack outputs across account and region boundaries. This is a small feature with surprisingly large practical implications for how teams organize and scale their infrastructure-as-code.
Custom MCP Catalogs and Profiles: Advancing Enterprise MCP Adoption
The Model Context Protocol (MCP) has quietly become essential infrastructure for connecting AI applications to custom tools and data sources. Docker’s announcement of Custom Catalogs and Profiles moving to general availability addresses a real pain point: how do enterprises standardize, distribute, and manage MCP servers at scale? If you’ve been experimenting with MCP servers locally, you’ve probably packaged them ad-hoc—copying configurations, managing dependencies, and hoping everything works across different environments. Custom Catalogs and Profiles solve this by providing a structured way to package and distribute MCP tooling across your organization, similar to how you might manage container registries or package repositories.
Read more →Amazon Bedrock introduces new advanced prompt optimization and migration tool
Prompt engineering has become a critical skill in the AI era, but getting it right remains challenging. AWS is addressing this pain point with Amazon Bedrock’s Advanced Prompt Optimization—a new feature that automates what previously required manual trial-and-error. The tool lets you systematically optimize prompts for your current foundation model or quickly migrate them to new models, with built-in evaluation feedback to guide the process. If you’ve ever spent hours tweaking prompt wording only to get marginal improvements, this feature directly tackles that problem.
Read more →Kubernetes v1.36: Advancing Workload-Aware Scheduling
If you’ve run machine learning training jobs or batch processing on Kubernetes, you’ve probably noticed something frustrating: the scheduler treats every Pod the same way. It doesn’t understand that your distributed TensorFlow training job needs all its worker nodes to start together, or that your batch processing pipeline requires specific resources across multiple containers. Kubernetes v1.36 takes a meaningful step toward fixing this with improved workload-aware scheduling capabilities that let the scheduler understand what your applications actually need to run efficiently.
Read more →GitHub Copilot individual plans: Introducing flex allotments in Pro and Pro+, and a new Max plan
GitHub has restructured its Copilot individual subscription tiers starting June 1st, introducing more flexibility in how developers consume AI-assisted coding features. The new lineup includes updated Pro and Pro+ plans with flexible token allotments, plus a new Max tier for heavy users. This shift reflects how GitHub has been listening to user feedback about pricing and usage patterns—recognizing that developers don’t all code the same way, and neither should their billing.
Read more →Agentic application modernization at scale with Strands and Amazon Transform custom
Application modernization is one of those necessary-but-painful tasks most large organizations face. You’ve got hundreds of repositories still running on outdated Python versions, legacy SDKs, or frameworks that nobody really supports anymore. Each one needs analysis, custom transformation logic, validation, and careful deployment. Without automation, you’re looking at months of manual work spread across multiple teams. Amazon Transform custom addresses this by combining AI-powered code analysis with intelligent automation—letting you tackle modernization at scale rather than one repository at a time. Paired with Strands’ agentic capabilities, it offers a genuinely different approach to a problem that’s been tedious for too long.
Read more →Amazon Redshift introduces AWS Graviton-based RG instances with an integrated data lake query engine
Last month, AWS announced a significant upgrade to Amazon Redshift with the introduction of RG instances powered by AWS Graviton processors. If you’ve been working with Redshift’s RA3 instances, this matters to you: RG instances deliver up to 2.4x faster performance on the same workloads while costing 30% less per vCPU. But the performance bump isn’t the only story here—the integrated data lake query engine fundamentally changes how you can structure your analytics infrastructure.
Read more →Kubernetes v1.36: Moving Volume Group Snapshots to GA
Kubernetes v1.36 marks an important milestone for storage management with Volume Group Snapshots reaching General Availability (GA). This feature, which has progressed through Alpha (v1.27) and Beta (v1.32) phases, is now stable enough for production workloads. For teams running stateful applications on Kubernetes, this means you can now reliably snapshot multiple persistent volumes at the same time—a capability that was previously fragmented and difficult to coordinate.
Volume Group Snapshots solve a real problem: coordinating consistent snapshots across multiple storage volumes. In Kubernetes, applications often use multiple PersistentVolumes (think a database with separate volumes for data, logs, and backups). Previously, you had to manually snapshot each volume individually, which created timing gaps where your snapshots wouldn’t be truly consistent. Now, the VolumeGroupSnapshot API lets you define multiple volumes as a group and capture them atomically. The feature works by leveraging your underlying storage provider’s snapshot capabilities—whether that’s AWS EBS, Google Cloud Persistent Disks, or other CSI-compatible storage systems. When you create a VolumeGroupSnapshot resource, Kubernetes communicates with the storage provider to snap all volumes simultaneously, ensuring consistency at a specific point in time.
Read more →AWS Weekly Roundup: Amazon Bedrock AgentCore payments, Agent Toolkit for AWS, and more (May 11, 2026)
One of the most significant updates from AWS this week introduces managed payment capabilities to Amazon Bedrock AgentCore—a feature that fundamentally changes how AI agents can operate autonomously. Until now, building agents that could independently purchase services, call paid APIs, or pay for compute resources required you to build custom payment infrastructure from scratch. This meant handling credential management, billing reconciliation, PCI compliance, and integration with payment processors yourself. Bedrock AgentCore now abstracts away this complexity with built-in payment capabilities, developed in partnership with Coinbase and Stripe.
Read more →Kubernetes v1.36: More Drivers, New Features, and the Next Era of DRA
If you’ve struggled with allocating GPUs, TPUs, or other specialized hardware in Kubernetes, you’re not alone. Platform teams have long faced a frustrating reality: the standard resource request model (CPU and memory) doesn’t capture the complexity of modern hardware accelerators. Kubernetes v1.36 tackles this head-on with significant maturity improvements to Dynamic Resource Allocation (DRA), a feature that’s reshaping how teams manage specialized hardware at scale. This release marks a pivotal moment where DRA moves from experimental territory into something platform administrators can seriously consider for production workloads.
Read more →Building an end-to-end agentic SRE using AWS DevOps Agent
The traditional SRE workflow hasn’t fundamentally changed in years: something breaks, a human gets paged, they log into multiple dashboards, correlate logs and metrics across tools, hypothesize what went wrong, and manually execute remediation steps. This process works fine when you’re managing a handful of servers, but modern cloud architectures—with their distributed microservices, serverless functions, and event-driven systems—generate so much data and complexity that manual incident response becomes a bottleneck. AWS’s new DevOps Agent represents a shift in how we can approach this problem: instead of waiting for humans to react, we can automate the entire investigation and remediation workflow using agentic AI that understands your infrastructure.
Read more →Improving token efficiency in GitHub Agentic Workflows
If you’ve been experimenting with AI agents in your CI/CD pipelines, you’ve probably noticed something: those token counts add up fast. GitHub recently shared how they tackled this exact problem in their own production workflows, and the lessons are worth understanding whether you’re running agents on every pull request or planning to.
Agentic workflows—automated systems that use AI models to reason through tasks, make decisions, and take actions—have become a practical way to automate code review, testing, and deployment tasks. The problem is that each agent invocation can consume significant API tokens, especially when running on high-frequency events like pull requests. A small inefficiency in how you’re prompting your agents or structuring their context can multiply across thousands of runs, turning what seemed like a reasonable automation into an unexpectedly expensive bill. GitHub’s team discovered they were passing redundant information to their models, making unnecessary API calls, and regenerating context that could have been cached or reused. For a company running agents at scale, these inefficiencies weren’t just costly—they were slowing down feedback loops and adding latency to their development process.
Read more →Kubernetes v1.36: Server-Side Sharded List and Watch
Kubernetes clusters are getting bigger, and bigger clusters create bigger problems. When you’re running tens of thousands of nodes, controllers that need to watch resources like Pods start hitting a scaling wall. Every instance of a horizontally scaled controller receives the complete stream of events from the API server—and that’s expensive. Each replica deserves to deserialize every single object, even though most of them don’t belong to that replica’s slice of responsibility. This redundancy multiplies CPU, memory, and network costs across your entire control plane. Kubernetes v1.36 introduces server-side sharded list and watch to fix this inefficiency.
Read more →Validating agentic behavior when correct isn't deterministic
The challenge of validating AI agents cuts to the heart of modern development workflows. When GitHub Copilot or similar coding agents generate solutions, how do you know if they’re actually correct? Unlike traditional unit tests where inputs map to deterministic outputs, agentic systems can arrive at valid solutions through multiple legitimate paths. A function might be refactored differently, use alternative libraries, or follow different architectural patterns—all while being functionally correct. This ambiguity makes validation incredibly difficult, and it’s why many teams struggle to trust autonomous agents in their CI/CD pipelines.
Read more →The AWS MCP Server is now generally available
AWS has released the AWS MCP Server as a generally available service, marking a significant step in making AI agents and coding assistants more practical for enterprise AWS environments. If you’ve been following the evolution of AI tooling, you’ve probably noticed a growing gap: AI agents and coding assistants are getting smarter, but they often lack secure, authenticated access to your actual infrastructure. The AWS MCP Server fills that gap by implementing the Model Context Protocol (MCP)—an open standard that lets these AI tools safely interact with AWS services without requiring complex custom integrations.
Read more →Kubernetes v1.36: Declarative Validation Graduates to GA
If you’ve ever deployed a Kubernetes manifest and gotten a cryptic validation error, you’ve encountered the limitations of the old way Kubernetes validates your resources. With Kubernetes v1.36, Declarative Validation—a feature that’s been in beta for a while—has now reached General Availability. This might sound like an internal implementation detail, but it’s actually a meaningful shift in how Kubernetes handles validation, and it matters if you’re building reliable infrastructure.
Here’s what’s changed under the hood: traditionally, Kubernetes validation rules for native resources (like Pods, Services, and Deployments) were hardcoded directly into the API server using Go. This approach works, but it’s rigid and difficult to extend. Declarative Validation moves these rules into a declarative format called Common Expression Language (CEL) rules, which are stored alongside the resource definitions themselves. Think of it as shifting from validation logic buried in source code to validation rules that live in your OpenAPI schema. For you as a user, this means validation errors become clearer and more consistent across different tools. Instead of mysterious rejection messages, you’ll see exactly which field violated which rule and why. The validation rules are also now documented as part of the API spec, so tools like kubectl and the Kubernetes dashboard can surface that information directly.
Read more →Modernize your workflows: Amazon WorkSpaces now gives AI agents their own desktop (preview)
The gap between legacy systems and modern AI has always been one of the trickiest problems in enterprise automation. You’ve got decades-old desktop applications that run critical business processes, but they were never designed to talk to AI agents. Until now, bridging that gap meant either ripping and replacing the entire system or building custom integrations—both expensive and risky propositions. AWS is tackling this differently with a preview feature that lets AI agents operate desktop applications directly through Amazon WorkSpaces, the managed virtual desktop service. Instead of modernizing your backend, you’re giving the AI agent its own desktop environment to interact with applications the way a human would.
Read more →Enforcing trust and transparency: Open-sourcing the Azure Integrated HSM
When you’re building AI systems or automation pipelines in the cloud, you’re making a fundamental bet: that your encryption keys—the digital equivalent of your master password—stay secure. Microsoft’s recent move to open-source the Azure Integrated HSM signals a shift in how cloud providers are approaching this problem. Rather than asking you to simply trust that keys are protected behind closed doors, Azure is pulling back the curtain. By open-sourcing the HSM (Hardware Security Module) design, Microsoft is letting security teams and researchers verify exactly how cryptographic trust flows from the silicon level all the way up through Azure services.
Read more →AWS Transform custom: Enterprise Code Modernization with the Learn-Scale-Improve Flywheel
There’s a fundamental difference between modernizing one codebase and modernizing fifty. When you’re dealing with a single repository, the challenge is mostly technical—you pick a tool, run it, review the output, and iterate. But at enterprise scale, the bottleneck shifts. You’re no longer asking “can we modernize this code?” but rather “how do we coordinate teams, share learnings, and maintain quality across hundreds of repositories while keeping velocity high?” AWS Transform custom addresses this reality by treating code modernization not as a one-time event, but as a continuous learning system.
Read more →Kubernetes v1.36: Pod-Level Resource Managers (Alpha)
Kubernetes v1.36 introduces Pod-Level Resource Managers as an alpha feature, marking a significant shift in how the platform handles resource allocation for performance-sensitive workloads. Previously, resource management policies in Kubernetes were set at the node level through kubelet configuration—a one-size-fits-all approach that often forced teams into uncomfortable compromises. The new feature allows you to define resource management strategies at the pod level, giving you granular control over CPU pinning, memory management, and topology-aware scheduling without requiring node-level changes or multiple kubelet configurations.
Read more →A Virtual Agent team at Docker: How the Coding Agent Sandboxes team uses a fleet of agents to ship faster
At Docker, the Coding Agent Sandboxes team (internally known as “sbx”) is solving a problem that’s becoming increasingly important as AI coding agents proliferate: how do you safely give autonomous AI agents the freedom to write, test, and deploy code without risking your host system? The answer is a fleet of lightweight, containerized sandboxes that provide each AI agent—whether it’s Claude Code, Gemini, Codex, Docker Agent, or Kiro—complete isolation with full autonomy. Think of it like giving each agent its own isolated development environment where it can do whatever it needs without consequences bleeding back to your infrastructure.
Read more →OpenAI's GPT-5.5 in Microsoft Foundry: Frontier intelligence on an enterprise ready platform
When a new frontier AI model becomes available, most developers ask the same question: can I actually use this in production? Microsoft’s announcement that OpenAI’s GPT-5.5 is now generally available through Microsoft Foundry answers that with a clear yes. This isn’t just about access to cutting-edge AI—it’s about getting enterprise-grade infrastructure, compliance tooling, and support wrapped around it. For teams building agents and automation workflows on Azure, this means you can now deploy GPT-5.5 with the same reliability and governance frameworks you’d use for mission-critical applications.
Read more →Kubernetes v1.36: In-Place Vertical Scaling for Pod-Level Resources Graduates to Beta
The Kubernetes community has reached another milestone with v1.36: In-Place Pod-Level Resources Vertical Scaling is now graduating to Beta status and enabled by default. If you’ve been following the feature’s journey through earlier versions, this represents a meaningful step toward stability. For those new to the concept, this feature addresses one of Kubernetes’ long-standing operational challenges—adjusting CPU and memory requests and limits for running pods without forcing a restart. Previously, if you realized a pod needed more resources, you had to terminate it and redeploy it with new specifications, causing service interruption. In-place vertical scaling eliminates that painful workflow.
Read more →GitHub Copilot CLI for Beginners: Interactive v. non-interactive mode
GitHub Copilot CLI is an AI-powered tool that brings code suggestions directly to your terminal. Instead of switching between your code editor and documentation, you can describe what you need and get intelligent command suggestions right where you’re working. It integrates with your shell environment and uses natural language processing to understand your intent—whether you need a complex AWS CLI command, a Python one-liner, or a system administration task. For teams managing cloud infrastructure or building automation scripts, this can significantly reduce context-switching and the time spent hunting through documentation.
Read more →Kubernetes v1.36: Tiered Memory Protection with Memory QoS
Memory management has always been one of the trickier aspects of running containers at scale. You set resource requests and limits, hope nothing goes wrong, and then debug mysterious OOM (out-of-memory) kills at 3 AM. Kubernetes v1.36 is making this situation materially better with significant updates to Memory QoS, a feature that’s been evolving since v1.22. The new tiered memory protection system gives the Linux kernel much more nuanced guidance about which containers deserve memory when resources get tight, moving beyond the blunt instrument of hard limits.
Read more →Top announcements of the What's Next with AWS, 2026
AWS just wrapped their What’s Next event with three major announcements that signal where enterprise AI is heading. If you’re building on AWS or planning to, these updates deserve your attention because they’re reshaping how teams integrate AI into their workflows and operations.
The headline grab is Amazon Quick, a new AI assistant designed specifically for work. Think of it as AWS’s answer to the enterprise AI assistant problem—it’s built to understand context across your AWS environment, internal tools, and documentation. What makes it technically interesting is the desktop app approach combined with expanded integrations. Rather than forcing everything through a chat interface, Quick can natively connect to your existing tools and APIs. For a development team, this means asking Quick to help debug CloudFormation templates, explain your architecture decisions, or even scaffold boilerplate code without bouncing between windows. Under the hood, this likely leverages foundational models through Amazon Bedrock with retrieval-augmented generation (RAG) to pull context from your actual infrastructure and documents. The practical win here is reduced context-switching and faster onboarding for teams learning your specific setup.
Read more →Kubernetes v1.36: Staleness Mitigation and Observability for Controllers
If you’ve ever debugged a Kubernetes controller that mysteriously took the wrong action at the worst possible time, you’ve probably encountered staleness. Staleness happens when a controller makes decisions based on outdated information about your cluster’s state. A controller might read that a pod exists, but by the time it acts on that information, the pod has already been deleted. Or it might see an old version of a ConfigMap and roll out stale configuration to your services. These race conditions are notoriously difficult to catch during testing because they depend on precise timing—they often only surface under production load, after they’ve already caused damage.
Read more →Microsoft Discovery: Advancing agentic R&D at scale
Microsoft has expanded preview access to Microsoft Discovery, a new set of enterprise-grade AI capabilities designed specifically for research and development teams. This platform brings autonomous AI agents into the R&D workflow, allowing teams to automate complex, iterative processes that typically require significant manual effort. For organizations managing large-scale research projects—whether in pharmaceuticals, materials science, or software development—this represents a meaningful shift in how teams can approach experimentation and data analysis.
Read more →Kubernetes v1.36: Mutable Pod Resources for Suspended Jobs (beta)
Kubernetes v1.36 is promoting a useful feature to beta: the ability to modify container resource requests and limits while a Job is suspended. If you’re managing workloads at scale, this sounds like a niche feature—but it solves a real operational headache. Previously, once you created a Job with specific CPU and memory requests, those values were locked in. If you needed to adjust resources before the job ran, you had to delete and recreate it. Now you can modify the pod template of a suspended Job, adjust CPU, memory, GPU, and custom resources, then resume it without losing your place in the queue or restarting from scratch.
Read more →AWS Weekly Roundup: Anthropic & Meta partnership, AWS Lambda S3 Files, Amazon Bedrock AgentCore CLI, and more (April 27, 2026)
This week’s AWS announcements showcase a continued push toward making AI development more accessible and practical for enterprise teams. The major highlights—including a strategic partnership between Anthropic and Meta, new Lambda integrations for S3, and expanded Bedrock tooling—signal AWS’s focus on reducing friction in the AI development lifecycle. For teams building on AWS, these updates mean fewer workarounds and more direct paths from experimentation to production.
The Anthropic and Meta partnership represents a significant shift in how foundational models are being integrated into AWS services. Rather than competing in isolation, these companies are aligning to improve model accessibility and interoperability across platforms. From a technical standpoint, this means AWS customers will have more options when selecting which AI models power their applications through Amazon Bedrock. If you’re currently locked into a single model provider, this partnership creates leverage—you can test multiple models against the same workload without rebuilding your integration layer. The practical benefit? Better price negotiation, model redundancy for reliability, and the ability to choose the best tool for specific tasks (like Claude for complex reasoning or Meta’s models for cost-sensitive applications).
Read more →Introducing GPT-5.5
OpenAI just released GPT-5.5, their latest large language model, and it’s worth paying attention to if you’re building with AI in the cloud. This model improves on previous versions with better speed and reasoning capabilities, particularly for tasks that require sustained focus like writing production code, analyzing datasets, and conducting research. If you’ve been experimenting with GPT-4 or Claude, GPT-5.5 represents a meaningful step forward in handling the kind of complex, multi-step problems you actually encounter in real projects.
Read more →Kubernetes v1.36: Fine-Grained Kubelet API Authorization Graduates to GA
Kubernetes v1.36 marks an important milestone for cluster security: fine-grained kubelet API authorization has reached General Availability (GA). This feature, which began as an alpha experiment in v1.32 and moved to beta in v1.33, is now production-ready and will be enabled by default in new clusters. For teams managing Kubernetes at scale, this graduation matters because it gives you precise control over who can do what on individual nodes—closing a significant security gap that’s existed in Kubernetes for years.
Read more →Highlights from Git 2.54
Git 2.54 is the latest stable release of the distributed version control system that powers most modern development workflows. For anyone working with infrastructure automation, AI model repositories, or cloud deployments, understanding what’s new in Git can directly impact your productivity and how you manage code across teams. GitHub recently published their analysis of the most significant features and improvements in this release, offering insights into changes that affect daily development practices.
Read more →Speeding up agentic workflows with WebSockets in the Responses API
Agent-based systems have transformed how we think about automation in the cloud. Instead of rigid workflows, agents can reason through problems, take actions, and adapt based on results. But there’s a catch: traditional REST APIs introduce latency overhead that compounds with each agent step. OpenAI’s recent work on WebSocket support in the Responses API tackles this head-on by maintaining persistent connections and caching context across agent loops. For teams building autonomous systems—whether that’s code generation pipelines, customer service agents, or data processing workflows—this optimization can mean the difference between a system that feels responsive and one that feels sluggish.
Read more →Gateway API v1.5: Moving features to Stable
The Kubernetes ecosystem just got a significant stability boost. On March 14, 2026, the Kubernetes SIG Network community released Gateway API v1.5—a milestone release that graduates several experimental features into stable, production-ready status. If you’re running Kubernetes clusters and managing ingress traffic, this is worth paying attention to. Gateway API represents the next generation of how Kubernetes handles north-south traffic (traffic entering and leaving your cluster), moving beyond the older Ingress API with a more flexible, role-based design.
Read more →Automating Incident Investigation with AWS DevOps Agent and Salesforce MCP Server
Every minute counts during a production incident. While your team scrambles to understand what’s happening, customers are already noticing the outage. Traditional incident response relies on manual investigation—jumping between monitoring dashboards, checking logs, reviewing configuration changes, and updating ticket systems. AWS DevOps Agent, developed in collaboration with Salesforce, streamlines this process by automating the investigation phase, allowing teams to diagnose problems faster and keep stakeholders informed automatically.
At its core, AWS DevOps Agent combines two key technologies working together. The agent itself is an autonomous system that can investigate AWS infrastructure issues by accessing CloudWatch logs, EC2 instances, and other AWS services on your behalf. The Salesforce MCP (Model Context Protocol) Server acts as a bridge, allowing the agent to read and update incident information directly in Salesforce Service Cloud. When an incident is detected, instead of a human opening multiple tabs and manually gathering information, the agent begins investigating immediately. It searches relevant logs, identifies recent changes in your environment, checks resource utilization metrics, and correlates these findings into a coherent incident narrative—all while automatically updating your Salesforce ticket with findings and status.
Read more →AWS Weekly Roundup: Claude Opus 4.7 in Amazon Bedrock, AWS Interconnect GA, and more (April 20, 2026)
This week brings several significant updates across AWS’s AI, networking, and infrastructure services. Claude Opus 4.7 is now available in Amazon Bedrock alongside new connectivity options and security enhancements. If you’re building multi-model AI applications, managing hybrid cloud infrastructure, or concerned about quantum-safe encryption, there’s something here worth your attention.
Claude Opus 4.7 and the 1M Token Context Window
Anthropic’s Claude Opus 4.7 is now available through Amazon Bedrock, bringing meaningful improvements for developers building AI agents. The headline feature is the 1 million token context window—roughly equivalent to 750,000 words. Technically, this means you can pass an entire codebase, lengthy documentation, or months of conversation history in a single API call without token limits becoming a constraint. The model also ships with improved agentic coding capabilities, meaning it can write, debug, and refactor code with better accuracy than previous versions. Practically, this matters for teams building code analysis tools, documentation chatbots, or multi-step automation workflows. Instead of chunking documents into 100K token segments and making multiple API calls, you can now process large datasets in parallel, reducing latency and complexity in your application logic.
Read more →Building an emoji list generator with the GitHub Copilot CLI
GitHub recently demonstrated a practical example of AI-assisted development during their Rubber Duck Thursday stream: building an emoji list generator using the GitHub Copilot CLI. While emoji generators might sound like a novelty, the underlying technique reveals something genuinely useful for developers working with APIs, data transformation, and rapid prototyping. The Copilot CLI lets developers ask natural language questions directly from the terminal, turning the command line into an interactive problem-solving partner without leaving your workflow.
Read more →AWS Interconnect is now generally available, with a new option to simplify last-mile connectivity
AWS just announced the general availability of two interconnectivity services that address a real problem many organizations face: connecting infrastructure across multiple cloud providers securely and efficiently. If you’ve ever struggled with the complexity of establishing private connections between your AWS environment and resources on Azure, Google Cloud, or your own data center, these new tools are worth understanding.
At its core, AWS Interconnect – multicloud does something straightforward: it creates managed private connectivity between your Amazon VPC and VPCs on other cloud providers without routing traffic over the public internet. Technically, this builds on AWS’s existing Direct Connect infrastructure, but extends it to work with other clouds. Instead of configuring VPN tunnels manually or dealing with the complexity of peering agreements, you get a managed service that handles the heavy lifting. The new AWS Interconnect – last mile component addresses the “final stretch” problem—that expensive and often complicated last connection from your on-premises network or regional office to the AWS network itself. Rather than paying for expensive dedicated lines or dealing with carrier provisioning delays, last mile gives you a more straightforward way to establish these high-speed connections.
Read more →How GitHub uses eBPF to improve deployment safety
When you’re managing deployments across thousands of services, one subtle bug can cascade into a production outage. GitHub recently shared how they’re using eBPF (extended Berkeley Packet Filter) to catch one particularly nasty class of problems: circular dependencies in their deployment tooling. This technique represents a practical evolution in how infrastructure teams can observe and prevent failures before they happen.
At its core, eBPF is a technology that lets you run sandboxed programs inside the Linux kernel without modifying kernel code or loading kernel modules. Think of it as a safe way to insert observability hooks directly where the action happens—at the system level where processes communicate, make network calls, and access files. GitHub’s team used eBPF to instrument their deployment pipeline and trace the dependency relationships between services in real time. By hooking into system calls and tracking which services attempt to load or depend on which other services, they could build a live map of deployment dependencies. When the eBPF program detected a circular path (Service A depends on B, B depends on C, C depends on A), it could immediately flag the issue and prevent the problematic deployment from proceeding.
Read more →Introducing Anthropic's Claude Opus 4.7 model in Amazon Bedrock
Last week, AWS announced the availability of Claude Opus 4.7, Anthropic’s latest and most capable model in the Opus family, now accessible through Amazon Bedrock. This release marks a meaningful step forward in accessible enterprise AI, particularly for teams working with AWS infrastructure. Claude Opus 4.7 is optimized for tasks requiring deep reasoning and complex problem-solving—think multi-step coding projects, autonomous agent workflows, and professional analysis where accuracy matters. What makes this launch notable isn’t just the model itself, but how it’s integrated into Bedrock’s infrastructure, which AWS has rebuilt specifically for generative AI workloads.
Read more →Build a personal organization command center with GitHub Copilot CLI
GitHub recently shared how one of their engineers built a personal organization command center—essentially a custom CLI tool that acts as a single entry point for managing tasks, calendar events, and project work. The project is a great example of how modern AI tools like GitHub Copilot CLI can accelerate development of internal productivity tools that would’ve taken significantly longer to build manually. Rather than spending weeks writing boilerplate code and debugging command parsing, the engineer leveraged Copilot to scaffold the project structure, handle common patterns, and solve specific problems interactively.
Read more →The next evolution of the Agents SDK
OpenAI just released significant updates to their Agents SDK that address one of the biggest pain points in building production AI agents: security and execution isolation. If you’re working with autonomous agents that need to interact with files, databases, or external tools over extended periods, this update is worth your attention.
The core improvement centers on native sandbox execution. Previously, when you built an agent that needed to run code or access files, you had to handle isolation yourself—spinning up containers, managing permissions, cleaning up resources. It was workable but added complexity and potential security vulnerabilities. The updated SDK now provides built-in sandboxing, meaning your agent’s code runs in an isolated environment by default. Think of it like running Python scripts in a restricted subprocess, but handled automatically by the framework. This matters because agents that run unattended need hard guarantees that they can’t accidentally (or maliciously) access data outside their scope.
Read more →Troubleshooting environment with AI analysis in AWS Elastic Beanstalk
AWS Elastic Beanstalk has long been a go-to platform for developers who want to deploy web applications without getting bogged down in infrastructure management. You push your code, and Elastic Beanstalk handles the heavy lifting—provisioning capacity, configuring load balancers, scaling instances, and monitoring health. But even with all this automation, things still break. Your application might be consuming too much memory, your database connections could be pooling incorrectly, or environment configuration issues might be silently degrading performance. Historically, tracking down these problems meant digging through logs, comparing metrics against baselines, and making educated guesses. AWS has now added AI Analysis to this toolset, bringing automated troubleshooting directly into the Elastic Beanstalk console.
Read more →AWS Weekly Roundup: Claude Mythos Preview in Amazon Bedrock, AWS Agent Registry, and more (April 13, 2026)
As AI models move from experimental side projects into critical business workflows, teams face a hard reality: cost visibility becomes non-negotiable. AWS is addressing this squeeze with several new releases that acknowledge the practical challenges of scaling AI workloads. The Claude Mythos preview in Amazon Bedrock, combined with the new AWS Agent Registry, represents a meaningful shift toward production-ready AI infrastructure rather than just raw model access.
The Claude Mythos preview is important because it demonstrates how foundation models are evolving beyond size and capability metrics into specialized variants optimized for real constraints. When you’re running inference at scale—say, processing thousands of support tickets or analyzing document batches—the difference between a general-purpose model and one tuned for your specific task can mean the difference between a sustainable operation and cost overruns that catch your finance team off guard. Mythos reportedly focuses on improved efficiency in particular domains, which technically means better token-to-output ratios and lower latency for common patterns. For your Python workflows calling Bedrock’s API, this translates to more predictable costs and faster response times without architectural changes.
Read more →Kubernetes v1.36 Sneak Peek
Kubernetes v1.36 is arriving in late April 2026, and the community is gearing up for another significant release cycle. Like every major version bump, this one brings the usual mix of removals, deprecations, and new features—but this time, the enhancement list is notably substantial. If you’re managing containerized workloads in production or exploring Kubernetes for the first time, understanding what’s coming helps you plan upgrades and adjust your automation strategies accordingly.
Read more →Securely connect AWS DevOps Agent to private services in your VPCs
AWS DevOps Agent represents a meaningful shift in how teams handle operational tasks across distributed infrastructure. Rather than jumping between monitoring dashboards, ticketing systems, and cloud consoles, you get an AI-powered teammate that understands your entire stack—AWS, multicloud, and on-premises systems included. The agent proactively identifies incidents before they impact users, suggests optimizations based on real performance data, and handles routine SRE work like log analysis and root cause investigation. If you’re managing complex deployments where context switching kills productivity, this tool addresses a real pain point.
Read more →Launching S3 Files, making S3 buckets accessible as file systems
Amazon S3 has always presented developers with a choice: use object storage for its scalability and cost benefits, or switch to traditional file systems when you need interactive file access. S3 Files bridges this gap by mounting S3 buckets directly as file systems on EC2 instances, Lambda functions, and other AWS compute resources. This means you can now access objects in S3 using standard file operations—ls, cat, grep, mv—without building custom APIs or managing separate storage infrastructure.
Read more →GitHub Copilot CLI for Beginners: Getting started with GitHub Copilot CLI
If you’ve been working in the terminal lately, you’ve probably noticed how much time goes into remembering syntax, debugging commands, and looking up documentation. GitHub Copilot CLI brings the AI-assisted coding experience you might know from the editor directly into your command line, making it easier to construct complex commands, understand error messages, and automate repetitive tasks—without constantly switching between your terminal and browser windows.
At its core, GitHub Copilot CLI uses the same foundation as other Copilot products: a large language model trained on vast amounts of code and documentation that understands context and intent. When you describe what you want to accomplish in natural language—say, “find all Python files modified in the last week”—the CLI translates that into the appropriate command for your system, whether that’s Linux, macOS, or Windows. It works by analyzing your description, considering your current shell environment, and generating shell commands that match your intent. You maintain full control: the tool shows you what it’s about to run, and you approve or modify commands before execution.
Read more →GitHub Copilot CLI combines model families for a second opinion
If you’ve ever asked a colleague to review your bash command or SQL query, you understand the value of a second perspective. GitHub has introduced a similar concept into Copilot CLI through a feature called Rubber Duck—a way to get multiple AI viewpoints on the same problem. Instead of relying on a single model’s suggestion, Rubber Duck leverages different model families to cross-check and validate proposed solutions before you run them in production.
Read more →Running Agents on Kubernetes with Agent Sandbox
The way we build AI applications is fundamentally changing. For years, the dominant pattern was simple: send a prompt to an AI model, get a response back, move on. But that era is ending. We’re shifting toward something more sophisticated—AI agents that can think, plan, and take actions over extended periods. This transition creates new architectural challenges, especially around deployment and resource management. That’s where running agents on Kubernetes with Agent Sandbox comes in.
Read more →Streamlining Cloud Compliance at GoDaddy Using CDK Aspects
At scale, managing cloud compliance feels like herding cats. Every team deploys resources differently, security rules get forgotten, and suddenly you’re auditing hundreds of stacks to find resources missing required tags or using the wrong encryption settings. GoDaddy faced exactly this problem—and solved it with AWS CDK Aspects, a feature that applies organization-wide policies automatically across your entire infrastructure as code.
CDK Aspects work by implementing the Visitor pattern to traverse your infrastructure constructs and enforce standards before resources are deployed. Think of them as automated compliance checkpoints that run during the synthesis phase. When you define a CDK Aspect, you write code that inspects every construct in your stack—from S3 buckets to RDS databases—and either validates that it meets your requirements or modifies it to comply. For example, you could create an Aspect that checks every S3 bucket has default encryption enabled, or automatically adds cost-tracking tags to all resources. The key advantage: these rules apply consistently across all your stacks, no matter which team deployed them or when they were created.
Read more →Amazon S3 Files: making S3 buckets accessible as file systems
AWS just launched S3 Files, a feature that turns S3 buckets into fully accessible file systems for compute resources. If you’ve ever dealt with the friction of copying data between S3 and file-based applications, this is the kind of update that changes your workflow fundamentally. S3 Files delivers a shared file system interface on top of S3, combining the scalability, durability, and cost-effectiveness of object storage with the semantics applications expect from a traditional file system.
Read more →Amazon Bedrock Guardrails supports cross-account safeguards with centralized control and management
If you’ve been working with Amazon Bedrock to build generative AI applications, you’ve likely had to think about safety controls. The question of how to enforce consistent guardrails across multiple projects and AWS accounts just got easier. AWS has made organizational safeguards generally available in Amazon Bedrock Guardrails, which means you can now define safety policies once and apply them across your entire AWS Organization. This is a meaningful shift for teams managing AI applications at scale, moving from scattered, account-specific controls to a unified governance model.
Read more →Run multiple agents at once with /fleet in Copilot CLI
The GitHub Copilot CLI just got a powerful new capability: the /fleet command. Instead of running one agent at a time to solve a problem, you can now dispatch multiple agents in parallel to work on different parts of your task simultaneously. This is a meaningful shift in how you can interact with AI-assisted development—from sequential, single-threaded assistance to concurrent, distributed problem-solving. If you’ve ever wished you could have Copilot work on multiple files or components at the same time, /fleet is designed for exactly that scenario.
Read more →Autonomous Incident Response: How AWS DevOps Agent Brings Agentic AI to Your Operations
The operational challenge is familiar to anyone running distributed systems: when something breaks, the information you need is scattered across logs, metrics, deployment pipelines, and dozens of monitoring dashboards. You spend precious minutes just gathering context before you can even start troubleshooting. AWS DevOps Agent aims to change this by bringing agentic AI directly into your incident response workflow—an always-on operations teammate that can investigate, correlate data, and take action without waiting for human intervention.
Read more →Microsoft takes on AI rivals with three new foundational models
Microsoft has just announced three new foundational AI models, marking a significant move in its competition against other major players in the artificial intelligence space. These models were released by MAI following the group’s establishment just six months ago. The announcement demonstrates Microsoft’s commitment to developing diverse AI capabilities rather than relying solely on large language models like those powering ChatGPT.
The three new models cover important use cases across different data types. One model handles speech-to-text transcription, converting spoken audio into written text with high accuracy. A second model focuses on audio generation, enabling systems to create spoken content programmatically. The third model tackles image generation, allowing developers to create visual content through code. For IT professionals and developers, this means more options for integrating multimodal AI capabilities into applications without depending on external vendors or third-party APIs.
Read more →