
GitHub Copilot CLI combines model families for a second opinion

If you’ve ever asked a colleague to review your bash command or SQL query, you understand the value of a second perspective. GitHub has introduced a similar concept into Copilot CLI through a feature called Rubber Duck—a way to get multiple AI viewpoints on the same problem. Instead of relying on a single model’s suggestion, Rubber Duck leverages different model families to cross-check and validate proposed solutions before you run them in production.

Here’s how it works under the hood. When you ask Copilot CLI for help—say, converting a complex JSON file or debugging a permission error—Rubber Duck routes your query to multiple AI models simultaneously. These aren’t just different versions of the same model; they’re from different model families with different training data and architectural approaches. The system then compares their outputs and reasoning, flagging cases where they diverge significantly or where confidence is low. This is especially useful for critical operations where a misunderstood command could cause real damage. For instance, if you’re writing a bash script that deletes files across multiple directories, having two independent models validate the logic before execution catches potential mistakes that a single model might miss due to its particular training patterns or biases.
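GitHub has not published Rubber Duck's internals, but the fan-out-and-compare idea described above can be sketched in a few lines. Everything here is hypothetical: `cross_check`, the stub "model" callables, and the similarity threshold are illustrative stand-ins, not the actual Copilot CLI implementation.

```python
from difflib import SequenceMatcher

def cross_check(prompt, models, threshold=0.8):
    """Fan a prompt out to several model callables and flag divergence.

    `models` maps a name to a callable; each callable stands in for a
    real model endpoint (the actual routing is not public).
    """
    answers = {name: model(prompt) for name, model in models.items()}
    names = list(answers)

    # Pairwise similarity of whitespace-normalized outputs;
    # the lowest pairwise score is taken as overall agreement.
    scores = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a = " ".join(answers[names[i]].split())
            b = " ".join(answers[names[j]].split())
            scores.append(SequenceMatcher(None, a, b).ratio())

    agreement = min(scores) if scores else 1.0
    return {
        "answers": answers,
        "agreement": agreement,
        "needs_review": agreement < threshold,
    }

# Usage with stubs standing in for two different model families:
family_a = lambda p: "find /tmp/cache -type f -mtime +30 -delete"
family_b = lambda p: "find /tmp/cache -type f -mtime +30 -delete"
result = cross_check(
    "delete cache files older than 30 days",
    {"family-a": family_a, "family-b": family_b},
)
```

A real system would compare reasoning traces rather than raw text similarity, but the shape is the same: fan out, normalize, score agreement, and surface low-confidence cases to the user.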

From a practical standpoint, this matters because CLI commands don’t offer much margin for error. A typo in an rm command or a malformed aws s3 sync operation can destroy data within seconds. DevOps engineers, SREs, and infrastructure teams constantly work with tools where “nearly correct” isn’t acceptable. Rubber Duck essentially gives you a code review partner for infrastructure automation—something especially valuable when you’re working at 2 AM and fatigue is setting in. Combined with AWS CLI scripting, Terraform validation, or Kubernetes manifest generation, this multi-model approach becomes a practical safety net.
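One way to picture that safety net: block a suggestion outright when the models disagree and the command looks destructive. The sketch below is an assumption about how such a gate could work, not Copilot's actual policy; `safety_gate`, `is_destructive`, and the marker list are all hypothetical, and the substring heuristic is deliberately crude.

```python
# Crude markers for commands that can delete data; a real system
# would parse the command rather than match substrings.
DESTRUCTIVE_MARKERS = ("rm ", "rm -", "-delete", "s3 rm", "drop table")

def is_destructive(command: str) -> bool:
    lowered = command.lower()
    return any(marker in lowered for marker in DESTRUCTIVE_MARKERS)

def safety_gate(suggestions):
    """Require unanimous model agreement before offering a destructive command.

    `suggestions` maps a model name to its proposed shell command.
    Returns (command_or_None, reason).
    """
    commands = {s.strip() for s in suggestions.values()}
    if len(commands) == 1:
        return commands.pop(), "agreed"
    if any(is_destructive(c) for c in commands):
        return None, "blocked: models disagree on a destructive command"
    return None, "divergent: review both suggestions"

# Two model families propose different ways to clean old logs:
blocked, reason = safety_gate({
    "family-a": "rm -rf /var/log/app/*",
    "family-b": "find /var/log/app -name '*.log' -delete",
})
# blocked is None; reason explains why execution was withheld
```

Both suggestions might even be correct here, which is the point: when independent models reach the goal by different destructive routes, pausing for a human decision is cheaper than restoring from backup.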

The technical elegance here is that you don’t need to understand which models are running or how they differ. You simply invoke the feature and get back a confidence assessment along with explanations when models disagree. It’s the kind of behind-the-scenes complexity that makes tooling more reliable without making it harder to use. As AI assistance becomes more embedded in our development and infrastructure workflows, this “second opinion” approach represents a maturation of how we should think about AI-assisted decision-making—not as a magic answer box, but as a collaborative tool that’s strongest when it includes built-in skepticism.

Source: The GitHub Blog