Microsoft releases evaluation tool for Copilot agents
The Microsoft 365 Copilot Agent Evaluations CLI tool allows Copilot agents to be systematically tested and improved. Currently, it is free of charge.
(Image: IB Photography / Shutterstock.com)
- Manuel Masiero
Microsoft has introduced the Microsoft 365 Copilot Agent Evaluations CLI. The command-line tool, available as a free preview since May 8, enables users to test and improve the quality of AI agents. To achieve this, the Agent Evaluations CLI sends questions to an agent and evaluates its answers using Azure OpenAI models.
The Agent Evaluations CLI is part of the Microsoft 365 Copilot Extensibility Platform, a central Microsoft platform for managing AI agents. The Evaluations CLI is available there via the Admin Center and functions as a standalone developer tool for quality measurement.
During a test, the CLI tool sends prompts to an agent deployed in Microsoft 365. It supports three input types: JSON datasets, interactive input, and inline prompts such as --prompts "Question 1" "Question 2". This lets it cover structured tests as well as live dialogues. The evaluation function can also be used for vibe coding.
Checklist for Agent Evaluation
The CLI scores the agent's answers on seven metrics. Among other things, it assesses how well the agent understands context in single-turn or multi-turn dialogues and how well it handles follow-up questions. It also tests whether the agent completes end-to-end tasks as it would in a real user dialogue.
(Image: Microsoft)
Developers can use the test reports in HTML, JSON, or CSV format in their development cycles, code reviews, or CI/CD pipelines. In the long term, such systematic and repeatable evaluations are intended to become a standard component in software development with Microsoft 365 Copilot, as Microsoft writes in its developer blog.
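How such a report might gate a CI/CD pipeline can be sketched in a few lines of Python. The report layout used here (a top-level "metrics" object mapping metric names to scores between 0 and 1), the metric names, and the threshold of 0.8 are assumptions for illustration only; the article does not describe the actual report schema, so the field names would need to be adapted to the real output.

```python
import json


def failing_metrics(report: dict, threshold: float) -> list[str]:
    """Return the names of all metrics scoring below the threshold, sorted.

    Assumes a hypothetical layout: {"metrics": {"<name>": <score>, ...}}.
    """
    return sorted(
        name for name, score in report["metrics"].items() if score < threshold
    )


def gate(report_json: str, threshold: float = 0.8) -> int:
    """Return a process exit code: 0 if every metric passes, 1 otherwise."""
    report = json.loads(report_json)
    failed = failing_metrics(report, threshold)
    for name in failed:
        score = report["metrics"][name]
        print(f"FAIL {name}: {score:.2f} < {threshold:.2f}")
    return 1 if failed else 0


# Hypothetical report content; the metric names are placeholders.
SAMPLE_REPORT = '{"metrics": {"metric_a": 0.92, "metric_b": 0.61}}'
```

In a pipeline step, the return value of gate() could be passed to sys.exit() so that a score below the threshold fails the build.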
During the preview phase, whose duration Microsoft has not specified, developers can use the Microsoft 365 Copilot Agent Evaluations CLI free of charge. To do so, they need a Microsoft 365 Copilot license, Node.js 24.12.0 or higher, an agent deployed in the tenant along with administrator consent to run it there, and an Azure OpenAI endpoint for the LLM-based evaluation (the default model is gpt-4o-mini). Currently, the tool supports only Windows development environments; support for macOS and Linux has been announced.
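The Node.js requirement can be checked before installing the tool. The following is a minimal sketch that compares the output of node --version against the minimum of 24.12.0 named above; the helper names are made up for this example and are not part of the Microsoft tool.

```python
import re
import subprocess

# Minimum Node.js version named in the requirements above.
MIN_NODE = (24, 12, 0)


def parse_node_version(text: str) -> tuple:
    """Parse the output of `node --version`, for example 'v24.12.0'."""
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def node_is_supported(text: str, minimum: tuple = MIN_NODE) -> bool:
    """Return True if the given version string meets the minimum version."""
    return parse_node_version(text) >= minimum


def check_local_node() -> bool:
    """Query the locally installed Node.js; False if missing or too old."""
    try:
        result = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False
    return node_is_supported(result.stdout)
```

The comparison works on integer tuples rather than strings, so that, for instance, 24.9.0 is correctly treated as older than 24.12.0.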
(mro)