AGENTS.md: Helpful agent briefing or token hog?

An AGENTS.md file is widely considered essential when working with coding agents. But well-intentioned help can quickly turn into extra ballast.

(Image: Vanessa Bahr / iX)

By Dr. Fabian Deitelhoff

The AGENTS.md file is a README for AI agents: a fixed location in the repository that describes build steps, test commands, tooling, architectural guidelines, and coding conventions specifically for autonomous coding agents. The idea is that agents read this file early on, so they learn more quickly how to run tests, how to structure code, and which conventions to follow.
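To illustrate, such a file is plain Markdown; the concrete commands, paths, and variable names below are invented for this example, not taken from any particular project:

```markdown
# AGENTS.md

## Build & test
- Install dependencies: `uv sync`
- Run the full test suite: `make test-ci` (do not call pytest directly)

## Conventions
- Format with `ruff format`; linting must pass before every commit
- New modules go under `src/`, with mirrored tests under `tests/`

## Pitfalls
- Integration tests require the `TEST_DB_URL` environment variable
```

The point is brevity: a handful of commands and rules the agent could not easily infer from the repository itself.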

Providers like OpenAI, Anthropic, GitHub, and Qwen are actively promoting this pattern. Many frameworks also ship slash commands such as /init that automatically generate an AGENTS.md, or a similar file like CLAUDE.md, from an existing repository. As a result, the convention has spread rapidly: by 2025, tens of thousands of public GitHub repositories contained context files, and the number keeps growing. The AGENTS.md repository on GitHub lists the benefits and shows examples of how to structure such a file.

A team at ETH Zurich has examined the structure and usefulness of AGENTS.md. The study “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” systematically investigates for the first time the effect of such files on real agent workflows. To do this, the researchers combined two benchmarks: the established SWE-bench Lite, with 300 tasks from eleven popular Python repositories, and the benchmark tool AgentBench, with 138 tasks from twelve less well-known repos, all of which contain real context files written by developers.

The study team tested the coding agents Claude Code with Sonnet 4.5, OpenAI Codex with GPT-5.2 and GPT-5.1 mini, and Qwen Code with Qwen3-30B-Coder, each in three variants: without a context file, with an automatically generated context file following the respective agent vendor's recommendation, and – on AgentBench – with the actually existing, developer-maintained context file. All agents loaded the file into their context, either AGENTS.md (Codex and Qwen Code) or CLAUDE.md (Claude Code). The team measured success strictly via test suites: a task counts as solved only if all associated tests pass after the agent's patch is applied.

The result is sobering: LLM-generated context files slightly reduce the success rate on average – by about 0.5 percentage points for SWE-bench Lite and around 2 to 3 percentage points for AgentBench, depending on the model. At the same time, inference costs increase by an average of 20 to 23 percent because the agents execute more steps and produce longer reasoning passages.

Even human-maintained context files perform only moderately better: they improve the success rate on AgentBench by about 4 percentage points on average compared to the scenario without any context file, but also increase the number of agent steps and thus the costs – by almost 20 percent in individual setups. To put it pointedly: a few percentage points of success are bought with significantly more token consumption, longer runtimes, and more complex agent traces.

The study shows that agents take the instructions in context files seriously: if specific tools or workflows are mentioned, agents use them more often – for example, project scripts, pytest, uv, or repository-specific helper tools. Context files also lead to more tests, more file accesses, and more extensive repository navigation. So, the problem is not that the models ignore context instructions.

However, the additional activity makes the tasks more difficult: more instructions mean more things that the agent has to consider and weigh against each other, which is reflected in more reasoning tokens per task. At the same time, context files function poorly as a repository overview: on average, agents do not find the files relevant for a bug fix any faster than without AGENTS.md, even though many files explicitly describe directory structures, components, and entry points.

An important observation by the ETH team is that LLM-generated context files are usually redundant with existing documentation: READMEs, contributing guides, docs folders, and examples already contain build and test instructions, architectural overviews, and style guidelines that agents can also reach via file access. In an ablation experiment – a method for evaluating AI models in which individual components, such as a feature, a layer, or a module, are selectively removed or modified to measure their impact on overall performance – the researchers therefore removed all other documentation files from the repository, leaving only the generated context file.

In such a documentation-poor setting, the picture changes: suddenly, the generated context files improve the success rate of the agents by an average of about 2.7 percentage points, and in some cases even perform better than the original developer documents. The obvious interpretation: context files are helpful when they fill genuine knowledge gaps for the agents, not when they repeat already existing information in a slightly different form.

A separate empirical analysis of over 2,300 agent READMEs from nearly 2,000 repositories shows how developers use such files today (see the study “Agent READMEs: An Empirical Study of Context Files for Agentic Coding”). Most frequently, they contain functional context: build and run commands (in over 60 percent of cases), implementation details (almost 70 percent), and architectural hints (around 68 percent).

In contrast, non-functional requirements such as security and performance are significantly underrepresented, each explicitly addressed in only about 15 percent of the files. Furthermore, many files are long, difficult to read, and tend to evolve like configuration artifacts, accreting many small additions rather than being clearly curated – another indication of why general-purpose context files quickly become a cognitive load for agents.


From a practical perspective, this leads to several recommendations for the productive use of AGENTS.md (see the GitHub blog post “How to write a great agents.md”):

  • Do not repeat what is already in the README and docs. Avoid duplicate project descriptions or lengthy architectural excursions if they are already maintained elsewhere.
  • Focus on missing, hard-to-discover context. This includes project- or team-specific scripts, special test setups, non-obvious pitfalls, or domain-specific invariants that the agent would otherwise only learn through intensive trial and error.
  • A few minimal, testable rules instead of a wish list. Each additional rule enlarges the search space for the agent. A small number of clearly justified requirements is useful, such as “always run tests via make test-ci” instead of half a dozen alternative workflows.
  • Tailor the agent's role clearly. Based on an analysis of over 2,500 AGENTS.md files, GitHub reports that specialized roles – for example, a dedicated test agent or docs agent – work better than generic instructions.
  • Improve iteratively instead of perfecting everything in advance. Successful agent READMEs are created by teams observing typical agent error patterns and deriving targeted, concise correction instructions from them.

Furthermore, it can be helpful to let the agent optimize its own AGENTS.md. The LLM can analyze the file and identify unclear wording, contradictions, redundancies, or missing decision rules, then formulate them more precisely; the revised instructions later guide the agent's behavior. This works best when the result is not merely a rewritten file but one that is then tested against example tasks to check whether the agent actually follows the intended rules better. An iterative process of analyzing, revising, and testing is therefore more reliable than trusting a merely more polished wording.
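The analyze–revise–test loop can be sketched as follows; `ask_agent` and `pass_rate` are placeholders for whatever model call and task-evaluation harness your own tooling provides, not real APIs:

```python
from typing import Callable

def optimize_agents_md(
    text: str,
    ask_agent: Callable[[str], str],    # placeholder: real LLM/agent call
    pass_rate: Callable[[str], float],  # placeholder: run agent on sample tasks, return pass rate
    rounds: int = 3,
) -> str:
    """Analyze, revise, test: keep a revision only if it measurably helps."""
    best, best_score = text, pass_rate(text)
    current = text
    for _ in range(rounds):
        # 1. Analyze: have the model point out unclear wording, contradictions, redundancies.
        critique = ask_agent("Find unclear wording, contradictions, redundancies:\n" + current)
        # 2. Revise: produce a tightened version based on that critique.
        current = ask_agent("Rewrite this AGENTS.md fixing the issues:\n" + critique + "\n---\n" + current)
        # 3. Test: keep the revision only if agents actually perform better with it.
        score = pass_rate(current)
        if score > best_score:
            best, best_score = current, score
    return best
```

The key design choice is step 3: a revision is accepted only when it improves a measured pass rate on example tasks, which guards against the study's failure mode of confidently worded but unhelpful context files.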

For development teams, AGENTS.md is not a free productivity boost but a steering instrument that demands trade-offs. Based on current evidence, automatically generated, highly redundant context files worsen success rates, make every agent run more expensive, and produce more complex traces that are harder to debug.

Repository context files are primarily helpful where they specifically provide missing information, for example, in poorly documented or special codebases, for niche toolchains, or clearly defined agent roles. The statement “README for AI agents” from the AGENTS.md website should therefore be taken literally: not as further complete documentation, but as a lean, precise operating manual that gives agents just enough context for robust results – and no token more.

(nb)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.