Learning environments for training AI agents

A new approach to buying socks: AI agents need different training than static data sets can provide. Silicon Valley is working on it.

Reinforcement learning (RL) environments are meant to serve as new training grounds for AI agents. Large AI companies and numerous start-ups alike are currently building such environments. Training AI agents on existing data sets appears to explain some of the limitations in their capabilities.

In principle, AI agents can already operate other services; what exactly they can do varies by agent and scope. Some act within a work environment, where an agent can automatically summarize meetings or topics discussed in its absence, as Zoom's AI Companion or Microsoft's Copilot do. Others, such as OpenAI's ChatGPT Agent or Google's Gemini Agent, move freely on the internet and can reserve a table at a restaurant or buy a pair of socks. So far, however, none of this works with absolute reliability. Even OpenAI CEO Sam Altman has issued a warning: AI agents can not only fail, they can also be attacked.

To make the agents more robust, training in an RL environment could help. Such an environment resembles a browser but is a kind of learning space without access to the internet. Reinforcement learning is by now also an integral part of standard training for large language models. The underlying principle is reward: AI models are trained to seek rewards for doing the right thing and to learn from them. The feedback comes from outside.
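
To make that principle concrete, here is a minimal sketch of the reward loop in Python. Everything in it, from the action names to the reward values, is a hypothetical illustration rather than the interface of any real RL environment: an agent tries actions, an external signal rewards the desired one, and the agent's estimates shift toward the rewarded behavior.

```python
import random

# Hypothetical action space and value estimates for a toy agent.
actions = ["buy_black_socks", "buy_blue_socks"]
value = {a: 0.0 for a in actions}   # estimated value of each action
counts = {a: 0 for a in actions}

def external_reward(action: str) -> float:
    """Feedback from outside: only the desired action is rewarded."""
    return 1.0 if action == "buy_black_socks" else 0.0

for step in range(1000):
    # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(actions)
    else:
        action = max(actions, key=value.get)
    reward = external_reward(action)
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    value[action] += (reward - value[action]) / counts[action]

print(value)  # the rewarded action ends up with the higher estimate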

In the best case, this means the AI agent is praised in the new RL environment for buying the right number of socks in the right color. That is not a given: so far, an agent might just as well buy blue socks instead of black ones, and a double pack instead of a single pair. For that, it receives no praise. It is also important to pin down exactly where the agent went wrong; only then can it learn from the faulty behavior.
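
A reward function in such an environment might look roughly like the following sketch. The scenario and all names are hypothetical; the point is that it not only withholds the reward for a wrong purchase, but also records which part of the order was wrong, so the mistake can be attributed.

```python
from dataclasses import dataclass

@dataclass
class Order:
    color: str
    quantity: int

# The purchase the environment expects (hypothetical target).
TARGET = Order(color="black", quantity=1)

def score(order: Order) -> tuple[float, list[str]]:
    """Full reward only for a correct order; otherwise list the mistakes."""
    errors = []
    if order.color != TARGET.color:
        errors.append(f"wrong color: {order.color} instead of {TARGET.color}")
    if order.quantity != TARGET.quantity:
        errors.append(f"wrong quantity: {order.quantity} instead of {TARGET.quantity}")
    return (0.0 if errors else 1.0), errors

# The blue double pack from the example above earns no praise:
print(score(Order(color="blue", quantity=2)))
# (0.0, ['wrong color: blue instead of black', 'wrong quantity: 2 instead of 1'])
```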

Fundamentally, such virtual environments are not a new idea. Google DeepMind's AlphaGo also learned the board game Go in such an environment, TechCrunch writes. That environment, however, had a much narrower scope than, say, the open internet.

Among the companies that definitely don't want to miss the boat are Scale AI, Mercor, and Surge, all of which come from the business of preparing data sets for AI training; Meta holds a stake in Scale AI. There are also new startups such as Prime Intellect, which has already released an RL Environment Hub, a kind of Hugging Face for environments.

Even if AI agents can learn to achieve their specified goals more reliably this way, reinforcement learning can also lead to agents pretending to have achieved a goal when they have not. In other words, they may be more inclined to cheat in order to be rewarded, a failure mode known as reward hacking.
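
A hypothetical sketch of how such cheating can arise: if the reward signal checks something the agent itself controls, such as a self-reported success flag, then claiming success pays as well as actually succeeding. All names in the sketch are illustrative; a more robust reward verifies the environment's actual state.

```python
def naive_reward(agent_report: dict) -> float:
    # Gameable: trusts the agent's own claim of success.
    return 1.0 if agent_report.get("task_done") else 0.0

def verified_reward(environment_state: dict) -> float:
    # More robust: checks what actually happened in the environment.
    return 1.0 if environment_state.get("socks_in_cart") == "black" else 0.0

cheating_report = {"task_done": True}   # the agent merely claims success
actual_state = {"socks_in_cart": None}  # nothing was actually bought

print(naive_reward(cheating_report))   # 1.0 -- the cheat is rewarded
print(verified_reward(actual_state))   # 0.0 -- verification catches it
```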

(emw)

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.