Anthropic revises behavioral guidelines for AI model Claude

Anthropic has revised the behavioral guidelines for Claude. The "Constitution" now also explains to the models why the rules are in place.


The AI company Anthropic has fundamentally revised the behavioral guidelines that govern its large language model (LLM) Claude. According to an accompanying blog post, the so-called "Constitution" is for the first time no longer just a list of instructions: it now also tries to convey to the models why these rules are in place.

Anthropic hopes that models that understand the rules will be able to transfer their meaning and intent to scenarios the previous policy document did not explicitly cover. At the same time, the move reflects that the developer expects its models to have human-like capabilities in the future. The guidelines are apparently also meant to ensure that the AI, should it ever gain consciousness, does not resist a shutdown willed by humans. Elsewhere, however, the document states that addressing the model in human terms is primarily intended to signal that human qualities are desirable.

The "Constitution" also sets clear limits for the models in areas where the developers apparently do not trust the AI to draw the line on its own. These include assistance in building weapons of mass destruction, support for genocide, and any involvement in the creation of child pornography. In case of doubt, Anthropic has instructed its model to prioritize safety over ethics: undermining human control, for example, must not occur even if the model were to judge it ethically correct.


Anthropic has published the guidelines under the Creative Commons CC0 1.0 license, which makes them freely usable for any purpose without permission. The new specifications also feed into various training phases and into the generation of synthetic training data.

Whether the new approach actually leads the AI to exhibit the desired behavior or to deviate from it is to be documented in so-called System Cards, which have already examined the models' risks and weaknesses in the past. Anthropic emphasizes that external experts from law, philosophy, theology, and psychology were also involved in the development. The new guidelines are in use across all current Claude models.

(mki)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.