AI pioneer Bengio founds organization for AI safety

Yoshua Bengio, winner of the Turing Award, wants to put safety above commercial interests. To this end, he has founded LawZero.

(Image: photoschmidt/ Shutterstock.com)

Yoshua Bengio considers today's AI models dangerous; he even says that we are currently playing Russian roulette with the future of our loved ones. Bengio is an AI pioneer and winner of the Turing Award, the world's most important AI prize, and he is known for warning against current AI developments. Now the professor at the University of Montreal has founded an organization to address the dangers of AI, and he has a solution in mind.

According to Bengio, one danger is that AI models are acquiring increasingly dangerous skills and behaviors, including “deception, fraud, lying, hacking, self-preservation and, more generally, goal misalignment”.

LawZero is intended to help unlock the potential of AI while minimizing its risks. In a blog post, Bengio writes that he is particularly concerned about agentic capabilities. As examples, he cites the instinct for self-preservation and the Claude 4 system card, which describes how Anthropic's model tried to prevent itself from being replaced by a newer version. He also describes a case in which an AI model refused to accept that it would lose a game of chess and instead hacked and manipulated the computer.

He describes current AI development as an "exciting but deeply uncertain climb into uncharted territory, where the risk of losing control is all too real, but competition between companies and countries tempts them to accelerate without sufficient caution."

The aim of LawZero is to work out how to make AI systems safe and to develop a guideline, or principle, that stands above everything else. The organization wants to create a non-agentic, memoryless and trustworthy AI, which it calls Scientist AI: an AI that learns like a scientist rather than imitating humans. Bengio compares this to a psychologist who can understand how a sociopath thinks without automatically adopting those behaviors, as current AI models do. This is to be achieved with “structured and honest chains of thought that can explain the observed facts as latent variables”. Such a Scientist AI could then monitor agentic, untrustworthy AI systems.

(emw)

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.