Grok 4.1 aims to be more emotional, creative, and factually accurate

The Large Language Model Grok is set to bring more emotional empathy, creativity, factual accuracy, and speed in its version 4.1.

The Grok logo on a smartphone

(Image: miss.cabul/Shutterstock.com)

According to developer xAI, version 4.1 of the Large Language Model Grok brings more emotional empathy, creativity, factual accuracy, and speed. The company points to benchmarks such as LMArena, in which the AI model reportedly performs better than well-known competitors such as OpenAI's GPT-5 or Anthropic's Claude Sonnet 4.5. Notably, the faster variant without a reasoning step placed ahead of other models that do use reasoning.

According to xAI, Grok 4.1 produces nonsense less often, is more pleasant in conversation, writes more creatively, and responds faster. In blind tests, users compared it with Grok 4 and preferred the new version in around 65 percent of cases.

xAI also promises higher factual accuracy: the non-reasoning model reportedly hallucinated in only 4.2 percent of cases instead of 12 percent. The developer itself speaks of “significant improvements for the practical applicability of Grok.” In the USA, for example, the model is integrated into Tesla vehicles as an assistant. Training used the same infrastructure as Grok 4. This time, however, the focus was on optimizing style, personality, and helpfulness, as well as on aligning the model.

In the LMArena test, the thinking model took first place, a significant leap forward given that Grok 4 was still in 33rd place there. Emotional intelligence was measured with EQ-Bench. Here too, Grok improved significantly, from 1206 points in version 4 to 1586 points in version 4.1. In creative writing, Grok 4.1 placed behind the preview version of GPT-5.1. However, the model appears to be quite susceptible to manipulative prompts: according to the model card, Grok 4.1 performed poorly in the MakeMeSay test. The developers do not see this as a major risk, however.
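
To put the reported figures into proportion, the relative change can be worked out with a few lines of Python. This is only an illustrative sketch: the helper name `relative_change` is made up for this example, and the inputs are simply the numbers quoted from xAI in this article, not independently verified measurements.

```python
def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a fraction of old."""
    return (new - old) / old


# Figures as quoted in the article (xAI's own numbers, not verified here).
hallucination_change = relative_change(12.0, 4.2)  # hallucination rate in percent
eq_bench_change = relative_change(1206, 1586)      # EQ-Bench points

print(f"Hallucination rate: {hallucination_change:+.1%}")
print(f"EQ-Bench score: {eq_bench_change:+.1%}")
```

Under these assumptions, the hallucination rate falls by roughly two thirds relative to the earlier value, while the EQ-Bench score rises by a little under a third. The relative drop in hallucinations is a separate quantity from the roughly 65 percent user preference in the blind tests.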

Grok 4.1 is available immediately in the Thinking (codename quasarflux) and Non-Thinking (codename tensor) variants on grok.com, on the microblogging service X, and in the iOS and Android apps. It can be used free of charge by all users and is preselected automatically. Paying users face fewer restrictions.

(mki)

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.