Red Hat: AI 3.4 with Agent Control and Extended Inference
Red Hat's new version of its AI platform offers Model-as-a-Service, extensive agent management, and tighter Nvidia integration.
- Harald Weiss
At its annual Summit in Atlanta, Red Hat introduced an enhanced version of its AI platform. Release 3.4 focuses in particular on extended inference capabilities: with Model-as-a-Service and an AI gateway, platform teams can more easily deploy, secure, and meter the usage of models. Companies can thus offer internal inference services themselves rather than solely consuming external model APIs. The technical foundation is the AI Inference Platform, built on vLLM, an open-source server for AI inference.
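vLLM exposes an OpenAI-compatible HTTP API, so an internal Model-as-a-Service endpoint can be consumed with standard chat-completion requests. The following sketch only builds such a request with the Python standard library; the endpoint URL and model name are illustrative placeholders, not values from Red Hat's announcement.

```python
import json

# Hypothetical internal endpoint; a vLLM-backed service behind the AI
# gateway would accept OpenAI-style chat-completion requests here.
ENDPOINT = "http://models.internal.example/v1/chat/completions"  # placeholder

payload = {
    "model": "granite-3-8b-instruct",  # example model name
    "messages": [
        {"role": "user", "content": "Summarize our Q3 incident report."}
    ],
    "max_tokens": 256,
}

body = json.dumps(payload)
# An HTTP client would POST `body` to ENDPOINT, typically with an
# API key issued by the gateway for metering and access control.
```

The point of the gateway layer is that clients keep this standard request shape while the platform team controls routing, authentication, and usage measurement behind it.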
With AI 3.4, the company also broadens support for underlying systems and components, including GPU acceleration on Nvidia and AMD hardware as well as CPU-based infrastructure for smaller language models. For operating larger inference environments, Red Hat is extending the llm-d framework with functions such as request prioritization and batch inference. Speculative decoding is intended to speed up response generation and thereby reduce inference costs.
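Speculative decoding works by letting a small draft model cheaply propose several tokens ahead, which the large target model then verifies in one pass. The toy sketch below illustrates only the accept/reject idea with deterministic stand-in functions; it is not llm-d's or vLLM's implementation.

```python
def speculative_step(draft, target, prefix, k=4):
    """One round of speculative decoding (toy version).

    `draft` and `target` are stand-in next-token functions: each takes
    the token context (a list) and returns the next token.
    """
    # 1. The cheap draft model proposes k tokens ahead.
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The expensive target model verifies the proposals and keeps
    #    the longest prefix it agrees with.
    accepted, ctx = [], list(prefix)
    for tok in proposed:
        if target(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)

    # 3. The target always contributes one token of its own, so every
    #    round makes progress even if the draft is wrong immediately.
    accepted.append(target(ctx))
    return accepted

# Toy models: both count upward, but the "target" switches to 99 after 3,
# so the draft's fourth proposal is rejected.
draft = lambda ctx: ctx[-1] + 1
target = lambda ctx: ctx[-1] + 1 if ctx[-1] < 3 else 99
print(speculative_step(draft, target, [0]))  # [1, 2, 3, 99]
```

When the draft model agrees with the target most of the time, several tokens are accepted per expensive target pass, which is where the cost reduction comes from.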
“Many companies want to move away from token consumption and towards operating their own inference platform,” says Joe Fernandes, Vice President AI Business at Red Hat. “Especially with larger workloads or in sovereign environments, running your own inference services can make sense economically and from a regulatory standpoint.”
Second focus: AI Agents
The new version includes features for identity, authorization, and lifecycle management of AI agents. It also adds tracing and observability to make agent activities more transparent. A curated MCP server catalog (Model Context Protocol) and an MCP gateway are intended to facilitate the controlled connection of tools, services, and data sources. Also new is an Evaluation Hub as a common control layer for evaluation frameworks, experiment tracking, AutoRAG, and AutoML. For prompts, Red Hat is launching integrated prompt management with Prompt Lab and Registry. MLflow will be generally available with AI 3.4 and serves, among other things, as the basis for prompt management, evaluations, and agent tracing.
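MCP is built on JSON-RPC 2.0, so what an MCP gateway routes and audits are structured request messages like the one sketched below. The `tools/call` method is part of the protocol; the tool name and arguments here are invented examples.

```python
import itertools
import json

# JSON-RPC request IDs must be unique per connection; a counter suffices here.
_ids = itertools.count(1)

def mcp_tool_call(tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as used by the
    Model Context Protocol."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Example: an agent asking an (invented) ticketing tool for open incidents.
req = mcp_tool_call("search_tickets", {"query": "open incidents"})
print(json.dumps(req))
```

Because every tool invocation passes through messages of this uniform shape, a gateway can centrally enforce which agents may call which tools and log each call for tracing.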
On the topic of security, Red Hat points to its automated Red Teaming. This feature, based on the acquisition of Chatterbox Labs, is designed to automatically test models and agents for risks before they are deployed in production. With this, Red Hat addresses one of the central weaknesses of agentic AI: agents are only useful if their access, behavior, and results are also verifiable.
Intensive Cooperation with Nvidia
In parallel, Red Hat is significantly expanding its collaboration with Nvidia. Red Hat AI Factory with Nvidia combines Red Hat AI Enterprise with Nvidia AI Enterprise and is intended to support companies in building productive AI infrastructures. Red Hat points to support for Nvidia's Blackwell generation as well as day-zero support for the upcoming Vera Rubin architecture. Red Hat is also participating in OpenShell, an Nvidia project for secure execution environments and sandbox functions for AI agents. According to Red Hat, several partners are involved in the AI Factory, including Cisco, Dell Technologies, Lenovo, Supermicro, TD SYNNEX, and WWT. Customers are to receive validated systems comprising hardware, software, and services for productive AI environments.
Conclusion: Red Hat is clearly focusing on the operational side of AI. The platform is less aimed at pre-training large foundation models than at inference, model customization, RAG, agent operation, and governance in hybrid environments. Fernandes comments: “Training large generative models is likely not the central use case in the classic enterprise IT market.” For him, the controlled deployment, optimization, and management of existing models and agents are more important. This aligns with Red Hat's traditional strengths: infrastructure, OpenShift, Kubernetes, hybrid cloud, and open interfaces.
(wpl)