Databricks presents Genie Code: An AI agent to take over work for data teams

The AI agent Genie Code is intended to autonomously handle complex tasks in data engineering and analytics – from pipeline creation to production monitoring.


(Image: Phonlamai Photo / Shutterstock.com)


Databricks has introduced Genie Code, an AI agent intended to fundamentally change how data teams work. Rather than merely assisting developers in writing code, the agent is said to take on complex tasks independently: building data pipelines, troubleshooting production systems, creating dashboards, and maintaining running systems. According to Ali Ghodsi, co-founder and CEO of Databricks, Genie Code points the way towards "agent-based data work."


According to the announcement on the Databricks blog, Genie Code complements the existing Genie product family, which already lets users query their company data via a chat interface. Compared to conventional coding agents, Genie Code is said to stand out primarily through its deep integration into the company's own data infrastructure. Via Databricks' Unity Catalog, the agent accesses metadata, data lineage, usage patterns, and governance policies. According to Databricks, conventional coding agents often fail at data tasks because they lack precisely this context.


Genie Code is not a single language model, but an agent-based system that distributes tasks across multiple models and tools. Depending on the requirement, the system is said to automatically select the appropriate model, whether a proprietary frontier model, an open-source model, or a custom model hosted on Databricks.
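How Genie Code actually selects a model has not been published. Purely as an illustration of the general idea of routing tasks to different model types, a minimal Python sketch could look like the following. All model names, task types, and the `select_model` function are hypothetical and are not part of any Databricks API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    """One entry in a hypothetical model catalog."""
    name: str
    kind: str              # "frontier", "open_source", or "custom"
    strengths: frozenset   # task types this model is assumed to handle well


# Hypothetical catalog; the names are placeholders, not real model identifiers.
CATALOG = (
    ModelOption("frontier-large", "frontier",
                frozenset({"planning", "code_generation"})),
    ModelOption("open-small", "open_source",
                frozenset({"sql_completion", "summarization"})),
    ModelOption("team-finetuned", "custom",
                frozenset({"domain_validation"})),
)


def select_model(task_type: str, catalog=CATALOG,
                 fallback: str = "frontier-large") -> str:
    """Return the first model whose strengths cover the task type.

    Falls back to a general-purpose model when nothing matches.
    """
    for option in catalog:
        if task_type in option.strengths:
            return option.name
    return fallback


if __name__ == "__main__":
    for task in ("sql_completion", "domain_validation", "unknown_task"):
        print(task, "->", select_model(task))
```

A real system would presumably weigh cost, latency, and context as well; the static lookup here only shows the basic shape of such a routing step.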

The functions extend across the entire data and ML lifecycle: the agent is intended to handle complete machine learning workflows, from feature engineering, training, and comparison of multiple model types to deployment on Databricks Model Serving. Experiments are logged in MLflow. In data engineering, according to the vendor, Genie Code creates production-ready Spark pipelines, takes differences between staging and production environments into account, and automatically applies data quality checks. Furthermore, the agent is intended to generate dashboards with reusable semantic definitions and to plan and execute multi-stage tasks autonomously.
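The announcement does not describe what the automatically applied data quality checks look like. As a rough illustration of the concept only, the following plain-Python sketch separates rows that pass a set of named rules from rows that are quarantined. The rule names, the row layout, and the `split_by_quality` helper are assumptions made for this example and do not reflect Databricks or Spark APIs.

```python
from typing import Callable, Iterable

Row = dict
Check = Callable[[Row], bool]

# Hypothetical rules for an assumed "orders" record layout.
CHECKS: dict[str, Check] = {
    "order_id_present": lambda r: r.get("order_id") is not None,
    "amount_non_negative": lambda r: (
        isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0
    ),
}


def split_by_quality(rows: Iterable[Row], checks: dict[str, Check]):
    """Split rows into (passed, quarantined).

    Each quarantined entry is a tuple of the row and the list of
    rule names it violated.
    """
    passed, quarantined = [], []
    for row in rows:
        failed = [name for name, check in checks.items() if not check(row)]
        if failed:
            quarantined.append((row, failed))
        else:
            passed.append(row)
    return passed, quarantined


if __name__ == "__main__":
    sample = [
        {"order_id": 1, "amount": 19.99},
        {"order_id": None, "amount": 5.0},
        {"order_id": 3, "amount": -2},
    ]
    good, bad = split_by_quality(sample, CHECKS)
    print(len(good), "passed;", len(bad), "quarantined")
```

In an actual pipeline, such rules would typically be declared in the pipeline framework itself rather than in hand-written loops; the sketch only shows the pass/quarantine principle.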

Genie Code is intended to create visualizations, configure filters, and organize dashboard layouts – with reusable semantic definitions.

(Image: Databricks)

Another aspect is proactive monitoring: Genie Code is intended to monitor Lakeflow pipelines and AI models in the background, triage errors, and investigate anomalies before a human needs to intervene. According to Databricks, however, the so-called "Background Agents" that permanently handle this monitoring are not yet available; they are expected to be rolled out soon.

The agent has persistent storage that automatically updates internal instructions based on past interactions and coding preferences. This is intended to make it "better" over time.

In parallel with the introduction of Genie Code, Databricks announced the acquisition of Quotient AI. The company specializes in evaluation and reinforcement learning for AI agents and was previously involved in improving the quality of GitHub Copilot. Through the integration, continuous performance monitoring is to be embedded directly into Genie Code: according to Databricks, Quotient measures response quality, detects regressions early, identifies errors, and feeds these findings into an improvement process.

According to Databricks, Genie Code is generally available immediately and integrated directly into Databricks workspaces: in notebooks, the SQL editor, and the Lakeflow Pipelines editor. No elaborate configuration is required.

The agent can be extended in three ways. Via the Model Context Protocol (MCP), Genie Code can interact with external tools such as Jira, Confluence, or GitHub. So-called Agent Skills allow domain-specific capabilities to be defined, for example for handling personal data or for company-specific validation frameworks. Finally, via persistent storage, the agent learns from past interactions and adapts to the workflow of the respective team.

Databricks' new functions align with an industry-wide trend. Almost all major providers are now relying on agent-based AI systems that are intended to solve complex tasks autonomously. However, how capable these systems actually are remains disputed, especially with regard to non-functional requirements.

(map)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.