Google Cloud hits the brakes on AI costs

Google Cloud introduces automated spend caps and a FinOps Explainability Agent to better control and analyze AI costs.

Euro banknotes in denominations of 50, 100, and 200 float in clouds against a blue sky.

(Image: heise medien)


Google Cloud is expanding its FinOps portfolio with new features for AI workloads. The focus is on automated spend caps that actively enforce budget limits, as well as a new FinOps Explainability Agent that independently analyzes cost drivers. With these, Google aims to improve control over difficult-to-calculate AI costs and reduce the effort required for their analysis.

The background: AI workloads are changing cloud cost structures. Instead of relatively stable load profiles, costs fluctuate heavily – for example, due to variable token usage, differing model prices, or the use of specialized hardware such as GPUs and TPUs. Classic FinOps tools provide reports and warnings, but they do not intervene directly in ongoing operations.

The new Spend Caps, which Google is initially offering as a private preview, are meant to close these gaps. Administrators can use them to set budgets at the project level, which the system enforces automatically. If a project reaches its limit, Google Cloud first issues a warning and then pauses API traffic; the underlying resources remain intact. Anyone who wants to continue operations can adjust or lift the Spend Cap. Initially supported are Google AI Studio, the Gemini Enterprise Agent Platform as the successor to Vertex AI, Cloud Run, Cloud Run Functions, and the Maps APIs.

The benefit is particularly evident in experimental AI workloads. A faulty prompt loop or an unoptimized inference pipeline can trigger millions of API calls – and correspondingly high costs – within a short time. In such cases, Spend Caps intervene automatically, with no human intervention required.
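The warn-then-pause behavior described above can be illustrated with a minimal client-side sketch. Everything here is hypothetical – the class name, the 80 percent warning threshold, and the per-call cost are placeholders, not part of any Google API – but it mirrors the enforcement flow the article describes: warn near the limit, pause traffic at the limit, keep resources intact, and resume once the cap is adjusted.

```python
# Hypothetical sketch of a spend cap's warn-then-pause flow.
# Amounts are tracked in euro cents (integers) for deterministic arithmetic.

class SpendCap:
    def __init__(self, limit_cents, warn_ratio=0.8):
        self.limit = limit_cents
        self.warn_at = int(limit_cents * warn_ratio)  # assumed 80% threshold
        self.spent = 0
        self.warned = False
        self.paused = False

    def record(self, cost_cents):
        """Track spend; warn near the limit, pause API traffic at the limit."""
        if self.paused:
            raise RuntimeError("API traffic paused: spend cap reached")
        self.spent += cost_cents
        if self.spent >= self.limit:
            self.paused = True  # only traffic stops; resources stay intact
        elif self.spent >= self.warn_at and not self.warned:
            self.warned = True
            print(f"warning: {self.spent} of {self.limit} cents used")

    def lift(self, new_limit_cents):
        """Adjusting or lifting the cap resumes operations."""
        self.limit = new_limit_cents
        self.warn_at = int(new_limit_cents * 0.8)
        self.paused = False
        self.warned = False

# A runaway prompt loop: without the cap it would issue all 1,000,000 calls.
cap = SpendCap(limit_cents=5000)  # 50 EUR budget
calls = 0
try:
    for _ in range(1_000_000):
        cap.record(1)  # assume 1 cent per API call
        calls += 1
except RuntimeError:
    pass
print(f"loop stopped after {calls} calls")  # → loop stopped after 5000 calls
```

The point of the sketch is the ordering: the cap stops traffic, not resources, so lifting or raising the limit with `lift()` lets the workload continue where it left off.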

In addition, Google is introducing the FinOps Explainability Agent, which is directly integrated into the billing system. The agent independently analyzes which factors are driving the costs of AI workloads and provides evaluations on demand. Users can, for example, ask how costs are distributed between Gemini 1.5 Pro and Gemini 1.5 Flash, which API keys are particularly expensive, or what the share of input and output tokens is in the total costs.

Such evaluations are necessary because, although AI costs can formally be described as quantity times price, the influencing factors are highly fragmented: alongside request volume, token counts, error rates, memory accesses, and model switches all play a role. The Explainability Agent correlates these factors automatically and is intended to accelerate root cause analysis – for example, after unexpected cost increases or when evaluating the return on investment of individual AI projects.


In addition, Google is announcing extended billing hierarchies and reporting for contract commitments. The new hierarchies are intended to consolidate spending across multiple billing accounts, including so-called Other Eligible Services – additional product families that Google counts toward enterprise contracts alongside the actual cloud services, such as Apigee, AppSheet, Looker, Workspace products, Mandiant, or VirusTotal. Commitment Reporting, likewise announced as a private preview, is intended to show how quickly customers are consuming their commitments within an enterprise agreement.

According to Google's announcement, the FinOps Explainability Agent is already available in the Cloud Console. Spend Caps and the extended billing and reporting functions are initially only available in a private preview for which customers can register.

(fo)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.