Perplexity automatically distributes AI computing power between device and cloud

Perplexity announces a hybrid inference orchestrator that automatically splits AI tasks between the local device and the cloud.

(Image: PixieMe/shutterstock.com)

Jun 4, 2026 at 3:45 pm CEST

By

Carolin Riethmüller

Perplexity has announced a hybrid approach to AI inference that automatically splits tasks between the local computer and cloud servers. The so-called “Personal Computer,” Perplexity's version of personal desktop agents, is intended to keep sensitive data on the device and outsource computationally intensive work to the cloud – without users having to decide in advance where something is processed.

Perplexity describes the new service as a compact AI model that runs locally on the device and decides which parts of a request remain there and which should go to a more powerful frontier model in the cloud. The company cites handling financial documents, health information, and personal files as typical use cases – data that, for data protection reasons, should ideally not leave the device.

Existing concept, new level

Perplexity's hybrid approach is not entirely new; other providers are pursuing similar strategies. Microsoft, for example, also takes a hybrid route with Copilot+ PCs and local NPU functions, although many Copilot functions still require a cloud connection.

According to VentureBeat, the key difference is probably Perplexity's claim to perform the split fully automatically and task by task, in some cases even while a task is running. Other providers have not yet reached the level that Perplexity demonstrated at Computex.

Personal Computer with local inference is due to become available in July. It is intended to ease the currently typical trade-offs between three factors: accuracy on complex tasks calls for the most powerful, computationally intensive models; data protection calls for local processing; and cost calls for an efficient mix of powerful and inexpensive models, depending on the task. Orchestrating these competing requirements is the actual problem, and it is precisely what the hybrid approach aims to solve.
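To make the trade-off concrete, here is a minimal sketch of how a task-by-task router could weigh sensitivity, complexity, and cost. It is purely illustrative: Perplexity has not published its routing rules, and the names `Task`, `route`, `plan`, and the threshold `LOCAL_COMPLEXITY_LIMIT` are hypothetical and not part of any Perplexity API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One sub-task of a user request (hypothetical structure)."""
    description: str
    sensitive: bool    # assumed flag, e.g. financial or health data
    complexity: float  # assumed score: 0.0 (trivial) to 1.0 (very demanding)


# Assumed cut-off above which the small local model is considered too weak.
LOCAL_COMPLEXITY_LIMIT = 0.6


def route(task: Task) -> str:
    """Return "local" or "cloud" for a single sub-task."""
    # Data protection first: sensitive data never leaves the device.
    if task.sensitive:
        return "local"
    # Accuracy: demanding, non-sensitive work goes to the larger cloud model.
    if task.complexity > LOCAL_COMPLEXITY_LIMIT:
        return "cloud"
    # Cost: everything else stays on the cheaper local model.
    return "local"


def plan(tasks: list[Task]) -> list[tuple[str, str]]:
    """Route every sub-task of a request independently."""
    return [(task.description, route(task)) for task in tasks]


if __name__ == "__main__":
    request = [
        Task("summarize tax return", sensitive=True, complexity=0.9),
        Task("research market trends", sensitive=False, complexity=0.9),
        Task("rename downloaded files", sensitive=False, complexity=0.1),
    ]
    for description, target in plan(request):
        print(f"{description}: {target}")
```

In this sketch, data protection takes priority over accuracy, which in turn takes priority over cost. A real orchestrator would also have to determine sensitivity and complexity by itself, and that is exactly the part for which no details are available yet.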

Support for Intel and Nvidia

Perplexity presented the hybrid orchestrator together with Intel. However, the model-agnostic orchestration framework is also intended to run on other local hardware, including Nvidia's RTX Spark. Perplexity has not yet provided concrete minimum hardware requirements – for example, regarding the necessary NPU or GPU performance. By comparison, for Microsoft's hybrid Copilot+ PC model, the computer manufacturer HP specifies that laptops need a dedicated neural processing unit (NPU) delivering at least 40 TOPS to carry the Copilot+ PC label.

Perplexity has likewise not yet provided technical details on the routing rules: it remains open how exactly the local model decides which data counts as sensitive and which metadata could still be transmitted to Perplexity servers.

How robust the data protection promise is in everyday use can likewise only be assessed once Perplexity publishes technical documentation on model sizes, storage requirements, and the handling of telemetry data.