Today we're announcing that hybrid agentic inference is coming to Perplexity Computer. Computer can split tasks between a local model running on your machine and frontier models in the cloud. This keeps private data on your device and maximizes token efficiency. Coming soon. https://t.co/6t3PrmI1FX
Perplexity Computer announces hybrid inference to balance local privacy and cloud power
- Availability
- July 2026
- Local Hardware Support
- Intel and NVIDIA RTX Spark
- Orchestration Logic
- Automatic task-by-task routing
- Primary Benefit
- Local privacy for sensitive data
- Cloud Integration
- Frontier models in the cloud
This shift addresses efficiency by reserving expensive server-side inference for work that genuinely requires it. By orchestrating compute location, Perplexity reduces dependency on centralized infrastructure. It positions user hardware as a private data center, mirroring on-device agent efforts like Google's Gemma 4 to balance privacy with frontier-level performance.
Coming in July 2026, the hybrid system will run across local silicon, including Intel chips and NVIDIA RTX Spark hardware. Users will not need to manually toggle modes; the orchestrator handles routing automatically based on task complexity and data sensitivity. This model-agnostic harness ensures capable models are used only when necessary.
Still wondering? A few quick answers below.
Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →
