OpenRouter Hosts Perceptron Mk1 for Structured Video and Embodied Reasoning

OpenRouter

May 12, 2026 · Updated May 20, 2026

OpenRouter integrated Perceptron Mk1, a vision-language model designed for spatial grounding and video understanding with structured outputs like bounding boxes and timestamps. The model introduces a reasoning toggle for complex visual tasks at a significantly lower price point than general-purpose frontier models.

OpenRouter, a unified API for accessing hundreds of AI models, integrated Perceptron Mk1—a vision-language model from Perceptron Inc., a developer of foundation models for embodied intelligence. It processes video at 2 frames per second across a 32k context window, delivering structured spatial primitives like points and boxes as first-class outputs.

Context window: 32,768 tokens
Max output: 8,192 tokens
Video analysis rate: Up to 2 FPS
Pricing (input): $0.15 per million tokens
Pricing (output): $1.50 per million tokens
Spatial outputs: Points, boxes, polygons, and clips

This launch fills a gap between general-purpose chat models and specialized robotic perception layers. While frontier models often struggle with precise spatial localization, Perceptron Mk1 allows developers to request specific annotation formats. It mirrors the industry's shift toward Physical AGI development by prioritizing how models interpret the physical world.

Use Perceptron Mk1 for high-volume visual tasks like document parsing and video summarization via the OpenRouter API. A dedicated reasoning parameter can be enabled per request to trade latency for deeper analysis. Access is priced at $0.15 per million input tokens, offering a cost-effective alternative for multimodal reasoning workloads.

View the full update on openrouter.ai

OpenRouter

@OpenRouterMay 12

Perceptron Mk1 is live on OpenRouter, built by @perceptroninc. Frontier video and embodied reasoning in a vision-language model. Analyzes video at a dynamic frame rate (up to 2 FPS) across a 32k multimodal context, with hybrid reasoning and structured spatial primitives (points, boxes, polygons, clips) as first-class outputs.

327

View on X

Still wondering? A few quick answers below.

Perceptron Mk1 is a high-quality vision-language model specifically designed for video understanding and embodied reasoning. It processes both image and video inputs alongside natural language queries to produce detailed visual responses. The model excels at tasks like event detection, video summarization, and spatial grounding, making it a specialized tool for interpreting physical environments.

The model analyzes video at a dynamic frame rate of up to 2 frames per second. It operates within a 32,768 token multimodal context window, allowing it to maintain a large amount of visual information for temporal reasoning. This capability enables the model to perform complex tasks like video question answering and identifying specific events across a timeline.

Perceptron Mk1 can generate structured spatial primitives as first-class outputs when requested via the annotation format parameter. These include points, bounding boxes, and polygons for image localization, as well as clips with start and end timestamps for video segments. Without these specific requests, the model defaults to providing responses in natural language text.

Accessing Perceptron Mk1 through the OpenRouter API costs $0.15 per million input tokens and $1.50 per million output tokens. This pricing structure is designed to offer frontier-level video and embodied reasoning capabilities at a fraction of the cost of other high-end models, making it accessible for high-volume production workflows and agentic tasks.

Users can enable a reasoning parameter on a per-request basis to trigger deeper analysis for more difficult tasks. This hybrid reasoning mode allows the model to perform internal chain-of-thought processing before delivering a final answer. While this increases latency, it improves the model's performance on complex visual understanding and spatial reasoning challenges.

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

See all AI news & updates from OpenRouter →

Keep reading

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

OpenRouter integrated NVIDIA's Nemotron 3 Nano Omni, a 30B-A3B model that natively processes text, audio, and video in a single inference loop. By using a hybrid Transformer-Mamba architecture, the model reduces the compute cost of video reasoning by 2.5x, making it a high-efficiency perception layer for autonomous agents.

What is Perceptron Mk1?

How does Perceptron Mk1 handle video analysis?

What are the structured outputs in Perceptron Mk1?

What is the pricing for Perceptron Mk1 on OpenRouter?

How does the reasoning parameter work in Perceptron Mk1?

Keep reading

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

Keep reading

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning

OpenRouter Hosts NVIDIA Nemotron 3 Nano Omni for Free Multimodal Reasoning