HeadsUpAI

OpenRouter Hosts Perceptron Mk1 for Structured Video and Embodied Reasoning

OpenRouter, a unified API for accessing hundreds of AI models, integrated Perceptron Mk1—a vision-language model from Perceptron Inc., a developer of foundation models for embodied intelligence. It processes video at 2 frames per second across a 32k context window, delivering structured spatial primitives like points and boxes as first-class outputs.
Context window
32,768 tokens
Max output
8,192 tokens
Video analysis rate
Up to 2 FPS
Pricing (input)
$0.15 per million tokens
Pricing (output)
$1.50 per million tokens
Spatial outputs
Points, boxes, polygons, and clips

This launch fills a gap between general-purpose chat models and specialized robotic perception layers. While frontier models often struggle with precise spatial localization, Perceptron Mk1 allows developers to request specific annotation formats. It mirrors the industry's shift toward Physical AGI development by prioritizing how models interpret the physical world.

Use Perceptron Mk1 for high-volume visual tasks like document parsing and video summarization via the OpenRouter API. A dedicated reasoning parameter can be enabled per request to trade latency for deeper analysis. Access is priced at $0.15 per million input tokens, offering a cost-effective alternative for multimodal reasoning workloads.

OpenRouter
OpenRouter
@OpenRouter
X

Perceptron Mk1 is live on OpenRouter, built by @perceptroninc. Frontier video and embodied reasoning in a vision-language model. Analyzes video at a dynamic frame rate (up to 2 FPS) across a 32k multimodal context, with hybrid reasoning and structured spatial primitives (points, boxes, polygons, clips) as first-class outputs.

3retweets27likes
View on X

Still wondering? A few quick answers below.

Perceptron Mk1 is a high-quality vision-language model specifically designed for video understanding and embodied reasoning. It processes both image and video inputs alongside natural language queries to produce detailed visual responses. The model excels at tasks like event detection, video summarization, and spatial grounding, making it a specialized tool for interpreting physical environments.

The model analyzes video at a dynamic frame rate of up to 2 frames per second. It operates within a 32,768 token multimodal context window, allowing it to maintain a large amount of visual information for temporal reasoning. This capability enables the model to perform complex tasks like video question answering and identifying specific events across a timeline.

Perceptron Mk1 can generate structured spatial primitives as first-class outputs when requested via the annotation format parameter. These include points, bounding boxes, and polygons for image localization, as well as clips with start and end timestamps for video segments. Without these specific requests, the model defaults to providing responses in natural language text.

Accessing Perceptron Mk1 through the OpenRouter API costs $0.15 per million input tokens and $1.50 per million output tokens. This pricing structure is designed to offer frontier-level video and embodied reasoning capabilities at a fraction of the cost of other high-end models, making it accessible for high-volume production workflows and agentic tasks.

Users can enable a reasoning parameter on a per-request basis to trigger deeper analysis for more difficult tasks. This hybrid reasoning mode allows the model to perform internal chain-of-thought processing before delivering a final answer. While this increases latency, it improves the model's performance on complex visual understanding and spatial reasoning challenges.

Share this update