GLM-OCR Hits 3M Downloads, Technical Report Released on arXiv

Zhipu AI

Mar 15, 2026 · Updated Apr 25, 2026

GLM-OCR, a 0.9B-parameter multimodal model for document understanding, has crossed 3 million downloads. Z.ai has released a technical report that details the architecture and covers document parsing, table recovery, formula transcription, and key information extraction.

GLM-OCR, a compact 0.9B-parameter multimodal model for document understanding from Z.ai, the AI company behind the GLM model family, has reached 3 million downloads. The technical report details its two-component architecture: a 0.4B-parameter CogViT visual encoder paired with a 0.5B-parameter GLM language decoder. To accelerate inference, the model uses Multi-Token Prediction (MTP), which predicts multiple tokens per decoding step instead of one. This improves throughput, and shared parameters keep the added memory overhead low. A two-stage pipeline first runs layout analysis with PP-DocLayout-V3, then recognizes the detected regions in parallel.
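The two-stage flow described above (layout analysis first, then parallel region-level recognition) can be sketched in a few lines of Python. This is only an illustration of the control flow, not GLM-OCR's actual code: `Region`, `parse_page`, `fake_layout`, and `fake_recognize` are hypothetical names, and the two stand-in functions take the place of the real layout detector and recognizer.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    kind: str     # hypothetical label, e.g. "text", "table", "formula"
    order: int    # reading-order index assigned by the layout stage
    payload: str  # stand-in for the cropped image of the region


def parse_page(
    page: str,
    detect_layout: Callable[[str], List[Region]],
    recognize_region: Callable[[Region], str],
    max_workers: int = 4,
) -> List[str]:
    """Stage 1: find regions. Stage 2: recognize them concurrently."""
    regions = detect_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `regions`.
        texts = list(pool.map(recognize_region, regions))
    # Reassemble the page in reading order.
    ordered = sorted(zip(regions, texts), key=lambda pair: pair[0].order)
    return [text for _, text in ordered]


# Toy stand-ins (assumptions for illustration only).
def fake_layout(page: str) -> List[Region]:
    return [Region("text", i, chunk) for i, chunk in enumerate(page.split("|"))]


def fake_recognize(region: Region) -> str:
    return region.payload.strip().upper()


if __name__ == "__main__":
    print(parse_page("title | body | table", fake_layout, fake_recognize))
    # → ['TITLE', 'BODY', 'TABLE']
```

Because each region is recognized independently, the second stage scales with the number of workers, which is the property the parallel design relies on.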

Evaluations on public benchmarks and industrial scenarios show GLM-OCR achieves competitive or state-of-the-art performance across document parsing, formula transcription, table structure recovery, and key information extraction. Its compact architecture targets both edge deployment and large-scale production systems.

Point it at your document processing pipeline to evaluate whether the MTP throughput gains hold for your workload. The technical report covers full benchmark results and architecture specs.
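One way to run that check is to measure generated tokens per second over a sample of your own documents, once with a baseline configuration and once with the candidate, then compare the two. The helper below is a minimal sketch under stated assumptions: `generate` is a hypothetical callable that you wrap around your own inference call and that returns the number of tokens it produced. None of these names come from the GLM-OCR release.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def tokens_per_second(
    generate: Callable[[T], int],
    inputs: Iterable[T],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run `generate` on every input and return total tokens / elapsed seconds.

    `generate` is a hypothetical wrapper around your inference call; it must
    return the number of tokens produced for that input.
    """
    total_tokens = 0
    start = clock()
    for item in inputs:
        total_tokens += generate(item)
    elapsed = clock() - start
    return total_tokens / elapsed if elapsed > 0 else float("inf")


def speedup(baseline_tps: float, candidate_tps: float) -> float:
    """Ratio above 1.0 means the candidate configuration is faster."""
    return candidate_tps / baseline_tps
```

Run both configurations on the same documents and hardware, and discard a warm-up pass, so the ratio reflects decoding speed and not model loading or caching effects.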


Z.ai

@Zai_org · Mar 14

GLM-OCR has accumulated over 3M downloads. We are releasing its technical report: https://t.co/KHFgnnDfYh We welcome your feedback!

