Introducing MMX-CLI, our first piece of infrastructure built not for humans but for Agents. Your Agent can read, think, and write. But ask it to sing, paint, or show you a world it has never seen, and it falls silent. Not because it doesn't understand, but because it has no mouth, no hands, no camera.

Today, that changes. MMX-CLI gives every Agent seven new senses (image, video, voice, music, vision, search, conversation), powered by MiniMax's full-modal stack, today's SOTA across mainstream omni-modal models.

One command: mmx. Agent-native I/O. Zero MCP glue. Runs on your existing Token Plan.

Two lines to give your Agent a voice:

npx skills add MiniMax-AI/cli -y -g
npm install -g mmx-cli

Then tell it: "you have mmx commands available." It'll learn the rest.

GitHub → https://t.co/fSRc5Lo30j
Token Plan: https://t.co/BDCycxepZw
MiniMax Launches MMX-CLI to Give AI Agents Native Multimodal Senses
MiniMax · Updated
MiniMax released MMX-CLI, a command-line interface that provides AI agents with native capabilities for image, video, music, and voice generation. By treating these multimodal outputs as terminal commands, agents can now autonomously create and perceive media without complex API integrations or middleware.
An agent can now run mmx music generate as easily as it writes text. Most agents are limited to text-based reasoning and need custom code to call multimodal APIs. This release removes that friction with agent-native I/O that requires no MCP (Model Context Protocol, a standard for connecting agents to tools) setup, giving autonomous systems the mouth and eyes they need for complex media workflows.
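Because MMX-CLI exposes media generation as ordinary terminal commands, an agent harness can invoke it the same way it invokes any other program. The following is a minimal, hypothetical Python wrapper, not part of MMX-CLI: the helper names build_mmx_argv and run_mmx are illustrative, and apart from the mmx binary and the music generate subcommand named above, any arguments you pass are assumptions to check against the project's README.

```python
from __future__ import annotations

import shutil
import subprocess
from typing import Sequence


def build_mmx_argv(subcommand: Sequence[str], args: Sequence[str] = ()) -> list[str]:
    """Assemble an argv list such as ["mmx", "music", "generate", ...].

    Hypothetical helper: the caller supplies the subcommand and any flags;
    nothing here assumes which flags MMX-CLI actually accepts.
    """
    if not subcommand:
        raise ValueError("subcommand must not be empty")
    return ["mmx", *subcommand, *args]


def run_mmx(subcommand: Sequence[str], args: Sequence[str] = (), timeout: float = 600.0) -> str:
    """Run an mmx command and return its stdout.

    Raises FileNotFoundError if mmx is not on PATH,
    subprocess.CalledProcessError on a non-zero exit status, and
    subprocess.TimeoutExpired if the command exceeds `timeout` seconds.
    """
    if shutil.which("mmx") is None:
        raise FileNotFoundError("mmx not found on PATH; install with: npm install -g mmx-cli")
    result = subprocess.run(
        build_mmx_argv(subcommand, args),  # list form: no shell parsing of agent-written text
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Passing an argument list rather than a single shell string means a free-form prompt written by the agent is handed to mmx verbatim instead of being reinterpreted by the shell.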
Integrate these capabilities into agentic tools like Claude Code by running npx skills add MiniMax-AI/cli. The CLI supports asynchronous video generation, streaming speech synthesis, and music cover generation. Usage is billed through MiniMax Token Plans, and the project is available as an open-source repository on GitHub.
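The announcement says video generation is asynchronous but does not describe how completion is reported. A common way to consume any asynchronous job is to poll its status with exponential backoff. This sketch is that generic pattern, not MMX-CLI's documented behavior: it assumes a caller-supplied check_status function (hypothetical, for example one that runs whatever status command the CLI actually provides) and hypothetical state names "done" and "failed".

```python
import time
from typing import Callable


def wait_for_job(
    check_status: Callable[[], str],
    done_states: frozenset = frozenset({"done"}),      # hypothetical state names
    failed_states: frozenset = frozenset({"failed"}),  # hypothetical state names
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> str:
    """Poll check_status() with exponential backoff until a terminal state.

    Returns the terminal state; raises RuntimeError if the job fails and
    TimeoutError if the next wait would pass the deadline.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        state = check_status()
        if state in done_states:
            return state
        if state in failed_states:
            raise RuntimeError(f"job ended in state {state!r}")
        if time.monotonic() + delay > deadline:
            raise TimeoutError("job did not finish before the deadline")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay
```

Backoff keeps an agent from hammering the service while a long render runs, and the cap on delay bounds how late it notices completion.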
Every HeadsUpAI update is written based on its original source and reviewed before it's published.


