ASI:Cloud Adds MiniMax M2.5, Qwen, and GLM Models for Inference

CUDOS

Mar 19, 2026 · Updated Apr 25, 2026

ASI:Cloud, a serverless AI inference platform, added three new models for immediate use: MiniMax M2.5, Qwen 3.5-35B-A3B, and GLM 4.7 Flash. No waitlists — all three are live now via its OpenAI-compatible API.

ASI:Cloud, a serverless inference platform operated by the ASI Alliance (SingularityNET and CUDOS), added three open-source models to its inference catalog: MiniMax M2.5 (229B params, $0.26/1K input tokens), Qwen 3.5-35B-A3B (a Mixture-of-Experts model the team describes as Sonnet-level performance), and GLM 4.7 Flash (ultra-fast). All three are live immediately with no waitlist.

ASI:Cloud positions itself as permissionless inference — no waitlists, no gating — with pay-per-token pricing on enterprise-grade NVIDIA GPUs. The platform also offers a path from serverless inference to dedicated API endpoints as usage scales.

All three models are immediately accessible via ASI:Cloud's OpenAI-compatible API — no new integration setup needed to add any of them to an existing pipeline.

View the full update on asicloud.cudos.org

CUDOS

@CUDOS_Mar 18

🚀 New models just dropped on ASI:Cloud Now live for inference: • MiniMax M2.5 - crushing benchmarks • Qwen 3.5-35B-A3B - Sonnet-level performance, efficient MoE • GLM 4.7 Flash - Ultra-fast Permissionless AI inference. No waitlists. Try them now 👇 https://t.co/L0LuDk8BBv https://t.co/MZQdTWlQGx

View on X

Every HeadsUpAI update is written based on its original source and reviewed before it's published. Read our editorial standards →

Keep reading

Qwen 3.5 Series Releases GPTQ-Int4 Weights for Limited-GPU Inference

Qwen 3.5 Series, Alibaba's open-weight model family, now has GPTQ-Int4 quantized weights available for its larger model sizes. They work natively with vLLM and SGLang, cutting VRAM needs so teams can run larger Qwen 3.5 models on constrained GPU setups.

RunwareMar 20

Runware Adds MiniMax M2.7 on Day Zero of Release

Runware, an AI model API, now hosts MiniMax M2.7 — a long-context LLM built for agentic coding, tool use, and office productivity. The model scores 56.22% on SWE-Pro and a 97% skill adherence rate across 40+ complex tasks.

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

OllamaJun 7

Ollama Cloud Adds MiniMax M3 for Frontier Agentic Coding and 1M Context

Ollama has made the MiniMax M3 model available on its Cloud, providing US-based access with zero data retention. This integration offers a frontier-level, open-weight model for agentic coding and multimodal tasks, featuring a 1-million-token context window. It expands access to advanced AI capabilities for complex, autonomous workflows.

Cloudflare adds MiniMax M3 with 1M context for agentic coding

MiniMaxJun 2

Cloudflare adds MiniMax M3 with 1M context for agentic coding

Cloudflare has integrated the MiniMax M3 foundation model into its AI Gateway platform. The update provides developers with a high-context, multimodal model specialized for autonomous coding tasks directly within their existing infrastructure.