HeadsUpAI

Fireworks AI Adds Safe Tokenization to Stop Users Overriding System Prompts

Fireworks AI, an inference platform for fast model serving, launched an opt-in safe_tokenization flag to prevent prompt injection (a vulnerability where malicious input overrides model instructions). The feature ensures user-provided strings are encoded as harmless subwords rather than structural control tokens that define turn boundaries.

Most open-weights models rely on standard tokenization pipelines that merge system prompts and user text into a single string, creating a security risk. This update follows the platform's expansion of hosted models, including Kimi via Day-0 Kimi K2.6 support and DeepSeek via DeepSeek V4 Pro.

You can enable the defense by adding safe_tokenization: true to any Chat Completions API request. The feature is live for all supported models, including Llama, and mirrors Alibaba's Qwen 3.5 integration. The defense maintains identical behavior for benign inputs and is currently an opt-in boolean.

Still wondering? A few quick answers below.

Fireworks AI safe tokenization is a security feature that prevents prompt injection by ensuring user input cannot be interpreted as model control tokens. It separates user text from the structural code that defines system and user turns. This ensures that a model respects the developer's system prompt even if a user tries to forge turn boundaries.

The feature works by pre-processing chat templates to separate control tokens from user content. At request time, it performs a segment-by-segment encoding pass on user text. This breaks any strings that match control tokens into their subword pieces, treating them as literal text rather than structural commands that could override the system prompt.

The feature is live across all supported open-weights models on the Fireworks platform. This includes popular model families such as DeepSeek, Kimi, Qwen, Llama, and GLM. It works for both streaming and non-streaming completions, providing a consistent security layer regardless of which specific open model a developer chooses to deploy.

You can enable the defense by adding a single boolean flag, safe_tokenization: true, to your Chat Completions API request. It is currently an opt-in feature, allowing developers to roll it out per-request or per-endpoint. Because it produces identical token IDs for benign inputs, it can be enabled without causing silent behavior changes for ordinary traffic.

No, the feature uses preservation rather than stripping. User content is never modified, rejected, or silently removed. Any string, including reasoning markers or turn delimiters, can still appear in a user message. The system simply ensures they are treated as plain text by the model instead of being executed as structural control tokens.

Share this update