Trip Venturella launches Mr. Chatterbox to test Victorian-era AI training

Simon Willison

Mar 31, 2026 · Updated Apr 25, 2026

Trip Venturella released Mr. Chatterbox, a 340-million-parameter language model trained exclusively on 28,000 Victorian-era British texts. The model captures a historical persona, but its limited 2.93-billion-token dataset highlights the performance gap between models trained only on public domain text and modern web-scraped models.

Trip Venturella trained Mr. Chatterbox, a 340-million-parameter language model, from scratch using Andrej Karpathy's nanochat. The training corpus consists of 2.93 billion tokens from out-of-copyright British Library texts published between 1837 and 1899, so no text from after 1899 shaped the model's vocabulary or ideas.

The project serves as a test case for ethically trained models that use only public domain data. In testing, the model behaves more like a Markov chain than a modern assistant. That result is consistent with the Chinchilla scaling laws, which suggest roughly 20 training tokens per parameter for compute-optimal training: about 7 billion tokens for a 340M-parameter model, more than twice the 2.93 billion used here.
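The Chinchilla comparison is simple arithmetic, and a minimal sketch makes the gap concrete. The ratio of 20 tokens per parameter is the commonly cited rule of thumb, treated here as an assumption and not an exact law; the function and variable names are illustrative and do not come from the project.

```python
# Assumption: the widely quoted Chinchilla rule of thumb of about
# 20 training tokens per model parameter (a heuristic, not an exact law).
TOKENS_PER_PARAM = 20


def chinchilla_token_target(n_params: int, tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Return the rule-of-thumb training-token count for a given model size."""
    return n_params * tokens_per_param


# Figures reported for Mr. Chatterbox.
params = 340_000_000           # 340M parameters
corpus_tokens = 2_930_000_000  # 2.93B training tokens

target = chinchilla_token_target(params)  # tokens suggested by the heuristic
coverage = corpus_tokens / target         # share of that target actually available
ratio = corpus_tokens / params            # tokens per parameter actually used

print(f"Rule-of-thumb target: {target / 1e9:.1f}B tokens")
print(f"Corpus covers {coverage:.0%} of the target")
print(f"Actual ratio: {ratio:.1f} tokens per parameter")
```

Under that assumption the target works out to 6.8 billion tokens, which the article rounds to 7 billion, so the corpus supplies less than half of what the heuristic suggests.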

You can run the 2.05GB model locally using the llm-mrchatterbox plugin for the llm CLI tool. The command uvx --with llm-mrchatterbox llm chat -m mrchatterbox starts a chat session. Developer Simon Willison used Claude Code to build the plugin, which wraps the model for local execution.

View the full update on simonwillison.net

Simon Willison (@simonw) · Mar 30

Mr. Chatterbox is a new 2GB nanochat model trained from scratch by Trip Venturella on "28,000 Victorian-era British texts published between 1837 and 1899" - I released an llm-mrchatterbox plugin which can run it locally on my Mac https://t.co/EIu15Wszev

