HeadsUpAI

Trip Venturella launches Mr. Chatterbox to test Victorian-era AI training

Trip Venturella developed Mr. Chatterbox, a 340-million-parameter language model trained from scratch using Andrej Karpathy's nanochat architecture. The training corpus consists of 2.93 billion tokens drawn from roughly 28,000 out-of-copyright British Library texts published between 1837 and 1899, so no modern data shaped the model's vocabulary or logic.

This project serves as a benchmark for ethically trained models that use only public domain data. In testing, the model behaves more like a Markov chain than a modern assistant. That is consistent with the Chinchilla scaling laws, which suggest roughly 20 training tokens per parameter for compute-optimal training: about 7 billion tokens for a 340M-parameter model, more than twice the 2.93 billion used here.
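The token-budget arithmetic can be sketched in a few lines of Python. This is a rough illustration, assuming the commonly cited rule of thumb of about 20 training tokens per parameter. The function name `chinchilla_tokens` and the constant are hypothetical names chosen for this example, not part of nanochat or any library.

```python
# Assumption: the widely quoted Chinchilla heuristic of ~20 tokens per parameter.
# It is an approximation, not an exact law.
TOKENS_PER_PARAM = 20


def chinchilla_tokens(n_params: int, ratio: int = TOKENS_PER_PARAM) -> int:
    """Return the approximate compute-optimal training token count."""
    return n_params * ratio


params = 340_000_000           # Mr. Chatterbox parameter count
corpus_tokens = 2_930_000_000  # British Library corpus size in tokens

target = chinchilla_tokens(params)
coverage = corpus_tokens / target

print(f"suggested tokens: {target / 1e9:.1f}B")  # suggested tokens: 6.8B
print(f"corpus covers {coverage:.0%} of that")   # corpus covers 43% of that
```

Under this heuristic the corpus supplies a bit less than half of the suggested training tokens.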

You can run the 2.05GB model locally using the llm-mrchatterbox plugin for the llm CLI tool. The command uvx --with llm-mrchatterbox llm chat -m mrchatterbox initiates a session. Developer Simon Willison used Claude Code to autonomously build the plugin and wrap the model for local execution.

Simon Willison (@simonw) on X:

Mr. Chatterbox is a new 2GB nanochat model trained from scratch by Trip Venturella on "28,000 Victorian-era British texts published between 1837 and 1899" - I released an llm-mrchatterbox plugin which can run it locally on my Mac https://t.co/EIu15Wszev
