24 dedicated people. $30M spent on development. Extreme specialization, speed, and power efficiency. Today we launch Taalas’ first product. Check it out: Details: https://t.co/88CA0XAL71 Demo chatbot: https://t.co/ec4ladcKnw API: https://t.co/M3EkaxEqPj
Taalas Launches Hard-Wired Llama Chip Delivering 10x Faster Inference
Taalas, an AI hardware startup, launched a chip with Llama 3.1 8B permanently hard-wired into silicon - 17K tokens/sec per user, nearly 10x faster than H200 GPUs at 20x lower build cost. The inference API is open for developer access.
Taalas has launched a chip with Llama 3.1 8B permanently hard-wired into the hardware. Their HC1 platform achieves 17K tokens/sec per user - nearly 10x faster than current GPU-based inference - while costing 20x less to build and consuming 10x less power.

The performance comes from a fundamentally different architecture. Modern inference hardware separates memory from compute, requiring HBM stacks, advanced packaging, and liquid cooling. Taalas merges both onto a single chip at DRAM-level density, eliminating the bottleneck between memory and compute. Each chip is produced for a specific model, trading generality for extreme efficiency.
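To put the throughput figure in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the 17K tokens/sec and "nearly 10x" numbers quoted above. The 1,000-token response length is a hypothetical value chosen for illustration, and treating "nearly 10x" as exactly 10x is an assumption, so the GPU baseline it derives is implied rather than measured.

```python
# Figures quoted in the article.
HC1_TOKENS_PER_SEC = 17_000  # per user, as reported by Taalas
SPEEDUP_VS_GPU = 10          # "nearly 10x", rounded to exactly 10 (assumption)

# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 1_000


def generation_time(num_tokens: int, tokens_per_sec: float) -> float:
    """Seconds needed to generate num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_sec


# GPU rate implied by the quoted speedup; not a measured benchmark.
implied_gpu_rate = HC1_TOKENS_PER_SEC / SPEEDUP_VS_GPU

hc1_time = generation_time(RESPONSE_TOKENS, HC1_TOKENS_PER_SEC)
gpu_time = generation_time(RESPONSE_TOKENS, implied_gpu_rate)
per_token_us = 1e6 / HC1_TOKENS_PER_SEC  # microseconds per token on HC1

print(f"HC1 time per token: {per_token_us:.1f} microseconds")
print(f"HC1, {RESPONSE_TOKENS} tokens: {hc1_time * 1000:.1f} ms")
print(f"Implied GPU baseline, {RESPONSE_TOKENS} tokens: {gpu_time * 1000:.1f} ms")
```

Under these assumptions, a 1,000-token reply would take roughly 59 ms on HC1 versus roughly 590 ms on the implied GPU baseline. Real-world latency would also depend on prompt processing, batching, and network overhead, none of which this sketch models.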
The HC1 Llama 3.1 8B is available as a chatbot demo and a beta inference API. Apply for API access at Taalas' site. A mid-sized reasoning LLM on HC1 is expected in spring, with a frontier model on their next-generation HC2 platform planned for winter.






