HeadsUpAI

Taalas Launches Hard-Wired Llama Chip Delivering 10x Faster Inference

Taalas, an AI hardware startup building model-specific silicon, has launched its first product: a chip with Llama 3.1 8B permanently hard-wired into the hardware. Its HC1 platform achieves 17K tokens/sec per user, nearly 10x faster than current GPU-based inference, while costing 20x less to build and consuming 10x less power.

The performance comes from a fundamentally different architecture. Conventional inference hardware keeps memory separate from compute, which requires HBM stacks, advanced packaging, and liquid cooling to move data between the two. Taalas merges storage and compute onto a single chip at DRAM-level density, removing the bottleneck created by that separation. Each chip is produced for a specific model, trading generality for extreme efficiency.

The HC1 running Llama 3.1 8B is available as a chatbot demo and through a beta inference API; API access is by application on Taalas' site. A mid-sized reasoning LLM on HC1 is expected in spring, followed by a frontier model on the next-generation HC2 platform in winter.

Taalas Inc.
@taalas_inc
X

24 dedicated people. $30M spent on development. Extreme specialization, speed, and power efficiency. Today we launch Taalas’ first product. Check it out: Details: https://t.co/88CA0XAL71 Demo chatbot: https://t.co/ec4ladcKnw API: https://t.co/M3EkaxEqPj

561 retweets, 5.8k likes
