24 dedicated people. $30M spent on development. Extreme specialization, speed, and power efficiency. Today we launch Taalas’ first product. Check it out: Details: https://t.co/88CA0XAL71 Demo chatbot: https://t.co/ec4ladcKnw API: https://t.co/M3EkaxEqPj
Taalas Launches Hard-Wired Llama Chip Delivering 10x Faster Inference
Taalas, an AI hardware startup building model-specific silicon, launched its first product: a chip with Llama 3.1 8B permanently hard-wired into the hardware. Their HC1 platform achieves 17K tokens/sec per user, nearly 10x faster than current GPU-based inference, while costing 20x less to build and consuming 10x less power.
The performance comes from a fundamentally different architecture. Modern inference hardware separates memory from compute, requiring HBM stacks, advanced packaging, and liquid cooling. Taalas merges both onto a single chip at DRAM-level density, eliminating that bottleneck entirely. Each chip is produced for a specific model, trading generality for extreme efficiency.
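
To put the throughput figure in user-facing terms, the short sketch below converts tokens per second into per-token latency and the time to generate one reply. Only the 17K tokens/sec figure and the "nearly 10x" ratio come from the announcement. The GPU baseline is merely implied by treating that ratio as exactly 10, and the 500-token reply length is a hypothetical chosen for illustration.

```python
# Back-of-the-envelope arithmetic on the announced throughput figure.

def per_token_latency_ms(tokens_per_sec: float) -> float:
    """Average time to emit one token, in milliseconds."""
    return 1000.0 / tokens_per_sec


def response_time_ms(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate num_tokens at a steady decode rate, in milliseconds."""
    return num_tokens * per_token_latency_ms(tokens_per_sec)


HC1_TOKENS_PER_SEC = 17_000  # per-user figure quoted in the announcement
CLAIMED_SPEEDUP = 10         # assumption: "nearly 10x" treated as exactly 10
IMPLIED_GPU_TOKENS_PER_SEC = HC1_TOKENS_PER_SEC / CLAIMED_SPEEDUP

RESPONSE_TOKENS = 500        # hypothetical reply length, not from the source

for label, rate in [
    ("HC1 (announced)", HC1_TOKENS_PER_SEC),
    ("GPU baseline (implied)", IMPLIED_GPU_TOKENS_PER_SEC),
]:
    print(
        f"{label}: {per_token_latency_ms(rate):.3f} ms/token, "
        f"{response_time_ms(RESPONSE_TOKENS, rate):.1f} ms "
        f"for {RESPONSE_TOKENS} tokens"
    )
```

This counts steady-state decoding only and ignores prompt processing, network overhead, and queueing, so real end-to-end latency would be higher on either platform.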
The HC1 Llama 3.1 8B is available as a chatbot demo and a beta inference API. Apply for API access at Taalas' site. A mid-sized reasoning LLM on HC1 is expected in spring, with a frontier model on their next-generation HC2 platform planned for winter.
Taalas Inc.
@taalas_inc




