Atome lm: Tiny AI Model Runs on Microcontrollers

Atome lm, a 944,000-parameter ternary language model, compiles to a 2.6 KB inference engine and runs on $2-$30 microcontrollers.

Atome lm, hosted at atomelm.com, is a 944,000-parameter ternary language model that compiles to a 2.6 KB inference engine and a 271 KB model blob. It runs on microcontrollers priced between $2 and $30 — the same class of chip already embedded in thermostats, hearing aids, car keys, industrial sensors, agricultural monitors, and children's toys — with no heap allocations, no network calls, and no operating system requirement.
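The size figures can be sanity-checked with a little arithmetic. The sketch below assumes a hypothetical packing scheme of five ternary weights per byte (possible because 3^5 = 243 fits in one byte). The article does not document Atome lm's actual on-disk format, so the function names and the encoding here are illustrative only.

```python
# Hypothetical packing: five ternary weights per byte (3**5 = 243 <= 256).
# This is an assumed encoding, not Atome lm's documented blob format.

def pack_trits(weights):
    """Pack weights in {-1, 0, +1} into bytes, five weights per byte."""
    out = bytearray()
    for i in range(0, len(weights), 5):
        value = 0
        # Base-3 digits, first weight in the group is the least significant.
        for j, w in enumerate(weights[i:i + 5]):
            if w not in (-1, 0, 1):
                raise ValueError(f"not a ternary weight: {w}")
            value += (w + 1) * 3 ** j
        out.append(value)
    return bytes(out)


def unpack_trits(data, count):
    """Inverse of pack_trits; count trims padding in the final byte."""
    weights = []
    for byte in data:
        for _ in range(5):
            weights.append(byte % 3 - 1)
            byte //= 3
    return weights[:count]


PARAMS = 944_000                      # parameter count quoted in the article
packed_bytes = (PARAMS + 4) // 5      # bytes needed at five weights per byte
float32_bytes = PARAMS * 4            # same weights stored as 32-bit floats
ratio = float32_bytes / packed_bytes  # size advantage under these assumptions
```

Under these assumptions the ternary weights alone need 188,800 bytes, which fits inside the 271 KB blob, and the same weights stored as 32-bit floats would take 20 times as much space.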

The global AI industry has spent the past three years scaling models upward: billions of parameters, terabytes of training data, datacenters with enormous power demands. Atome lm goes in the opposite direction. Its architecture uses ternary weights ({-1, 0, +1}), which give it a 20× disk advantage over standard floating-point models of equivalent scale. Each token passes through a three-pathway routing mechanism (local convolution, diagonal SSM, and sparse top-k attention) managed by a per-token softmax router.
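To make the routing idea concrete, here is a minimal sketch of a per-token softmax router that blends three pathway outputs. It assumes the router soft-mixes the pathways with its softmax weights; the article does not say whether Atome lm mixes or hard-selects. The pathway outputs are passed in as plain lists rather than computed, and the names softmax and route_token are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, conv_out, ssm_out, attn_out):
    """Blend three pathway outputs for one token.

    router_logits holds three scores for this token, one per pathway
    (convolution, SSM, attention). Assumed behaviour: a weighted sum of
    the pathway outputs using the softmax of those scores.
    """
    g_conv, g_ssm, g_attn = softmax(router_logits)
    return [
        g_conv * c + g_ssm * s + g_attn * a
        for c, s, a in zip(conv_out, ssm_out, attn_out)
    ]
```

With equal router scores each pathway contributes one third of the result; a large score for one pathway pushes its weight toward 1 and the token is handled almost entirely by that pathway.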

Critically, Atome lm reproduces its outputs across three environments: Python on a laptop, C99 on a server, and a Cortex-M3 emulated under QEMU. The project describes this as bit-exact parity. The maximum numerical delta it reports is 3.7×10⁻⁷, roughly three units of single-precision machine epsilon (about 1.2×10⁻⁷), so the three builds agree to within float32 rounding rather than being merely 'close.' Reproducibility at this level supports auditing the model for certified or safety-critical products. All 146 tests pass from a cold repository checkout.
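A cross-platform check of this kind can be sketched as a comparison of output vectors from two builds. The example below is an illustration, not the project's test suite: compare_outputs and float32_bits are hypothetical helpers, and any vectors passed in would come from the builds being compared. It reports both the maximum absolute delta and whether every value has the same single-precision bit pattern, since those are two different standards of agreement.

```python
import struct

FLOAT32_EPS = 2.0 ** -23  # single-precision machine epsilon, about 1.19e-7


def float32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def compare_outputs(reference, candidate):
    """Compare two output vectors.

    Returns (max_delta, bit_exact): the largest absolute difference, and
    whether every pair of values is identical once rounded to float32.
    """
    if len(reference) != len(candidate):
        raise ValueError("output vectors differ in length")
    pairs = list(zip(reference, candidate))
    max_delta = max((abs(r - c) for r, c in pairs), default=0.0)
    bit_exact = all(float32_bits(r) == float32_bits(c) for r, c in pairs)
    return max_delta, bit_exact
```

A delta of 3.7×10⁻⁷ on a value near 1.0 is about three float32 epsilons, so under this check it would count as agreement within rounding but not as an identical bit pattern.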

The model is the firmware. It runs without the internet, gives the same answer on every device, and ships on the chip rather than as a cloud service.
