Every headline is about the frontier — the biggest, most capable model anyone has trained. That's the race everyone can see. The race that will actually change your daily life is happening in the other direction: models getting small enough to live on the device in your hand.
A model that runs locally is a different kind of thing than one that lives in a datacenter. Not just cheaper or faster — different in what it makes possible, and different in what it makes safe.
Three things change when the model comes home
Privacy stops being a promise and starts being a property. When inference happens on your phone, your data never leaves it. That's not a policy you have to trust — it's a fact about where the computation ran. For anything sensitive — health, messages, money — that distinction is everything.
Latency collapses. No round trip to a server means the model can sit inside the interaction loop — reacting as you type, as you speak, as you move — instead of after it. Some experiences only work when the response is instant. Those become buildable.
The economics invert. Cloud inference costs the provider money on every single call. On-device inference costs them nothing after the download. That changes which products can exist, especially the free, always-on, runs-a-thousand-times-a-day kind.
A capability that's cheap enough becomes a different capability. Ubiquity is a feature, not a discount.
The catch, and why it's shrinking
Small models are, of course, less capable in the raw. But "capable at what" is the whole question. Most real tasks aren't frontier tasks. Summarize this. Classify that. Extract the three fields I care about. Draft the reply I'll edit anyway. For the long tail of ordinary work, a well-tuned small model is already good enough — and getting better faster than the tasks are getting harder.
The pattern I expect to win is a hybrid: a small local model handling the common case instantly and privately, quietly escalating to a big one only when it hits something genuinely hard. Most of the time, you'll never leave the device. You won't even notice.
The consequence nobody's pricing in
When intelligence is free, local, and instant, it stops being a destination you visit and becomes a texture of everything you use. Not an app you open — a property of the tools you already have. That's the big consequence hiding inside the small model. The eagle, as ever, notices the quiet thing first.