There's an old instinct in this field to answer every question about capability with a number of parameters. Bigger model, better results. For a while that was true enough to be a religion. It's no longer the interesting axis.
The constraint that actually shapes what a system can do today isn't how much the model knows in the abstract — it's how much of the right information you can get in front of it at the moment it has to decide. Call it context. And context, increasingly, is the scarce resource we design around, the way we once designed around clock speed.
Knowledge in the weights vs. knowledge at hand
A model's parameters hold a compressed, blurry version of everything it saw in training. That's fantastic for fluency and general reasoning and useless for the one fact that matters right now: your customer's order history, the current state of the codebase, the clause in the contract you signed last Tuesday. None of that is in the weights, and it shouldn't be.
The model is the reasoning engine. The context is the case file. A brilliant lawyer with the wrong file loses.
So the engineering problem quietly became a retrieval problem. Not "how do I make the model smarter," but "how do I assemble exactly the right few thousand tokens — no more, no less — for this specific decision." That's a systems discipline, and most of the leverage now lives there.
Why more context isn't the answer
The naive fix is to stuff everything in. Bigger windows, dump the whole database, let the model sort it out. It doesn't work as well as you'd hope, for two reasons:
- Signal dilution. Relevant tokens buried in irrelevant ones degrade the answer. Attention is finite even when the window isn't.
- Cost and latency. Context isn't free. Every token you include is paid for in money and milliseconds, on every single call.
Which means the skill isn't gathering context — it's curating it. Ranking, filtering, compressing, and knowing what to leave out. The teams that are winning quietly obsess over this. It's unglamorous work that never shows up in a launch tweet.
The practical shift
If you're building, the reframe is simple and a little liberating: stop treating the model as the thing you optimize and start treating the context pipeline as the thing you optimize. The model is a commodity you rent. Your context — how you retrieve, rank, and shape it — is the part that's actually yours.
Compute made the last decade. Context is quietly making this one.