For two years, the defining verb of artificial intelligence was answer. You asked; the model responded. It was a spectacular parlor trick that quietly rewired half the knowledge economy — and it was also, I'd argue, the least interesting thing these systems will ever do.

The shift already underway is from models that answer to models that act. Agents. Systems that don't just tell you how to book the flight, but book it — checking your calendar, comparing fares, holding the seat, and flagging the one detail you'd have missed. The gap between those two capabilities looks small in a demo. In practice, it's the difference between a very smart encyclopedia and a very junior colleague.

Why "answering" was always a waypoint

A question-answering system has a comforting property: it's bounded. The output is text, the human stays in the loop, and the worst-case failure is a confidently wrong paragraph. That safety is exactly why it went mainstream first. But it also caps the value. Knowledge you have to act on yourself is knowledge with a tax attached.

Agents remove that tax — and inherit a much harder problem. The moment a system can take actions in the world, three things stop being optional:

  • Reliability. A wrong answer is annoying. A wrong action is a refund, a deleted file, or a very confused customer.
  • Memory. An assistant that forgets what it did five minutes ago can't be trusted with anything that spans more than one step.
  • Judgment about when to stop. Knowing you don't know — and escalating — turns out to be most of the job.

The hard part of agents was never getting them to act. It's getting them to know which actions they shouldn't take.

The architecture is quietly inverting

In the answering era, the model was the product. You wrapped a thin interface around it and shipped. In the agent era, the model is one component in a system that also includes tools, memory, a planner, and — crucially — a set of guardrails that decide what the model is allowed to touch.

This is why I've stopped being impressed by benchmark scores in isolation. A model that's two points better on some eval but can't reliably call a tool twice in a row is worse, for anything real, than a slightly duller model wired into a system that catches its mistakes. The intelligence has moved out of the weights and into the loop.
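
To make "the model is one component" concrete, here is a minimal sketch of such a loop. Everything in it is hypothetical: the Agent class, the tool names, and the fixed plan that stands in for a real model or planner are assumptions for illustration, not any particular framework's API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical names throughout; an illustration, not a real framework.
Tool = Callable[[str], str]


@dataclass
class Agent:
    tools: dict[str, Tool]   # what the system is able to do
    allowed: set[str]        # guardrail: what the model is allowed to touch
    memory: list[str] = field(default_factory=list)  # record of past steps

    def step(self, tool_name: str, arg: str) -> str:
        """Run one proposed action through the guardrail, then record it."""
        if tool_name not in self.allowed or tool_name not in self.tools:
            result = f"blocked: {tool_name}"
        else:
            try:
                result = self.tools[tool_name](arg)
            except Exception as exc:
                # The loop, not the model, catches the failure.
                result = f"error: {exc}"
        self.memory.append(f"{tool_name}({arg}) -> {result}")
        return result


def run(agent: Agent, plan: list[tuple[str, str]]) -> list[str]:
    """Execute a plan; the fixed list stands in for a model's proposals."""
    return [agent.step(name, arg) for name, arg in plan]


agent = Agent(
    tools={
        "search_fares": lambda query: f"3 fares for {query}",
        "delete_file": lambda path: "deleted",
    },
    allowed={"search_fares"},
)
results = run(agent, [("search_fares", "SFO-JFK"), ("delete_file", "/tmp/x")])
print(results)
print(agent.memory)
```

The second action is refused by the allowlist rather than by the model's good sense, and both steps land in memory either way. That is the inversion in miniature: the safety property lives in the surrounding system.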

What this means if you build software

Three practical bets I'd make for the next few years:

  • The winning products will be the ones with the best feedback loops, not the biggest models. Whoever can watch an agent fail, learn from it, and close the gap fastest wins.
  • "Human in the loop" will get more precise. Not a checkbox on everything, but a scalpel — inserted exactly at the steps where the cost of a mistake is high and the model's confidence is low.
  • Evaluation becomes the moat. Anyone can call an API. Almost no one can tell you, rigorously, whether their agent got better this week.
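
The second and third bets can be sketched in a few lines. This is a rough illustration under stated assumptions: the Step fields, the thresholds, and the pass/fail lists are all made up, and how you would actually estimate "cost of a mistake" or "confidence" is left open.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    mistake_cost: float  # assumed estimate of what a wrong action would cost
    confidence: float    # assumed model confidence, between 0 and 1


def needs_human(step: Step, cost_limit: float = 100.0,
                min_confidence: float = 0.8) -> bool:
    """Escalate only when the stakes are high AND the confidence is low."""
    return step.mistake_cost >= cost_limit and step.confidence < min_confidence


def pass_rate(outcomes: list[bool]) -> float:
    """Fraction of tasks in a fixed suite that the agent completed correctly."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Illustrative values only.
steps = [
    Step("compare_fares", mistake_cost=0.0, confidence=0.6),
    Step("hold_seat", mistake_cost=20.0, confidence=0.9),
    Step("pay_for_ticket", mistake_cost=450.0, confidence=0.7),
]
escalated = [s.name for s in steps if needs_human(s)]
print(escalated)

# Same task suite, run on two different weeks (made-up results).
last_week = [True, False, True, False]
this_week = [True, True, True, False]
delta = pass_rate(this_week) - pass_rate(last_week)
print(f"pass rate change: {delta:+.2f}")
```

Only the payment step gets a human, which is the scalpel rather than the checkbox. The pass-rate comparison is the crudest possible version of "did the agent get better this week"; a rigorous answer would need far more than four tasks and some estimate of run-to-run noise.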

The eagle's view

I keep coming back to the image this whole site is named for. An eagle doesn't succeed by flapping harder than the other birds. It succeeds by seeing the whole valley, waiting for the right moment, and committing without hesitation when it comes. That's a decent description of a good agent, too — and a decent description of how to build one.

We're at the very start of this. The answering era felt like the whole story because it was the first chapter we could read. The agent era is where the plot actually begins.