AI and the Illusion of Productivity

As companies rush to adopt AI and encourage employees to “use more of it,” there’s a critical missing piece:

Nobody actually knows how to measure productivity anymore.

To be fair, this has always been somewhat true. But historically, we relied on rough proxy metrics:

  • Pull requests
  • Lines of code
  • Code reviews
  • Commits

These weren’t perfect, but they worked because writing code was expensive. Engineers had to think carefully before committing code. The friction forced deliberation about whether something was actually worth building.

AI has fundamentally changed that equation.

Writing code is now cheap. Extremely cheap.

As a result, everyone (and their grandmother) can generate large volumes of code. The barrier to building something is no longer effort — it’s judgment.

And that’s where the real productivity problem begins.

Engineering leaders now need to understand which code actually drives business impact — and which doesn’t. More importantly, incentives need to be aligned so that both humans and AI focus on writing high-impact code.

One useful way to think about this is by categorizing the types of code being written:

1. Prototypes

There are good prototypes and bad prototypes.

Good prototypes are experiments. An engineer is testing a hypothesis and doing pathfinding that informs technical or product strategy. Even personal learning can be valuable if it generates new insight.

Bad prototypes are just wasted tokens.

Think: cloning some product with a one-line prompt, building something flashy, and then throwing it away without learning anything meaningful.

2. New Products / Features

This is the work that eventually reaches customers.

But code isn’t the main bottleneck here. Other constraints dominate:

  • Users have limited attention — you can’t spam features
  • The feature must be financially viable
  • It must integrate coherently with the rest of the product ecosystem

Feature velocity without product discipline creates noise, not value.

3. High-Value Fixes

These are small, targeted changes that move important business metrics:

  • Revenue
  • Cost efficiency
  • User engagement
  • Reliability

They often require a deep understanding of how multiple systems interact. Identifying these opportunities usually involves significant analysis and experience.

Ironically, these changes often involve very little code yet deliver very high impact.

4. Low-Value Fixes

Every engineering team accumulates long-tail issues: minor bugs, small annoyances, edge cases.

There can easily be dozens or hundreds of these. Many never get fixed, often for good reason: the fix would cost more than the issue does.

5. Refactors / Migrations

This is the cost of operating software systems.

Dependencies evolve, infrastructure changes, and migrations are necessary to keep the system healthy. Sometimes straightforward. Sometimes painful.

But unavoidable.


Here’s the problem:

A huge portion of AI-generated code today falls into Category 1 (prototypes, often the throwaway kind) and Category 4 (low-value fixes).

If organizations aren’t careful, they will optimize for code volume instead of business impact.

What we actually need is the opposite:

  • More Category 2 (new products/features)
  • More Category 3 (high-value fixes)
  • Better Category 1 prototypes that produce meaningful pathfinding

And far less noise.

As token usage and AI infrastructure costs continue to rise, observability into where those tokens translate into business impact becomes essential.
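What might that observability look like in practice? Here is a minimal sketch, not a real tool. It assumes each AI-assisted change is tagged by a reviewer with one of the five categories above and with the tokens it consumed. Every name in it (ChangeRecord, token_share_by_category, the "shipped" flag) is hypothetical, and the numbers are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels mirroring the five categories in this post.
CATEGORIES = (
    "prototype",
    "feature",
    "high_value_fix",
    "low_value_fix",
    "refactor_migration",
)


@dataclass
class ChangeRecord:
    """One AI-assisted change. All fields are illustrative assumptions."""
    change_id: str
    category: str      # one of CATEGORIES, assigned by a human reviewer
    tokens_used: int   # tokens attributed to this change
    shipped: bool      # whether the change reached production


def token_share_by_category(records):
    """Return each category's fraction of total token spend."""
    totals = defaultdict(int)
    for r in records:
        if r.category not in CATEGORIES:
            raise ValueError(f"unknown category: {r.category}")
        totals[r.category] += r.tokens_used
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: spent / grand_total for cat, spent in totals.items()}


def unshipped_token_fraction(records):
    """Fraction of tokens spent on changes that never shipped."""
    total = sum(r.tokens_used for r in records)
    if total == 0:
        return 0.0
    unshipped = sum(r.tokens_used for r in records if not r.shipped)
    return unshipped / total


# Made-up sample data, for illustration only.
sample = [
    ChangeRecord("c1", "prototype", 600_000, shipped=False),
    ChangeRecord("c2", "low_value_fix", 200_000, shipped=True),
    ChangeRecord("c3", "high_value_fix", 50_000, shipped=True),
    ChangeRecord("c4", "feature", 150_000, shipped=True),
]

shares = token_share_by_category(sample)
unshipped = unshipped_token_fraction(sample)
```

In this made-up sample, most of the token spend lands in prototypes and low-value fixes. The "shipped" flag is only a crude proxy: a good prototype can be thrown away and still be worth its tokens if it taught the team something. A real version would need some link from each change to a business metric, which is the hard part.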

Without that visibility, organizations will continue celebrating their “AI-native transformation” while quietly wondering:

Why are our AI costs exploding… but the business isn’t moving?