AI Products Are Different: What I've Learned

Traditional product management methods break down with AI products. After two years building AI/ML products, here's what actually works and what doesn't.

Every PM who’s built AI products knows this moment: You run a perfectly good discovery process, build exactly what customers said they wanted, ship it with decent model accuracy—and customers hate it.

Not because the product is bad. Because AI products don’t work like traditional products. And most PMs don’t realize this until it’s too late.

I’ve been building AI products for the past two years at HatchClaw. Before that, I spent 18 years building traditional software. Here’s what I wish someone had told me on day one.

Customers Don’t Know What “Good” Looks Like

In traditional product discovery, you can show customers prototypes, mockups, even competitors’ products. They can give you useful feedback because they understand the problem space. They’ve seen solutions before.

With AI products? They have no frame of reference.

When I first started building our AI-powered product intelligence tool, I’d show customers designs and ask: “Would this be useful?” They’d say yes. Then we’d ship it and they’d be disappointed. Not because we built it wrong—because they couldn’t envision what the AI would actually do until they experienced it.

The breakthrough came when I stopped asking “what do you want” and started asking “what decisions are you trying to make?” Then I’d build small, working demos that actually showed the AI making those decisions. Not mockups. Not prototypes. Working AI making real decisions on their data.

The feedback completely changed. Because now they could see what AI-powered decision support actually felt like.

The Feedback Loop Is Messier

With traditional products, you ship, measure usage, collect feedback, iterate. Clean loops. Clear signals.

AI products don’t work this way. The model performs differently for different users. The quality varies based on data quality. What works for one customer might fail for another—not because the product is broken, but because AI is probabilistic, not deterministic.

Here’s what this looked like for us: We’d ship an improvement that made our model 15% more accurate on our test set. Sounds great, right? But three customers would email saying it got worse. Were they wrong? No. For their specific data patterns, the model actually did perform worse.

Traditional product thinking says: “We improved the core metric, customers will be happy.” AI product thinking says: “Different customers experience different quality levels, and we need to understand why.”

This completely changes how you prioritize. In traditional products, you optimize for the average user. In AI products, you need to understand the distribution of quality across user segments and decide which segments matter most.
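To make "understand the distribution" concrete, here is a minimal sketch of the kind of check I mean; the data shape and segment labels are assumptions for illustration, not our actual evaluation pipeline. The point is to compare per-segment accuracy between the old and new model instead of looking only at the aggregate number.

```python
from collections import defaultdict

def accuracy_by_segment(results):
    """Per-segment accuracy from a list of eval results.

    `results` is assumed to look like:
        [{"segment": "ecommerce", "correct": True}, ...]
    (a hypothetical shape; adapt to whatever your eval harness emits).
    """
    totals = defaultdict(lambda: {"correct": 0, "n": 0})
    for r in results:
        bucket = totals[r["segment"]]
        bucket["n"] += 1
        bucket["correct"] += int(r["correct"])
    return {seg: b["correct"] / b["n"] for seg, b in totals.items()}

def segment_regressions(before, after, tolerance=0.02):
    """Segments whose accuracy dropped between two models, even if the average rose."""
    return {
        seg: (before[seg], after[seg])
        for seg in before
        if seg in after and after[seg] < before[seg] - tolerance
    }
```

Run something like this on the old and new model's eval output before shipping: if `segment_regressions` comes back non-empty, you already know which customers are going to email you.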

Model Quality Is a Product Problem

The biggest trap: Treating model quality as an engineering problem.

Most teams organize like this: PMs define requirements, engineers build features, ML engineers improve models. Clean separation of concerns. This is exactly backwards for AI products.

Model quality is a product decision. Not an engineering decision.

When our model accuracy was 75%, I had to decide: Do we ship and learn from real data? Or wait until we hit 85%? That’s not an ML question. That’s a product question. What decisions can customers make at 75% confidence? What decisions require 85%?

I’ve seen ML teams spend months trying to improve accuracy from 82% to 85% because “that’s the goal.” Meanwhile, customers would have been happy with 80% if the product helped them make better decisions. The PM failed by not framing model quality as a product trade-off.

Here’s what works: Treat model quality like any other feature trade-off. What value does each percentage point of accuracy unlock? What user jobs become possible? What decisions can customers trust?
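One way to put a number on "what each percentage point unlocks" is a back-of-the-envelope break-even check for a single decision: if acting on a correct recommendation is worth B and acting on a wrong one costs C, acting beats not acting once accuracy exceeds C / (B + C). A tiny sketch, with made-up numbers purely for illustration:

```python
def break_even_accuracy(benefit_if_right, cost_if_wrong):
    """Accuracy at which acting on the model's recommendation breaks even.

    From expected value: p * benefit - (1 - p) * cost > 0  =>  p > cost / (benefit + cost)
    """
    return cost_if_wrong / (benefit_if_right + cost_if_wrong)

# Illustrative numbers only: a cheap-to-undo triage call vs. a costly roadmap bet.
print(break_even_accuracy(benefit_if_right=100, cost_if_wrong=25))   # 0.2  -> 75% accuracy is plenty
print(break_even_accuracy(benefit_if_right=100, cost_if_wrong=900))  # 0.9  -> even 85% isn't enough
```

It's crude, but it turns "is 75% good enough?" into a question about the customer's decision rather than a question about the model.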

Discovery Requires Different Methods

Traditional discovery: Customer interviews, prototype testing, feedback analysis. You’re trying to understand what customers want.

AI discovery: You’re trying to understand what the AI can reliably do, what customers will trust, and where the model breaks down. Completely different problem.

The methods that work:

Shadow mode first. Run the AI alongside humans doing the job. See where it helps, where it misleads, where it’s ignored. Don’t ask customers if they’d trust an AI recommendation—watch what they actually do when they have one.
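As a sketch of what this can look like in practice (the `model.predict` interface and field names here are hypothetical, not a real API): the model runs on every real case, its answer is logged next to what the human actually did, and nothing changes in the UI.

```python
import logging

log = logging.getLogger("shadow_mode")

def handle_case(case, model, human_decision_fn):
    """Serve the human's decision exactly as before; record the model's answer silently."""
    decision = human_decision_fn(case)       # this is still what the user sees
    try:
        prediction = model.predict(case)      # hypothetical model interface
        log.info("case=%s human=%s model=%s confidence=%.2f",
                 case["id"], decision, prediction.label, prediction.confidence)
    except Exception:
        # Shadow mode must never break the real workflow.
        log.exception("shadow prediction failed for case %s", case["id"])
    return decision
```

A few weeks of that log answers the real discovery questions: where the model agrees with people, where it would have misled them, and which cases it shouldn't touch at all.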

Show degradation gracefully. Your model will perform worse on edge cases. How does the product behave when quality drops? This isn’t an error handling question—it’s a core product experience question.

Define confidence thresholds. At what confidence level do you show predictions? At what level do you hide them? These aren’t engineering constants—they’re product decisions that change the entire experience.
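Here is a minimal sketch of how those last two points can meet in code; the tiers and threshold values are assumptions for illustration, not a prescription.

```python
from dataclasses import dataclass

# Product decisions, not engineering constants: set these per decision type,
# based on what customers can tolerate being wrong about.
SHOW_THRESHOLD = 0.85     # above this, recommend directly
SUGGEST_THRESHOLD = 0.60  # between the two, offer a tentative suggestion

@dataclass
class Prediction:
    label: str
    confidence: float

def present(prediction: Prediction) -> dict:
    """Map model confidence onto the product experience, degrading gracefully."""
    if prediction.confidence >= SHOW_THRESHOLD:
        return {"mode": "recommend", "text": f"Recommended: {prediction.label}"}
    if prediction.confidence >= SUGGEST_THRESHOLD:
        return {"mode": "suggest", "text": f"Possible match: {prediction.label} (low confidence)"}
    # Below the floor, don't show an answer that looks wrong; fall back to the manual flow.
    return {"mode": "fallback", "text": "Not enough signal yet; review manually"}
```

Changing either threshold changes the experience for every customer, which is exactly why it belongs in the product conversation rather than in a config file nobody reviews.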

You Can’t Roadmap AI Development

This is the hardest lesson for PMs coming from traditional products.

You can’t say: “In Q3, we’ll improve accuracy to 90%.” Model development doesn’t work that way. You might get there in two weeks. You might never get there. It depends on data, on architectural choices, on whether the problem is even solvable at that level.

The shift: Stop roadmapping model improvements. Roadmap capability unlocks.

Instead of “improve accuracy to 85%,” say “enable users to trust AI recommendations for decision X.” Then work backwards: What confidence threshold do they need? What does the UX look like at that threshold? Can we reach it? If not, what’s the alternative?

This drives completely different conversations with ML engineers. Not “make the model better” but “what capabilities can we unlock with the current model, and what’s the next capability worth investing in?”

The Products That Work

After two years of this, I’ve noticed a pattern in AI products that work versus those that don’t:

Products that work: Start with a narrow decision, build AI that reliably supports that decision, expand to adjacent decisions.

Products that don’t: Try to build broad AI capabilities, ship impressive demos, struggle to find product-market fit.

The difference is ruthless scoping. AI is powerful but messy. The products that succeed accept that messiness and constrain the problem space until the AI can perform reliably.

At HatchClaw, we don’t try to “analyze all customer feedback.” We help PMs identify patterns in feedback within specific contexts. Much narrower. Much more useful.

What I’d Tell My Past Self

If I could go back two years and give myself advice:

  1. Stop treating AI like a feature. It’s a fundamentally different product experience.
  2. Model quality is a product decision, not an engineering metric.
  3. Discovery requires showing working AI, not gathering requirements.
  4. Scope ruthlessly. Nail one decision before expanding.
  5. The feedback loop is messier—build for that reality.

Most PMs will learn this the hard way. I did. But if you’re building AI products now, maybe you can skip some of the pain.

The opportunity is real. AI genuinely changes what products can do. But you can’t build AI products with traditional PM methods. The sooner you accept that, the better your products will be.
