The Looks vs. Reality Gap

We built Ladder to answer a question most product teams avoid: does your product actually feel as good as it looks?

Ladder Pulse is our enterprise voice-of-customer (VOC) product. It ingests what your customers tell you (support tickets, NPS responses, interview transcripts, in-app feedback, survey verbatims) and runs it through the Ladder framework to produce real Ladder scores for the lived experience. For the Ladder Top 100 study below, we used a lighter-weight analysis of public signal rather than Pulse itself, since these products aren't our customers.

We pointed Ladder at 36 widely used digital products and scored each one twice. Once on the interface: the screens, the visual design, the information architecture. And once against aggregated public sentiment, meaning what thousands of real users actually say about the lived experience.

The gap between those two numbers is the most honest thing we've ever measured.
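The gap itself is simple arithmetic: Sentiment minus Screen. A negative number means the lived experience falls short of the interface; a positive one means it exceeds it. The minimal Python sketch below reproduces the gaps from the scores published in this study. The `SCORES` table and `gap` function are illustrative names, not part of any Ladder API.

```python
# Screen and Sentiment scores published in this study (1.0-5.0 scale).
SCORES = {
    "Airbnb": (3.7, 2.3),
    "Notion": (4.1, 2.8),
    "Arc Browser": (3.8, 2.6),
    "Apple Music": (3.6, 2.4),
    "Raycast": (3.8, 4.2),
    "Linear": (4.3, 4.2),
    "Superhuman": (3.8, 3.8),
}


def gap(screen: float, sentiment: float) -> float:
    """Screen-to-Sentiment gap: negative means the experience trails the interface."""
    # Round to one decimal to match the published precision and hide
    # floating-point noise (2.3 - 3.7 evaluates to -1.4000000000000001).
    return round(sentiment - screen, 1)


if __name__ == "__main__":
    # List products from the largest shortfall to the largest surplus.
    for name, (screen, sentiment) in sorted(SCORES.items(), key=lambda kv: gap(*kv[1])):
        print(f"{name:<12} screen={screen:.1f} sentiment={sentiment:.1f} "
              f"gap={gap(screen, sentiment):+.1f}")
```

Rounding to one decimal matters here: without it, sorting and equality checks would be thrown off by binary floating-point artifacts rather than by real differences in the scores.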

What Ladder found: the biggest gaps

Airbnb: Screen 3.7, Sentiment 2.3 (gap: -1.4)

Ladder ingested 42,000+ data points across 3,000+ online sources. The signal was overwhelming: hidden fees at checkout, a host experience that feels like a different product, and a review system that's increasingly gamed. The screen says "discovery." The data says "surprise costs." See how Airbnb's full score breaks down on the Top 100.

Notion: Screen 4.1, Sentiment 2.8 (gap: -1.3)

Ladder analyzed 4,100+ data points across 3,000+ online sources. The pattern was clear: performance degrades with large workspaces, the blank-page syndrome is real, and search (the most critical feature in a knowledge tool) is not fast enough. The product that can do anything struggles to do the one thing users need most. See Notion's full Top 100 breakdown.

Arc Browser: Screen 3.8, Sentiment 2.6 (gap: -1.2)

Ladder processed 1,800+ data points and surfaced a consistent theme: innovative sidebar navigation is undermined by stability issues, high memory consumption, and a learning curve that punishes people who just want to browse the web. Innovation without reliability is a demo, not a product.

Apple Music: Screen 3.6, Sentiment 2.4 (gap: -1.2)

Ladder analyzed 8,500+ data points across 3,000+ online sources and landed on a sentiment score of 2.4 out of 5. Users consistently describe library management issues, inconsistent navigation, and discovery algorithms they say trail Spotify's by miles. Beautiful surfaces don't compensate for friction in daily tasks.

What Ladder found: products where reality exceeds appearance

Not every gap goes the wrong direction. Ladder identified products where the lived experience actually outperforms the interface.

Raycast: Screen 3.8, Sentiment 4.2 (gap: +0.4)

Ladder analyzed 1,500+ data points and found something remarkable: users rate this product higher than its interface suggests. The UI is minimal: a command bar and extensions. It doesn't photograph well. But Ladder surfaced language usually reserved for products people can't live without: "replaced five apps," "fastest tool I've ever used." When speed is the design, screenshots undersell it. Explore Raycast's score on the Top 100.

Linear: Screen 4.3, Sentiment 4.2 (gap: -0.1)

Across 3,200+ data points, Ladder found a gap of just 0.1. That near-zero delta means the product is honest. The interface promises speed; the experience delivers speed. There's no marketing layer hiding a different reality underneath. See why Linear leads the entire Top 100.

Superhuman: Screen 3.8, Sentiment 3.8 (gap: 0.0)

Ladder identified zero gap. The email client that charges $30/month looks like a $30/month email client and feels like one too. No bait-and-switch.

What the gap actually measures

The Screen-to-Sentiment delta isn't a quality score. It's a trust score. Products with small gaps are honest. Products with large gaps are making promises their experience can't keep.

This is exactly what Ladder was designed to reveal. Traditional metrics like NPS or CSAT give you a single sentiment number. Ladder maps customer voice data against the five levels of the Ladder framework and tells you not just how people feel, but why, and what to fix first.

A beautiful interface with a low Sentiment score is a warning: this team invests in appearance over substance. An unremarkable interface with a high Sentiment score is an opportunity: this team builds things that work and hasn't yet invested in making them look as good as they feel.
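That reading can be turned into a rough rule of thumb. The minimal Python sketch below assumes a hypothetical `tolerance` of 0.2 points for what counts as "near zero" (the study does not publish a cutoff), and it uses the sign of the gap alone as a stand-in for "beautiful interface, low sentiment" and the reverse. `read_gap` is an illustrative name, not a Ladder function.

```python
def read_gap(screen: float, sentiment: float, tolerance: float = 0.2) -> str:
    """Classify a Screen-to-Sentiment gap into a rough reading.

    `tolerance` is a hypothetical near-zero cutoff; the study publishes none.
    """
    delta = round(sentiment - screen, 1)  # same one-decimal precision as the scores
    if abs(delta) <= tolerance:
        return "honest"       # interface and experience agree
    if delta < 0:
        return "warning"      # appearance outruns substance
    return "opportunity"      # experience outruns appearance


if __name__ == "__main__":
    for name, screen, sentiment in [("Airbnb", 3.7, 2.3), ("Linear", 4.3, 4.2),
                                    ("Superhuman", 3.8, 3.8), ("Raycast", 3.8, 4.2)]:
        print(name, read_gap(screen, sentiment))
```

With the study's published numbers, Airbnb reads as a warning, Linear and Superhuman as honest, and Raycast as an opportunity. A different tolerance would move borderline products between buckets, which is why it is exposed as a parameter.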

The gap is where the real story lives. Explore the scores for all 36 products on the Ladder Top 100.

How Ladder measures

Data ingestion: Ladder scans 3,000+ online sources, aggregating thousands of data points per product. Ladder Pulse customers feed their own internal signals — support tickets, NPS responses, field reports, employee surveys, customer interviews, CSAT data, in-app feedback, churn interviews — to get the lived-experience score for their own product.

Ladder mapping: Our AI, trained on 20 years of experience evaluation at Drawbackwards, analyzes sentiment, identifies friction patterns, and maps the quality of the described experience to Ladder's 1.0-5.0 scale. Not just positive or negative. Five distinct levels of experience quality.
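Ladder's actual mapping model isn't described here. The sketch below only illustrates one simple way five discrete levels could roll up into a score on the 1.0-5.0 scale. It assumes a hypothetical upstream classifier has already assigned each data point a level from 1 to 5, and it takes the mean. `ladder_score` is an illustrative name, not Ladder's implementation.

```python
from statistics import fmean


def ladder_score(levels: list[int]) -> float:
    """Roll per-data-point levels (1-5) up into a 1.0-5.0 score.

    Hypothetical: assumes each data point was already assigned a level
    by an upstream model; this is not Ladder's published method.
    """
    if not levels:
        raise ValueError("need at least one data point")
    if any(level not in range(1, 6) for level in levels):
        raise ValueError("levels must be integers from 1 to 5")
    # The mean of values in 1..5 always falls within 1.0-5.0.
    return round(fmean(levels), 1)


if __name__ == "__main__":
    print(ladder_score([2, 2, 3, 1, 4]))
```

For example, five data points assigned levels 2, 2, 3, 1 and 4 average to 2.4. A plain mean discards the shape of the distribution, so a real system that cares about "five distinct levels" would probably also report how many data points landed at each level.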

Continuous scoring: Ladder Top 100 scores update as new signal flows in, and both scores in this study are refreshed monthly. The Top 100 is a living dataset.
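One way a score can update as new signal flows in, without reprocessing the full history, is an incremental mean. The sketch below is hypothetical: the `RunningScore` class is illustrative, not Ladder's implementation. It assumes each new data point arrives already assigned one of the five levels (1 to 5), and `snapshot()` stands in for the value published at each monthly refresh.

```python
from dataclasses import dataclass


@dataclass
class RunningScore:
    """Hypothetical running score that absorbs new level assignments one at a time."""
    count: int = 0
    mean: float = 0.0

    def add(self, level: int) -> None:
        if not 1 <= level <= 5:
            raise ValueError("level must be between 1 and 5")
        # Incremental mean: only the count and current mean are stored,
        # so memory stays constant however much signal arrives.
        self.count += 1
        self.mean += (level - self.mean) / self.count

    def snapshot(self) -> float:
        """Value to publish at a refresh, rounded to one decimal."""
        if self.count == 0:
            raise ValueError("no data yet")
        return round(self.mean, 1)


if __name__ == "__main__":
    score = RunningScore()
    for level in [2, 2, 3, 1, 4]:
        score.add(level)
    print(score.snapshot())
```

This sketch weights every data point equally forever; a production system might instead decay older signal so the monthly snapshot reflects the current experience rather than its entire history.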

Want to see what Ladder reveals about your product or organization? Request a Pulse demo and see your real score from the people who use what you build.