A $40 Million Bet on Data AI Can't Copy

In late May 2026, the American healthcare company CVS invested $40 million in a data startup called H1. The news drew attention inside and outside the industry because it landed at a moment when SaaS startup funding had visibly cooled in the wake of the AI boom. But more striking than the size of the check was a single remark from H1 CEO Ariel Katz as he explained the deal.

"AI can replicate workflow SaaS.
It cannot replicate H1's proprietary physician data."

That one sentence compresses a transition now playing out across the entire software industry. As AI tools spread rapidly, the line between what keeps its value and what doesn't is coming into sharp focus. And CVS just spent $40 million confirming that the line sits in a surprisingly simple place.

What AI Catches Up With Fast — and What It Doesn't

H1 is a data company that aggregates information on American medical professionals and provides it to pharmaceutical, biotech, and medical device companies. It delivers refined profiles of individual physicians: their specialties, prescribing patterns, academic publication records, conference presentation histories, and influence structures within hospitals. The company offers sales automation and marketing features too, but the asset it actually leads with isn't those features — it's the data itself.

There's a reason this distinction has become important lately. As general-purpose AI models like GPT-4o and Claude have settled into everyday use, the prediction that AI will soon replace most workflow software has been inching toward reality. Tasks like scheduling, drafting emails, summarizing contracts, and organizing meeting notes are already being handled in large part by AI. The so-called tool layer is commoditizing fast, and building any single workflow feature now takes a fraction of the time it once did.

H1's physician database, by contrast, sits on a different plane. Information like prescribing histories of physicians across the United States, academic networks, and decision-making roles inside hospitals can only be assembled by collecting individual records over years and verifying their consistency. It's hard to gather overnight, legal constraints apply, and the shorter the accumulation period, the lower the trust — and the lower the adoption rate in the field. When CVS invested $40 million, it was betting on that accumulation process itself.

Why the Moat Is Moving From Features to Data

Before AI, software competitiveness was typically explained in terms of feature advantages: a more intuitive UI, more integrations, faster processing. That logic started to wobble once connecting a general-purpose AI through a single API made it possible to match a large share of a product's functionality within weeks. Open-source models push in the same direction: for a startup or a solo developer alike, feature work that once took years can now be done in a matter of weeks.

In business strategy, the key yardstick for explaining sustainable competitive advantage is the cost of imitation: how much time and how many resources would it actually take a competitor to copy your strength? The higher that cost, the thicker your defenses; the lower it is, the faster you get eroded. For H1, the thickness of the moat is the accumulation period of the physician data itself. A feature advantage can be matched within six months. Three years of clinical and prescribing data takes three years.

The same structure shows up in Korea. One reason a real estate information platform came to be trusted more than the giant web portals for certain uses wasn't a feature difference. It was the density of actual transaction data that users had entered and verified themselves over years. A fashion e-commerce player held its own against the major platforms because fashion-specific reviews and sizing data had accumulated through community activity over a long stretch. Even when the features look similar, users feel the difference in data density. I'd argue this isn't simply a story about a technology trend. It's a question of how a business is fundamentally designed to operate.

Still, "Have Data, Will Survive" Is an Overreach

This is where the counterarguments deserve an honest look. H1's logic is persuasive, but stretching it into a general formula — own proprietary data and you'll keep your edge in the AI era — runs into several caveats.

First, data without the capability to use it simply sits idle. More often than you'd expect, large Korean corporations that hold years of customer data have failed to turn it into new services or better decisions. Accumulating data and refining it into something genuinely valuable are entirely different kinds of competence.

Second, the data-moat strategy isn't an immediately available path for a solo founder just starting out. A small team can't replicate in short order the physician database H1 built over many years. This strategy is a valid defensive logic for someone who has already been accumulating data — it may not be an offensive strategy a new entrant can apply right away.

Third, as privacy regulation tightens, the cost and legal risk of building and maintaining proprietary data climb along with it. There's no guarantee that a model H1 built inside the particular regulatory structure of the US healthcare market will work the same way in other countries or other industries.

Even so, the question is hard to push aside entirely. With AI rapidly eating the tool layer, if the value you provide lives only in features and workflows, that ground keeps shrinking.

The Question Left for the One-Person Business

When you transpose the H1 case into the context of Korean solo entrepreneurs and small-scale operators, a few categories are worth auditing.

Relationship data. The decision-making style of a specific contact at a client company, the report format they prefer, who actually holds influence within an industry. If this lives only in your head, or sits fragmented across note-taking apps, it can't be reused on the next project. Recorded systematically, it becomes a starting point that competitors in the same industry don't have.

Domain-specific case collections. Patterns you've observed repeatedly within a particular industry, region, or customer segment. Observations like "the profit-and-loss structure that independent cafés of roughly 1,000 square feet share during their first three months" or "in B2B sales, companies with more than three layers of sign-off take an average of six weeks longer on initial decisions" are unlikely to appear in any general-purpose AI's training data. As this accumulates, the nature of your evidence changes, whether you're selling consulting or content.

Customer feedback, verbatim. Not the average score from a survey, but the actual sentences customers wrote. When statements like "I got confused at this point, for this reason" pile up, you have a different kind of basis for deciding how to improve the product and where to point the marketing. Averages make you lose direction; verbatims help you find it.

The premise underneath all of this is recording. No matter how good your relationships and experience are, if they aren't written down, they can't be reused when your team grows or when you connect an AI tool. That's why the idea of redesigning business operations as a data-accumulation process carries real weight in this moment.
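
For readers who want something concrete, here is one way the recording habit could look in practice. It is a minimal sketch in Python, not a recommended tool: the three kinds mirror the categories above, while the field names, the `Record` type, and the helper functions are all hypothetical choices made for illustration. It assumes a plain append-only text file (one JSON object per line) is enough for a one-person business.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import date
from pathlib import Path

# The three categories discussed above; the labels themselves are an assumption.
KINDS = {"relationship", "case", "verbatim"}


@dataclass
class Record:
    kind: str       # one of KINDS
    subject: str    # who or what the note is about (client, segment, customer)
    note: str       # the observation, or the customer's exact words
    recorded_on: str = field(default_factory=lambda: date.today().isoformat())
    tags: list[str] = field(default_factory=list)


def append_record(path, record):
    """Append one record as a JSON line; existing lines are never rewritten."""
    if record.kind not in KINDS:
        raise ValueError(f"unknown kind: {record.kind!r}")
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")


def load_records(path, kind=None):
    """Read all records back, optionally filtered by kind."""
    p = Path(path)
    if not p.exists():
        return []
    records = []
    with p.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = Record(**json.loads(line))
            if kind is None or record.kind == kind:
                records.append(record)
    return records
```

Calling `append_record("log.jsonl", Record("verbatim", "customer A", "I got confused at this point, for this reason"))` after each conversation, and `load_records("log.jsonl", kind="verbatim")` before a product decision, is the whole workflow. The append-only format is a deliberate choice in this sketch: the value comes from the accumulation, so nothing is ever overwritten.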

The logic H1's CEO laid out in explaining the CVS investment is simple: the moat isn't software features but accumulated data. In the business you run today, where do you hold a position that a competitor couldn't reach quickly, even with real time and effort? And are you investing in records and data as much as you invest in features and tools?