In the fall of 2024, a teardown analysis itemizing the component costs of a single Nvidia H100 server quietly made the rounds in semiconductor-industry circles. Its finding: HBM (high-bandwidth memory) accounted for a larger share of the cost than the GPU die itself. Earlier this year, a similar structural analysis reached the same conclusion, this time with a concrete figure: roughly 66 percent of the cost of an AI chip, about two-thirds, comes not from compute circuitry but from memory.

The same day that figure circulated, a post with 121 points appeared on Hacker News, the developer community site. Its title was "Don't use Claude as an architect." It was a firsthand account of handing system architecture design to a large language model and then losing days untangling structurally snarled code, and the comments filled up with similar stories.

It might look like pure coincidence that a hardware supply-chain story and a developer-workflow anecdote drew attention on the same day. But read side by side, the two signals trace the outline of a judgment framework that solo founders and small teams need right now as they pull AI tools into their daily work.

What It Means That Memory Is 66 Percent of an AI Chip's Cost

Unlike ordinary DRAM, HBM is manufactured by stacking multiple layers of memory chips vertically and placing the stack alongside the GPU die on a silicon interposer. It moves data more than 10 times faster than conventional memory, but process difficulty and the yield-management burden rise accordingly. Only three companies can mass-produce HBM today (SK Hynix, Samsung Electronics, and Micron), and for high-end products at HBM3E and above, SK Hynix holds more than 50 percent of the market.

This structure explains why strike risk at Samsung drew so much attention. In 2024, when the union at Samsung's semiconductor division staged the longest strike in the company's history, concerns surfaced within the industry that HBM yield-improvement timelines could slip. This wasn't merely one company's labor dispute — it connected directly to a real bottleneck in the global AI infrastructure supply chain, because HBM supply volume is a direct variable in how Nvidia schedules its H100 and H200 shipments.

More concretely, the 66 percent figure means that competition over AI model performance isn't decided by algorithm design alone. Since the transformer architecture was published, the core model design has been openly available to everyone. GPT, Claude, and Gemini are all transformer-family models; the differentiators are training-data quality, fine-tuning methods, and inference speed. Inference speed ultimately comes down to chip performance, and chip performance is tied to HBM supply. No matter how sophisticated your prompt engineering at the software layer, if the silicon supply chain underneath wobbles, service quality changes with it.

For solo founders and small teams, the moment this becomes tangible is usually when API prices rise or response times slow down. In early 2023, when OpenAI introduced a paid ChatGPT plan and throttled response speeds for free users, many practitioners felt the cost structure of AI infrastructure for the first time. Until HBM supply constraints ease, AI API prices are likely to fall more slowly than people hope.

Why a Warning Not to Use It as an Architect Earned 121 Points

On Hacker News, 121 points is no small number. When a post built on a contentious claim or strong disagreement earns that kind of score, it signals that a lot of developers have lived through the same experience.

The post's argument runs like this. Ask Claude or GPT-4 to "design the entire backend architecture for this service," and you get a plausible-looking diagram and code structure. The trouble starts there. That architecture often reflects none of the actual deployment environment's constraints, the team's maintenance capacity, or the dependencies in the existing codebase. If it gets adopted without anyone quite deciding to adopt it, the time spent later debugging and refactoring runs longer than designing the thing yourself from the start.

In the comments, one developer wrote that "an LLM is closer to a code-autocomplete tool than an architecture decision-maker." Another pointed out that "the problem is the model doesn't know its answer is wrong even when it is." Those two lines compress the heart of the debate.

A chatbot "personality hacking" case reported the same day belongs in this context as well. With a cleverly constructed system prompt, a chatbot could be made to respond outside its assigned role boundaries. In other words, developer communities and everyday users alike are now probing, through direct experimentation, exactly how much authority to grant AI agents.

Anthropic's decision to release 'knowledge-work-plugins' as public open source at this moment reads as a response to that debate. The design direction: constrain AI not as an entity controlling the whole system, but as plugins that assist with specific tasks. It aligns with the community's hard-won conclusion that "Claude should be a plugin, not an architect."

The argument over how wide a role to allow AI agents is, in the end, a question of setting trust boundaries. It echoes a view emphasized in practical business books: when you grant authority to a new tool or partner without defining a clear scope of responsibility in the first 90 days or so, the cleanup costs later grow out of all proportion. AI agents are no different. If you don't concretely limit their role early on, plausible-but-off-context outputs pile up, and unwinding them takes longer than doing the work yourself would have.

Three Concrete Checks Solo Founders Should Run Right Now

Don't be optimistic about how far AI API prices will fall. HBM supply constraints won't resolve quickly. Samsung's HBM3E yield problems persisted throughout 2024, and SK Hynix's new fab ramp-ups for expanding supply stretch across 2025 and 2026. If you've built workflows that treat AI APIs as effectively unlimited, this is the moment to recheck the cost structure. Token- and compute-heavy work in particular, such as image generation and large-document processing, is likely to be sensitive to future price swings.
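One way to recheck the cost structure is to put rough numbers on it. The sketch below is only an illustration: the workload names, request volumes, and per-million-token prices are hypothetical placeholders, not any provider's actual rates, and it assumes simple per-token billing. It totals monthly spend and shows how that total moves if prices are halved, unchanged, or 50 percent higher.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    requests_per_month: int
    input_tokens_per_request: int
    output_tokens_per_request: int


def monthly_cost(w, input_price_per_mtok, output_price_per_mtok):
    """Monthly spend in dollars for one workload, given prices per million tokens."""
    input_tokens = w.requests_per_month * w.input_tokens_per_request
    output_tokens = w.requests_per_month * w.output_tokens_per_request
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


def price_scenarios(workloads, input_price, output_price, multipliers=(0.5, 1.0, 1.5)):
    """Total monthly spend under each price multiplier (1.0 means today's price)."""
    return {
        m: sum(monthly_cost(w, input_price * m, output_price * m) for w in workloads)
        for m in multipliers
    }


# Hypothetical workloads and prices, chosen only to make the arithmetic visible.
workloads = [
    Workload("document summaries", 2000, 20_000, 1_000),
    Workload("copy drafts", 500, 1_000, 800),
]
scenarios = price_scenarios(workloads, input_price=3.0, output_price=15.0)

for multiplier, total in scenarios.items():
    print(f"{multiplier:.1f}x price -> ${total:,.2f} per month")
# 0.5x price -> $78.75 per month
# 1.0x price -> $157.50 per month
# 1.5x price -> $236.25 per month
```

In this made-up example the large-document workload accounts for almost all of the spend, which is the kind of concentration worth knowing about before prices move.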

Consciously mark the moments you hand 'design' to AI. For clearly bounded tasks — drafting copy, summarizing research, automating repetitive documents — AI delivers a real speed boost. But in areas that demand context and judgment — strategic planning, service architecture, deciding the direction of customer communication — adopting AI output wholesale risks following an answer that's plausible but doesn't fit your situation. "Don't use Claude as an architect" isn't a warning for developers alone. Business strategy, content structure, and pricing design all sit inside the same boundary.

Write down, once, what role your AI tools play. Specify which model handles which task and who makes the final call on the output. For a solo founder this is a promise to yourself — but without it, you drift into the habit of adopting AI output unconsciously. A working rule like "GPT-4o generates first drafts; I personally review every client proposal before it goes out" heads off the situations where role boundaries blur.
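If you prefer the rule in a form you can check mechanically, a small table of roles is enough. The following is a minimal sketch under stated assumptions: the task names, the `ROLES` table, and the `can_ship` helper are all hypothetical, invented here for illustration. Each task records which model drafts it and who makes the final call, and anything not written down is blocked by default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Role:
    task: str
    model: str       # which tool produces the first draft
    final_call: str  # who must approve the output before it is used


# Hypothetical role table for a solo founder; adjust tasks and models to your own work.
ROLES = {
    "client_proposal": Role("client_proposal", "GPT-4o", "me"),
    "research_summary": Role("research_summary", "Claude", "me"),
}


def can_ship(task: str, reviewed_by: Optional[str]) -> bool:
    """Return True only if the task has a written role and its owner reviewed the output."""
    role = ROLES.get(task)
    if role is None:
        # No role was written down for this task, so nothing ships by default.
        return False
    return reviewed_by == role.final_call


print(can_ship("client_proposal", reviewed_by="me"))       # True
print(can_ship("client_proposal", reviewed_by=None))       # False
print(can_ship("service_architecture", reviewed_by="me"))  # False: no role defined
```

The default-deny branch is the design choice that matters: a task you never assigned a role to, such as service architecture in this example, cannot slip through just because the output looked plausible.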

Most of the discomfort solo founders feel using AI tools today comes from exactly this point. The tool looks impressive, so at first you delegate a lot. Then at some point you find yourself asking, "Is this actually the direction I wanted?" The later that question arrives, the bigger the cleanup bill.

The HBM supply chain is setting the ceiling on AI services' quality and pricing, while developer communities experiment their way toward tighter boundaries on what AI agents should be allowed to do. Where these two currents cross, the decision about where — and how deeply — to bring AI into your work isn't a question of tool selection. It's a question of how you run your business. Follow the designs the tool proposes long enough, and you may find your business quietly moving in a direction shaped by the tool, without ever having decided to go there.