Two things are happening in the AI market right now, and they point in such different directions that it's hard to believe they belong to the same story.

On one side, Nvidia has poured $40 billion into AI equity investments this year alone, roughly 60 trillion Korean won. The company that sells GPUs is buying ownership stakes in the companies that buy its GPUs. On the other side, a 35-billion-parameter (35B) model now runs at 80 tokens per second on an ordinary gaming graphics card with 12GB of VRAM. A workload that required data-center-grade hardware a year ago now runs in real time on a gamer's desktop PC.

Big Tech is locking down the top of the market with capital, while open source is tunneling in from the bottom with efficiency. Caught in between is everyone else — including us.

Nvidia's Transformation: From GPU Maker to AI Capitalist

Forty billion dollars is more than double the South Korean government's entire annual R&D budget. What Nvidia did with that money is buy direct equity stakes in AI startups and infrastructure companies.

Why does this matter? Until now, Nvidia's business model was simple: build GPUs, sell GPUs. That was it. Now Nvidia owns pieces of the companies that buy its GPUs — and funds them so they can buy even more.

This isn't ordinary investing. It's a closed loop in which Nvidia's money circles back to Nvidia. 

[Figure: Nvidia's self-reinforcing investment loop. Nvidia invests → startup grows → startup buys GPUs → Nvidia's revenue rises → investment expands → Nvidia invests again.]

The structure produces two effects. In the short term, Nvidia's revenue and market value grow even faster. In the long term, the entire AI infrastructure ecosystem becomes locked into Nvidia's standards. Even if AMD or Intel try to catch up later, they'll find it hard to break into an ecosystem already wired together with Nvidia's capital.

The key point is that a company that makes GPUs has repositioned itself as the capitalist of the AI ecosystem — a strategy of sealing off the market's infrastructure layer with sheer financial power.

The Open-Source Counterpunch: A 35B Model on a 12GB GPU

At the very same moment, the exact opposite is unfolding.

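DeepSeek V4 has released its FP4 quantization-aware training (QAT) technique. Quantization lowers the numerical precision of an AI model's weights to shrink its memory footprint, and QAT applies that lower precision during training so the model learns to tolerate it. A smaller footprint means the model can run on a smaller GPU.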
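To make "lowering numerical precision" concrete, here is a minimal sketch of rounding a small block of weights onto a 4-bit floating-point grid with one shared scale. The grid below assumes the common E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit); the text does not say which FP4 format or scaling scheme DeepSeek uses, so the function names and the per-block scaling are illustrative assumptions, not a description of its method.

```python
# Magnitudes representable by a 4-bit float in the E2M1 layout.
# Assumption: the source does not name the exact FP4 format.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(values):
    """Round a block of floats to the FP4 grid using one shared scale.

    Returns (scale, codes), where each code is a signed grid value.
    """
    largest = max(abs(v) for v in values)
    # Map the largest magnitude in the block onto the largest grid value (6.0).
    scale = largest / FP4_MAGNITUDES[-1] if largest > 0 else 1.0
    codes = []
    for v in values:
        target = abs(v) / scale
        nearest = min(FP4_MAGNITUDES, key=lambda g: abs(g - target))
        codes.append(-nearest if v < 0 else nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the shared scale and the grid values."""
    return [scale * c for c in codes]


weights = [0.12, -0.03, 0.6, -0.31]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
```

Each weight now needs only 4 bits plus a share of the block's scale, at the cost of a rounding error: 0.12 comes back as roughly 0.10 and -0.03 as roughly -0.05. Training with this rounding in the loop is what lets a model absorb that error.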

Qwen3.6 squeezes more speed out of the same GPU with MTP (Multi-Token Prediction) inference. Instead of generating one token at a time, it predicts several tokens at once.
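One common way to use multi-token prediction at inference time is draft-and-verify: cheap extra heads guess several tokens, and the full model checks the whole guess in a single pass. The toy below sketches that idea only. Whether Qwen3.6 works exactly this way is an assumption, and `main_model_next` and `draft_tokens` are hypothetical stand-ins, not real model calls.

```python
TARGET = "the quick brown fox jumps over the lazy dog".split()


def main_model_next(prefix):
    """Toy stand-in for the full model: the token it would emit after `prefix`."""
    return TARGET[len(prefix)] if len(prefix) < len(TARGET) else None


def draft_tokens(prefix, k):
    """Toy stand-in for multi-token heads: k guesses, wrong at every 4th position."""
    guesses = []
    for i in range(len(prefix), min(len(prefix) + k, len(TARGET))):
        guesses.append("???" if i % 4 == 3 else TARGET[i])
    return guesses


def generate(k):
    """Generate the full sequence; returns (tokens, number of main-model passes)."""
    out, passes = [], 0
    while len(out) < len(TARGET):
        draft = draft_tokens(out, k)
        passes += 1  # one verification pass scores every drafted position together
        for guess in draft:
            # Simulates reading the main model's answer for this position
            # out of that single batched pass.
            correct = main_model_next(out)
            out.append(correct)  # the verified token is always kept
            if guess != correct:
                break  # drop the rest of the draft after the first mismatch
    return out, passes


baseline_tokens, baseline_passes = generate(1)
fast_tokens, fast_passes = generate(4)
```

In this toy, both runs produce the same nine tokens, but drafting four at a time needs 3 main-model passes instead of 9. The output is unchanged because every token is still confirmed by the full model; only the number of expensive passes drops.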

The combined result of these two techniques is startling. 12GB of VRAM — a single RTX 4070, the kind of card ordinary gamers own. On that hardware, a 35B-parameter model runs at 80 tokens per second. Faster than a person can read.

Just last year, running a 35B model meant a data-center GPU like the A100, at more than $15,000 apiece. Today the same job runs on a gaming card that sells for about 500,000 won — roughly $350.

What this means is clear: the number of people who can run AI is about to explode. Even without the budget to rent a data center or absorb cloud bills, you can run AI directly on your own PC. That era is coming.

What a 40% Plunge in NAND Prices Is Signaling

And then one more intriguing data point arrived. Spot prices for Korean NAND flash fell 40% in a single month — the first reversal signal since the AI memory supercycle narrative began.

The simplest reading of falling NAND prices is that supply is outrunning demand — the exact opposite of the story that AI data centers are vacuuming up memory.

There are two possible interpretations.

First, an early warning of overinvestment in AI infrastructure. Memory makers ramped up production aggressively on the assumption that everyone was building AI data centers, but actual data-center buildout may not be growing nearly that fast. It's a decoupling between asking prices and real demand.

Second, a signal that AI is evolving to consume less memory. As efficiency techniques like FP4 quantization become standard, the memory needed to hold a given model's weights drops to half of what 8-bit weights require and a quarter of what 16-bit weights require. Data-center operators can do the same work while buying less memory.

Either way, the simple scenario — "AI arrives, memory demand grows without limit" — is breaking down. NAND prices are the market's first visible crack.
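The second interpretation rests on simple arithmetic, which can be sketched as follows. The function counts model weights only; KV cache, activations, and any offloading to system RAM are outside this estimate, and the helper name is illustrative.

```python
def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate size of the model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_35B = 35_000_000_000

# Footprint of a 35B-parameter model at 16-, 8-, and 4-bit precision.
footprints = {bits: weight_footprint_gb(PARAMS_35B, bits) for bits in (16, 8, 4)}
```

For a 35B-parameter model this gives 70 GB at 16 bits, 35 GB at 8 bits, and 17.5 GB at 4 bits. The 4-bit figure is still above 12 GB, so the gaming-card setup described earlier presumably relies on more than weight quantization alone; the text does not say what.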

Surviving the Sandwich

Here is the market structure in plain terms.

The top: Nvidia, OpenAI, and Big Tech. They monopolize capital and infrastructure and build closed ecosystems. Every API call has a price, and that money flows up to Big Tech.

The bottom: open source plus efficiency technology. A 35B model on a 12GB GPU. AI on your own machine — no API calls, no cloud dependency.

The middle: every person and every company stuck in between. Depending on Big Tech APIs means costs that keep climbing; building your own infrastructure takes capital you don't have; running open source takes engineers you can't hire.

The way to survive this structure is to work both ends of it.

Strategy 1: Use Big Tech APIs only where they create core value. Don't throw every task at Claude or GPT; reserve them for work that genuinely demands reasoning power. Handle simple classification, summarization, and keyword extraction with open-source models. For those routine tasks, costs can drop to roughly a tenth.
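A minimal sketch of what such routing could look like is below. The task labels come from the examples in the text; the prices, the `Task` type, and the backend names are hypothetical, chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass

# Task kinds the text suggests handling with open-source models.
SIMPLE_TASKS = {"classification", "summarization", "keyword_extraction"}

# Hypothetical prices per 1,000 tokens; real prices vary by vendor and hardware.
PRICE_PER_1K_TOKENS = {"frontier_api": 0.010, "local_open_model": 0.001}


@dataclass
class Task:
    kind: str
    tokens: int


def route(task):
    """Send simple tasks to the local open-source model, the rest to the paid API."""
    return "local_open_model" if task.kind in SIMPLE_TASKS else "frontier_api"


def total_cost(tasks, router):
    """Sum the cost of a workload under a given routing function."""
    return sum(PRICE_PER_1K_TOKENS[router(t)] * t.tokens / 1000 for t in tasks)


workload = [
    Task("classification", 2000),
    Task("summarization", 8000),
    Task("keyword_extraction", 1000),
    Task("multi_step_reasoning", 4000),
]

api_only = total_cost(workload, lambda t: "frontier_api")
mixed = total_cost(workload, route)
```

Under these assumed prices, the routed tasks cost a tenth as much, while the whole workload falls from about 0.15 to about 0.051. The overall saving depends on how much of the workload is genuinely simple.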

Strategy 2: Exploit the falling barrier to running your own infrastructure. A year ago, in-house AI infrastructure was big-company territory. Today, gaming-PC-class hardware can do meaningful work. Even a small company can run its own AI system that never sends sensitive data outside its walls.

Strategy 3: Spread your dependencies. Tie yourself to a single Big Tech API and you're exposed to every price hike, policy change, and service outage. Use Claude, GPT, and Gemini for different kinds of work, and run some of it yourself on open source. This is the efficiency path — one of the "two roads of the AI era" that David George described.
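Spreading dependencies can be as simple as an ordered fallback chain that ends in a model you run yourself. The sketch below assumes each provider is wrapped in a plain callable that raises `ProviderError` on failure; the provider functions are hypothetical stand-ins, not real vendor clients.

```python
class ProviderError(Exception):
    """Raised by a provider callable when a request fails."""


def call_with_fallback(prompt, providers):
    """Try each (name, callable) pair in order; return the first successful answer."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why this provider was skipped
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins; real code would wrap each vendor's client here.
def hosted_model_a(prompt):
    raise ProviderError("service outage")


def hosted_model_b(prompt):
    return f"[hosted B] {prompt}"


def local_open_model(prompt):
    return f"[local] {prompt}"


used, answer = call_with_fallback(
    "Summarize this memo.",
    [
        ("hosted_a", hosted_model_a),
        ("hosted_b", hosted_model_b),
        ("local", local_open_model),
    ],
)
```

Here the first provider is down, so the request lands on the second. Keeping the local model last in the list means an outage or policy change at every hosted vendor still leaves a working path.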

Why Robotics Is Moving First

On this same day, another notable announcement landed. Nvidia unveiled its GR00T robotics model, and a study titled "LLMs corrupt documents" was published. That these two events happened on the same day is telling.

The finding that LLMs corrupt documents is another reminder that text-based AI agents have hard limits on reliability. There is still a great deal to verify before office work can be handed over to AI.

Meanwhile, robotics models are advancing fast. It's a signal that in physical work — manufacturing, logistics — AI may replace human labor faster than it does at the desk. Before the reliability problems of text are solved, the domain of physical action is opening up first.

This current bears directly on Korea's manufacturing and logistics industries. Office automation is arriving slowly, but physical automation may arrive sooner than anyone expects.

The Signals to Watch

Putting today's news together, three signals stand out.

First, AI infrastructure investment has become a capital game. Forty billion dollars is not a scale any individual company can match. The race at the hardware layer is effectively over. The question for the rest of us is what to build on top of it.

Second, efficiency is the new differentiator. While Big Tech locks the top with capital, open source is prying open the bottom with efficiency. For any company bringing AI into its business, the ability to build a cost-efficient operating structure becomes a core competency.

Third, the AI memory supercycle is not forever. The 40% NAND price plunge may be the first sign of a trend, not a one-off event. The era of betting blindly on AI infrastructure has passed. What's needed now is the judgment to tell which segments are genuinely growing and which are overbuilt.

A sandwich structure: Big Tech sealing the top, open source digging in from the bottom. The moment has come to decide where in that structure we will stand. Climbing to the top to join the capital game isn't an option. That leaves two paths: carve out a position at the bottom with efficiency as your weapon, or combine both ends cleverly and fill the space in between.

Today's market is telling us the time to find that path is running out.