The Real Cost of Running AI on Your Own Machine

After running open-source LLMs on an M-series Mac for a month, William Angel placed two bills side by side: his electricity bill and his OpenRouter API bill. The electricity cost more. The calculation he published on May 17, 2026 earned more than 280 upvotes on Hacker News, and a striking share of the 231 comments said some version of the same thing: "I suspected this, but I've never seen it in actual numbers." The assumption that running AI locally is cheaper than a cloud API looked very different once real power measurements entered the picture.

What the power meter revealed about cost per token

Angel's methodology is straightforward. He measured the actual wattage his Apple Silicon chip consumed during LLM inference, converted that figure into real electricity costs, and compared it against what the same queries would cost through the OpenRouter API.

Running inference on a 7B-to-13B-parameter open-source model on an M-series Mac draws roughly 20 to 40 watts of total system power, with display, fans, and background processes included. At that draw, an hour of inference consumes about 0.02 to 0.04 kWh. Even at the relatively high average US electricity rate, that works out to roughly a cent per hour or less; at Korean rates, even less. Taken in isolation, the number looks negligible.
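The arithmetic behind that hourly figure can be sketched in a few lines. This is a minimal illustration rather than Angel's own calculation: the rate of 0.17 USD per kWh is an assumed placeholder, and the function names are made up for this example.

```python
def energy_kwh(watts: float, hours: float) -> float:
    """Energy in kilowatt-hours for a constant draw of `watts` over `hours`."""
    return watts * hours / 1000.0


def electricity_cost(watts: float, hours: float, rate_per_kwh: float) -> float:
    """Cost in the same currency unit as `rate_per_kwh`."""
    return energy_kwh(watts, hours) * rate_per_kwh


# Assumed rate for illustration only; substitute your own utility's figure.
ASSUMED_RATE_USD_PER_KWH = 0.17

# The 20 W and 40 W endpoints come from the range quoted in the article.
for watts in (20, 40):
    kwh = energy_kwh(watts, hours=1)
    usd = electricity_cost(watts, hours=1, rate_per_kwh=ASSUMED_RATE_USD_PER_KWH)
    print(f"{watts} W for 1 h = {kwh:.3f} kWh = ${usd:.4f}")
```

The result scales linearly with the rate, so swapping in a different tariff only changes the final multiplication.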

The gap shows up in processing speed. Local inference is far slower than a cloud server, taking much longer to generate the same number of tokens. Factor in that speed difference and calculate the electricity cost per 1,000 tokens, and there are cases where it exceeds the API rate for Meta's Llama 3 or Mistral-family models on OpenRouter. The pricing structures are fundamentally different. A cloud API charges in proportion to tokens processed, so the bill is the same however long generation takes. Local hardware draws power for as long as inference runs, so the slower the inference, the more energy each token consumes.
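How speed enters the per-token figure can be made concrete with a small sketch. The wattage, throughput values, electricity rate, and API price below are all hypothetical numbers chosen for illustration; they are not measurements from Angel's post or actual OpenRouter prices.

```python
def local_cost_per_1k_tokens(watts: float, tokens_per_second: float,
                             rate_per_kwh: float) -> float:
    """Electricity cost of generating 1,000 tokens at a constant power draw."""
    seconds = 1000.0 / tokens_per_second
    kwh = watts * seconds / 3_600_000.0  # watt-seconds -> kilowatt-hours
    return kwh * rate_per_kwh


def breakeven_tokens_per_second(watts: float, rate_per_kwh: float,
                                api_price_per_1k: float) -> float:
    """Throughput at which local electricity cost equals the API price."""
    return watts * rate_per_kwh / (3600.0 * api_price_per_1k)


# All four values are assumptions for illustration.
WATTS = 30.0
RATE_USD_PER_KWH = 0.17
API_USD_PER_1K_TOKENS = 0.0001

for tps in (5, 15, 40):
    local = local_cost_per_1k_tokens(WATTS, tps, RATE_USD_PER_KWH)
    verdict = "above" if local > API_USD_PER_1K_TOKENS else "at or below"
    print(f"{tps:>2} tok/s: ${local:.6f} per 1k tokens ({verdict} the API price)")

print(f"break-even: {breakeven_tokens_per_second(WATTS, RATE_USD_PER_KWH, API_USD_PER_1K_TOKENS):.1f} tok/s")
```

Halving the throughput doubles the electricity cost per token, while the API price stays fixed. That is the asymmetry the comparison turns on.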

Then there's idle power. An API service only charges you when you send a query. A local server left running 24 hours a day draws electricity even when no queries are coming in. In Angel's analysis, this always-on standby power accounted for a substantial share of the monthly cost. Even if you only actively use the machine two hours a day, keeping it on around the clock means the other twenty-two hours of idle draw keep accumulating. The feeling that "it's free because it runs on my computer" comes from a calculation that counts only the moments of inference and leaves out the cost of everything in between.
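A rough sketch shows how idle draw can dominate a monthly total. The two hours of daily use comes from the example above; the active and idle wattages are hypothetical placeholders, not Angel's measurements, and a real machine's idle draw should be measured rather than assumed.

```python
def monthly_kwh(active_watts: float, idle_watts: float,
                active_hours_per_day: float, days: int = 30,
                always_on: bool = True) -> tuple[float, float]:
    """Return (active_kwh, idle_kwh) for one month of use."""
    idle_hours = (24.0 - active_hours_per_day) if always_on else 0.0
    active = active_watts * active_hours_per_day * days / 1000.0
    idle = idle_watts * idle_hours * days / 1000.0
    return active, idle


# Assumed wattages for illustration only.
active, idle = monthly_kwh(active_watts=30.0, idle_watts=8.0,
                           active_hours_per_day=2.0)
total = active + idle
print(f"active: {active:.2f} kWh, idle: {idle:.2f} kWh")
print(f"idle share of monthly energy: {idle / total:.0%}")
```

With `always_on=False` the idle term drops to zero, which is the case the "it's free because it runs on my computer" intuition silently assumes.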

When local AI is worth it even at a higher price

There are clear counterarguments to this math, because cost savings are often not the only reason — or even the main reason — people choose local AI.

Send a prompt containing customer data, internal contracts, or unpublished financials to an external API, and it becomes hard to trace which servers that content passes through. For industries under strict privacy regulations, or for organizations whose security policies restrict access to external networks, using a cloud API is itself a risk. That's why companies in healthcare, law, and finance maintain private LLM servers despite the higher operating costs. Local inference is also a practical choice for field work where internet connectivity is unreliable, and in air-gapped environments.

It's also worth noting that Angel's calculation is based on US electricity rates. Residential rates in Korea fall below the US average in certain tiers, and the progressive pricing structure works differently. A small business operating on residential rates may see a much smaller gap than the calculation suggests. And with cloud data centers steadily improving their energy efficiency, some argue that even from a carbon-cost perspective, a simple comparison doesn't hold up.

Still, electricity is a cost. It may run higher than the API bill or lower — but without measuring it, you have no way of knowing which.

Almost nobody actually calculates what their AI tools cost

It's worth asking why Angel's post earned more than 280 upvotes. The resonance probably came less from the numbers themselves than from the fact that so few people had ever actually done the math. In an environment where the list of AI tools keeps growing, very few people have built the habit of measuring and comparing what each tool actually costs to run.

For a solo business owner, the first step is checking what your current AI tools cost per month. Cloud APIs make this easy — the invoice arrives automatically. For local AI, you can measure actual consumption with a smart plug or macOS power-monitoring tools, or estimate it from the chip's TDP and your average hours of use. Plug that into your utility's rate table and you have a monthly figure. If that figure is higher than what the same volume of queries would cost through a cloud API, and there's no clear countervailing reason like data security, the current setup deserves a second look.
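That check can be written down as a small comparison. This is a sketch under stated assumptions: the `MonthlyUsage` fields and the sample values are hypothetical, and the per-million-token price stands in for whatever an equivalent hosted model actually charges.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    measured_kwh: float             # from a smart plug or power-monitoring tool
    rate_per_kwh: float             # from your utility's rate table
    tokens: int                     # tokens processed during the month
    api_price_per_million: float    # price of an equivalent hosted model
    needs_local_for_security: bool  # a countervailing reason to stay local


def local_cost(u: MonthlyUsage) -> float:
    return u.measured_kwh * u.rate_per_kwh


def api_cost(u: MonthlyUsage) -> float:
    return u.tokens / 1_000_000 * u.api_price_per_million


def deserves_second_look(u: MonthlyUsage) -> bool:
    """True when local costs more and nothing else justifies it."""
    return local_cost(u) > api_cost(u) and not u.needs_local_for_security


# Sample values are assumptions for illustration only.
usage = MonthlyUsage(measured_kwh=7.0, rate_per_kwh=0.17, tokens=2_000_000,
                     api_price_per_million=0.10,
                     needs_local_for_security=False)
print(f"local: ${local_cost(usage):.2f}, API: ${api_cost(usage):.2f}")
print("review the setup" if deserves_second_look(usage) else "keep as is")
```

Hardware purchase cost is deliberately left out here, since the comparison in the text is between the electricity bill and the API bill.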

A hybrid approach is also worth considering. Route queries involving customer data or confidential documents to local models, and send routine work to the cloud, and you can balance security against cost. That split is itself a conscious decision about how you operate your AI tools — and making that decision requires knowing what each tool actually costs.
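In its simplest form, that split is a routing rule. The sketch below assumes each query is already labeled as sensitive or not; how that label gets assigned is the hard part and is not addressed here. The `Query` type and the `"local"` and `"cloud"` labels are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    text: str
    contains_customer_data: bool = False
    contains_confidential_docs: bool = False


def route(query: Query) -> str:
    """Send sensitive queries to the local model, everything else to the cloud."""
    if query.contains_customer_data or query.contains_confidential_docs:
        return "local"
    return "cloud"


queries = [
    Query("Summarize this signed contract", contains_confidential_docs=True),
    Query("Draft a reply to this customer", contains_customer_data=True),
    Query("Suggest three blog post titles"),
]
for q in queries:
    print(f"{route(q):>5}: {q.text}")
```

Defaulting to the cloud keeps routine work cheap, but a more cautious setup might invert the default and send anything unlabeled to the local model.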

A book on the capabilities that remain distinctly human in the AI era makes the point that you need your own criteria for evaluating AI tools — and that as tools multiply, your standards for judging them have to keep pace. Seen through that lens, calculating your electricity bill is not just a cost-management tip. It's a way of asking yourself which AI tools you use, why you use them, and whether there's any standard behind the choice.

The fact that Angel's simple act — placing two bills side by side — resonated with hundreds of people suggests just how many feel they should be asking the same question but haven't yet. AI tools are settling into the role of productivity tools at remarkable speed. The standards for evaluating and operating them are forming far more slowly.