Tech Tuesday: Why AI Can’t Do Math (And Why Your Recipe Just Got 2× Saltier)
I was chatting with some colleagues on LinkedIn and needed some quick research and a math check. Clicking into a browser window that happened to be open to Bing, I started with a basic question:
“What would be the U.S. corporate income tax on $18 billion in profits?”
This isn’t a trick question. It’s simple percentage math. At a 21% tax rate, the correct answer is: $3.78 billion. But Bing/Copilot told me something else entirely: “$36 billion.”
That number is impossible. It isn't a small miss or a rounding issue. It's double the company's entire profit, an effective tax rate of 200%. It caught me off guard so much that I immediately shared it: Microsoft AI blows it. Again....
At that point, I wanted to understand two things:
- Is Bing the only AI getting this wrong?
- Why would an AI miss something this basic?
So I took Bing’s answer — the wrong answer — and handed it to three other AI models: ChatGPT 5.1, Gemini, and Grok. And I didn’t just ask them for the correct calculation. I asked them to evaluate Bing’s result and explain what went wrong.
All three immediately spotted the error and explained why the number made no mathematical sense. But Grok did something extra. It offered a follow-up question:
“Why do AI models hallucinate math?”
I clicked it.
The explanation that followed is something most people have never been told clearly, and something I hadn't fully understood myself. It changes how we should use AI everywhere numbers matter: in our kitchens, our banking, our medical reports... our lives.
AI Is Not a Calculator
When you ask a large language model a math question, it does not switch into “calculator mode.” It doesn’t perform arithmetic, carry digits, or verify anything. Instead, it does what it was built to do:
It predicts the next piece of text that seems likely based on patterns it has seen.
That means when you ask for “21% of $18 billion,” the model is not performing a step-by-step calculation. It’s trying to produce an answer that looks like the kind of answer it has seen before. Sometimes that gets you close. Sometimes it gets you “$36 billion.”
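For contrast, here is what an actual calculation looks like: deterministic, checkable, and identical every time you run it. A minimal Python sketch of the tax math from the example:

```python
# Deterministic arithmetic: the same inputs always give the same output.
profit = 18_000_000_000   # $18 billion in profits
tax_rate = 0.21           # 21% U.S. corporate income tax rate

tax = profit * tax_rate
print(f"${tax:,.0f}")     # $3,780,000,000 -> $3.78 billion, never $36 billion
```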
*Confidence level: 100%. Accuracy level... not so much.*
One way to picture this is to imagine trying to solve a math problem on Wheel of Fortune. The board shows a few numbers, some blanks, and you’re guessing what the full equation might be. You’re not calculating — you’re trying to match a pattern. That’s exactly what a language model does.
Why AI Breaks on Simple Math
Once you know how these models work, their math mistakes stop being mysterious. A few key points explain most of the failures:
1. No math engine — just language
The model’s job is to predict tokens (pieces of text), not numbers. It does not truly “do” arithmetic at all. It just predicts what the answer might look like in text form.
This is not an exaggeration: large language models can't add. On their own, they have no mechanism for performing arithmetic.
2. Each digit is guessed independently
The number 3.78 is not one thing to the model. Depending on how it gets tokenized, it can be four separate predictions:
3 → . → 7 → 8
If the model slips on one of those predictions, the entire number changes. 3.78 can become 37.8, or 4 million can turn into 4 billion, just from one wrong guess in the sequence.
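No real model works quite this literally, but as a toy illustration of the stakes, here is how much damage a single slipped character does to a dollar figure:

```python
# Toy illustration: one misplaced decimal point changes the magnitude entirely.
right = float("3.78e9")   # the intended answer: $3.78 billion
wrong = float("37.8e9")   # one character slips: $37.8 billion

print(wrong / right)      # 10.0 -- a single slip is a 10x error
```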
3. The internet is full of bad math
These models learn from massive amounts of online text. That text includes correct math, wrong math, clickbait headlines, sloppy comments, and everything in between.
If you think about how often humans mess up “simple” numbers, you won’t be surprised that models trained on our writing learn those mistakes, too.
4. No built-in “does this make sense?” check
The model does not stop and ask itself:
- “Should 21% of this number be larger than the original?”
- “If I divide my answer by 18 billion, do I get something close to 0.21?”
It simply keeps writing. Confidence and correctness are not the same thing for an AI model.
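Those two checks take seconds in actual code. Here is a minimal sketch, using the numbers from the original example:

```python
# Two sanity checks a calculator-minded reader applies automatically.
profit = 18_000_000_000
claimed_answer = 36_000_000_000   # the "$36 billion" Bing/Copilot produced

# Check 1: a percentage below 100% can never exceed the original amount.
if claimed_answer > profit:
    print("FAIL: 21% of a number cannot be larger than the number itself")

# Check 2: dividing the answer by the base should recover the rate (~0.21).
implied_rate = claimed_answer / profit
print(f"Implied rate: {implied_rate:.2f}")  # 2.00 -> a 200% tax rate, absurd
```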
A Quick Note About Tokenization
There is a deeper technical reason behind many of these errors: tokenization, the way an AI model breaks long numbers like 18,000,000,000 into smaller pieces before working with them.
Tokenization makes large numbers even harder to handle correctly and is a big part of why models lose track of zeros and scale.
That topic deserves its own space, so we’ll cover it in a future Tech Tuesday article focused entirely on how tokenization works and why it matters.
Why This Matters in the Kitchen
Most of us don’t think much about math when we cook, but it’s there all the time. Here are a few places it shows up:
- Scaling a recipe from 4 servings to 7
- Converting grams to tablespoons or cups
- Doubling or halving spice amounts
- Adjusting cooking times when you change pan size or batch size
- Estimating cost per serving
- Figuring out calories or macros for a meal
- Keeping ratios steady, like oil-to-vinegar in a dressing
If you use AI to help with any of those, you are trusting it with math. And if the model “solves” the problem by guessing instead of calculating, the results can drift fast. Maybe your dinner comes out saltier than planned. Maybe your cost estimate is way off. Maybe your “healthy” bowl isn’t as healthy as you think.
The same failure mode that turned 21% of $18 billion into $36 billion can quietly change your food, too.
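To make the first bullet above concrete, here is a minimal sketch of recipe scaling done as exact ratio math; the ingredient amounts are made up for illustration:

```python
# Scaling a recipe from 4 servings to 7: exact ratio math, not guesswork.
# Ingredient amounts here are made up for illustration.
base_servings = 4
target_servings = 7
factor = target_servings / base_servings  # 1.75

recipe = {"flour_g": 500, "salt_tsp": 1.5, "olive_oil_tbsp": 3}

scaled = {name: round(qty * factor, 2) for name, qty in recipe.items()}
print(scaled)  # {'flour_g': 875.0, 'salt_tsp': 2.62, 'olive_oil_tbsp': 5.25}
```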
A Sous-Chef Who Guesses
Here’s a simple way to picture how these systems behave in the kitchen. Imagine a sous-chef who:
- Has read thousands of recipes
- Understands techniques and flavor pairings
- Can suggest creative substitutions
- Can help you plan a full menu quickly
But there’s a catch: this sous-chef has never really learned to measure accurately.
When you say, “Double this recipe,” they don’t calculate anything. They look at the bowl, think about what “doubling” usually looks like, and make their best guess.
Sometimes that guess is close enough. Sometimes it isn’t.
That’s how a language model behaves. It is often helpful, especially with ideas and planning. But it is not a precision instrument for arithmetic.
How AI Companies Are Trying to Fix the Math Problem
The good news is that this weakness is well-known, and there is active work underway to reduce it. A few of the main approaches include:
- Tool calling. The model recognizes a math request and sends it to a real calculator or a small Python engine. The calculator does the math; the model explains or formats the result. (A sketch of this pattern appears below.)
- Training models to show their work. Some systems are trained on step-by-step reasoning, so they “write out” the logic instead of jumping straight to an answer. This can make mistakes easier to spot and sometimes reduces them.
- Better handling of numbers. Newer designs are experimenting with treating numbers more like whole units instead of random text fragments, which can help with consistency.
- Dedicated math components. Some setups use a smaller, specialized math model alongside the main language model, so math questions are routed to something built to handle arithmetic reliably.
This is likely part of why ChatGPT, Gemini, and Grok answered correctly in the original example — they used stronger reasoning paths or math tools behind the scenes. Bing/Copilot, in that moment, did not.
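None of these companies publish exactly how their routing works, so treat this as a hypothetical sketch of the tool-calling pattern rather than any vendor's real implementation: the model's only job is to emit a structured request, and deterministic code does the arithmetic. The model_decides function below is a stand-in for the language model.

```python
# Hypothetical sketch of tool calling: the model emits a structured request,
# and deterministic code (the "tool") does the actual arithmetic.

def calculator_tool(expression: str) -> float:
    """The 'tool': real arithmetic, evaluated deterministically."""
    # A real system would use a safe math parser instead of eval().
    return eval(expression, {"__builtins__": {}}, {})

def model_decides(question: str) -> dict:
    """Stand-in for the language model recognizing a math request."""
    return {"tool": "calculator", "expression": "0.21 * 18_000_000_000"}

request = model_decides("What is 21% of $18 billion?")
if request["tool"] == "calculator":
    result = calculator_tool(request["expression"])
    print(f"${result:,.0f}")  # $3,780,000,000 -- computed, not guessed
```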
How to Protect Yourself When Using AI in the Kitchen
You don’t need to stop using AI in the kitchen. You just need to be clear about where it helps and where it doesn’t. Here are some practical guidelines:
- Double-check math that actually matters. If AI is helping you scale recipes, convert units, or estimate nutrition, verify the numbers with a calculator, a reliable website, or a trusted reference.
- Ask AI to show its steps. When you request math, add a phrase like “show the steps” or “walk through the calculation.” If the explanation looks shaky, don’t trust the final number.
- Be explicit when you need accuracy. Use a prompt like: “Use a calculator for this calculation and then show me the result.” Models that support tool calling will often route your request to something more reliable than guessing.
- If a number feels wrong, pause. If you see something that doesn’t look right — a cooking time that jumps too high, a measurement that seems huge, a cost that feels unrealistic — assume the AI got it wrong until you confirm it.
- Use AI for creativity, not for precision. Let AI shine where it is strong: brainstorming recipes, planning menus, suggesting substitutions, organizing prep lists, and helping you think through options.
When it comes to exact numbers, measurements, and calculations, use tools designed for that job.
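One concrete habit, sketched below with made-up numbers: when AI hands you a kitchen figure, recompute it before acting on it.

```python
# Recompute an AI-supplied kitchen number before trusting it.
# All figures here are made up for illustration.
total_cost = 23.40    # dollars spent on ingredients
servings = 6
ai_claimed = 5.85     # per-serving cost the AI reported

actual = total_cost / servings  # 3.90
if abs(actual - ai_claimed) > 0.01:
    print(f"AI said ${ai_claimed:.2f}, actual is ${actual:.2f} -- don't trust it")
```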
Closing Thought
AI gives us speed, ideas, and new ways to explore cooking. It can be a fantastic partner in the kitchen. But we need to be honest about its limits.
Language models are built to work with words, not numbers. When you ask them to do math, they are still just guessing what text comes next.
That’s why a model can turn a simple tax question into a $36 billion error — and why it can just as easily over-salt your soup or mis-size your family dinner if you trust it too much with the math.
The solution isn’t to abandon AI. It’s to use it like any other tool: for the jobs it does well, and not for the ones it doesn’t.
© 2025 Creative Cooking with AI - All rights reserved.