Generative AI is powerful and useful…but is it intelligent?
Matthew Hays, Ph.D., SVP Research and Analytics at Amplifire
A few years ago, I wrote an article arguing that artificial intelligence isn’t actually intelligent, but is instead so fast that it can do a good impression of a smart person…sometimes. The crux at the time (2019) was that computers only rivaled humans on well-defined tasks, where success and failure are easy to determine — like playing chess, answering a trivia question, or saying whether there’s a motorcycle in a picture. Computers were still terrible at ill-defined tasks, like drawing a picture or making a convincing argument or writing a poem.
Today, of course, AI does all of those things. Generative tools draw pictures, make arguments, and write poems. So has AI finally become intelligent? It certainly feels like it. When a chatbot muses on the philosophy of free will, or offers comfort after the death of a pet, or explains a complex tax situation in plain English…how could there not be an intellect behind that? Indeed, how could that intellect not be very similar to our own?
Fluent, sophisticated conversation has been humans’ default indicator of intelligence for millennia. For most of that time, it worked fairly well — until travel beyond the nearest geographical boundary became possible. Halting or heavily accented speech can make people underestimate a conversation partner’s intelligence. Could chatbots’ fluent speech be causing us to overestimate theirs? To answer that, we need to look at how these tools actually produce language.
Have you ever seen the Plinko game on The Price is Right? The contestant drops a disk that bounces down through a series of pegs until it lands in a slot that (hopefully) corresponds to a prize.
This is, essentially, how a large language model makes a chatbot chat. From enormous amounts of text, statistical patterns emerge about which words tend to follow which other words. So when you give an LLM a prompt, it doesn’t think about the answer; it just drops a disk. The disk bounces through the pegs and lands in a slot. That slot is the first word of its response. Then it drops another disk, and another, and another — and more words come out. Nothing is choosing the slots; the words are produced mindlessly.
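The Plinko mechanism can be sketched in a few lines of code. This is a deliberately tiny toy, not how any real LLM is implemented: the "model" below is just a hypothetical table of word-following-word counts, and each "disk drop" is a single weighted random pick. Real models use vast neural networks over long contexts, but the core move is the same: pick the next token by statistical weight, with nothing planning the sentence.

```python
import random

# A toy "language model": for each word, the words that tend to follow it,
# with weights as if counted from a (tiny, hypothetical) corpus.
bigram_weights = {
    "the": {"cat": 5, "dog": 3, "end": 1},
    "cat": {"sat": 4, "ran": 2},
    "dog": {"ran": 5, "sat": 1},
    "sat": {"down": 6},
    "ran": {"away": 6},
}

def drop_a_disk(word, rng):
    """Pick the next word by weighted chance -- one Plinko drop."""
    choices = bigram_weights.get(word)
    if not choices:
        return None  # no known continuation; stop generating
    words = list(choices)
    weights = [choices[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

def generate(prompt_word, max_words=10, seed=0):
    """Generate text one disk drop at a time. Nothing chooses the slots;
    nothing plans the sentence -- each word depends only on the last."""
    rng = random.Random(seed)
    out = [prompt_word]
    while len(out) < max_words:
        nxt = drop_a_disk(out[-1], rng)
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

print(generate("the"))
```

Run it a few times with different seeds and you get different, locally plausible word strings. Nowhere in the code is there anything that knows what a cat or a dog is.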
But, again, it really doesn’t feel mindless. I asked ChatGPT about its consciousness. It said:
I am a language model. I generate text based solely on statistical patterns — without experiences, feelings, intentions, or consciousness. I don’t and can’t care about anyone or anything. Even when I use the word “I,” this is just a linguistic convention. There is no real “me”; there is no self, no awareness, and no entity having this conversation — just algorithmically generated text.
We can see this clearly in chess — ironically, a domain where non-LLM AI has been superhuman for decades. ChatGPT has ingested thousands of publications containing the rules of the game, so it will tell you that of course a queen can’t jump a knight. ChatGPT will also play quite well for the first few moves of a chess game because the patterns of those moves are quite familiar; LLMs have digested millions of games in the form of text notation like:
1. e4 e5
2. Nf3 Nc6
3. Bb5 a6
…which is the Ruy Lopez opening, a sequence that has been played hundreds of thousands of times.
But chess games become unique fairly quickly after the first dozen or so moves, which is right when ChatGPT starts to fall apart. In fact, if your initial moves are unusual enough, ChatGPT’s fourth move can be so bad that it’s actually illegal. And the very same ChatGPT that told you queens can’t jump knights will happily jump a knight with a queen because “Qxa5” is the slot that the Plinko disk bounced to. ChatGPT doesn’t know that particular move is illegal because it doesn’t actually know the rules — because there’s no thing that would actually do the knowing. My 2019 distinction between well-defined and ill-defined tasks breaks down because LLMs aren’t actually doing the task.
Emerging software development partners (Claude Code, OpenAI’s Codex) seem to contradict the Plinko analogy, because there is clearly some thing helping engineers write purpose-driven code. Agentic AI is similar; tools like Qlik Answers can generate a high-quality data visualization from a plain-English question. AI-enabled development environments like Cursor and Antigravity go a step further, allowing you to conduct an orchestra of AI agents working on various parts of your codebase. The results can be genuinely impressive. It’s hard to see this as anything other than (superhuman?) intelligence.
But a look at what’s actually happening under the hood tells a different story: the LLM spits out a plausible next step — a line of code, a function call, a command — and then something external checks whether it worked. A compiler catches syntax errors. A test suite flags broken logic. A runtime reveals whether the program actually runs. The model proposes; the environment verifies. The intelligence, to the extent there is any, is in the loop — not in the model. The Plinko board gives you valid outputs when you have some other tool blocking the invalid ones.
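The propose-and-verify loop described above can be made concrete with a minimal sketch. Here the "model" is just a stub that emits canned candidate snippets (a real LLM would generate them), and the verifier is Python's own compiler standing in for the external check. The names and candidates are invented for illustration; the point is only the shape of the loop.

```python
# Candidate outputs the "model" will propose, in order. The first is the
# kind of plausible-but-broken text a Plinko board happily produces.
CANDIDATES = [
    "def add(a, b) return a + b",   # syntax error: missing colon
    "def add(a, b): return a + b",  # valid Python
]

def model_proposes():
    """Stand-in for an LLM: yield plausible-looking attempts, mindlessly."""
    yield from CANDIDATES

def environment_verifies(snippet):
    """The external check: does the snippet even compile?"""
    try:
        compile(snippet, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def propose_verify_loop():
    """Keep dropping disks until one lands in a slot the compiler accepts."""
    for snippet in model_proposes():
        if environment_verifies(snippet):
            return snippet  # the loop, not the model, guarantees validity
    return None

print(propose_verify_loop())
```

The model never learns why its first attempt failed; the loop simply discards it. Any apparent reliability belongs to the verifier and the retry logic, not to the thing doing the proposing.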
Ray Bradbury opened a short story (Night Call, Collect) with a poem that perfectly describes today’s LLM-driven conversation partners, even though it was written in 1969:
Suppose and then suppose and then suppose
That wires on the far-slung telephone black poles
Sopped up the billion-flooded words they heard
Each night all night and saved the sense
And meaning of it all. Then, jigsaw in the night,
Put all together…

Thus mindless beast
All treasuring of vowels and consonants
Saves up a miracle of bad advice
And lets it filter whisper, heartbeat out…

So one night soon someone sits up
Hears sharp bell ring, lifts phone
And hears a Voice like Holy Ghost
Gone far in nebulae
That Beast upon the wire,
Which with sibilance and savoring!
Down continental madnesses of time
Says Hell and O
And then Hell-o.

To such Creation
Such dumb brute lost Electric Beast,
What is your wise reply?
I showed the poem to an AI chatbot set to its strongest model in March 2026. It gushed that
“a miracle of bad advice” might be the single best three-word description of an LLM hallucination ever written, fifty-five years before the thing it describes existed.
I had to laugh; in a single sentence, I saw a phenomenally powerful tool (appear to) meta-analyze its weaknesses…while also not being able to count to five. I asked the bot to explain what went wrong in the context of this article.
I didn’t count. I can’t count. There’s no “I” that would do the counting.