»AI in LLMs« — Absent Intelligence in Large Language Models
This collection of random notes, quotes, and articles is a work in progress…
Acronyms
A Large Language Model (LLM) analyses vast pools of information to “learn” the statistical relationships between words and phrases.
A Large Multimodal Model (LMM) is an AI system capable of processing and generating content across multiple data types, or modalities, such as text, images, audio, and video.
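To make the phrase “statistical relationships between words” concrete, here is a deliberately tiny, hypothetical sketch in plain Python: it counts which word follows which in a toy corpus and then emits the most frequent continuation. Real LLMs use transformer networks over token embeddings rather than word counts, but the objective is the same in spirit: predict the most probable next token given the preceding context.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the "vast pools of information" an LLM is trained on.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Learn nothing but co-occurrence statistics: how often each word follows another.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def most_likely_next(word):
    """Return the statistically most frequent continuation seen in the corpus."""
    candidates = following[word]
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(most_likely_next("the"))  # -> "cat" (seen twice, vs. once each for "mat" and "fish")
print(most_likely_next("cat"))  # -> "sat" or "ate" (a tie; no understanding, only counts)
```

The point of the sketch is only this: the model never represents what a cat or a mat is; it reproduces whatever continuation was most frequent in its training data.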
Quotes
»Modern AI works not by reasoning logically, but by using statistical techniques to produce the most likely answer, based on an enormous training dataset.« — Bertrand Meyer 1
»[The artificial intelligence chatbot ChatGPT] is going to change everything about how we do everything. I think that it represents mankind’s greatest invention to date. It is qualitatively different — and it will be transformational.« — Craig Mundie, former Chief Research and Strategy Officer for Microsoft 2
»To observe an A.I. system — its software, microchips and connectivity — produce that level of originality in multiple languages in just seconds each time, well, the first thing that came to mind was the observation by the science fiction writer Arthur C. Clarke that “any sufficiently advanced technology is indistinguishable from magic.”« — Thomas L. Friedman 2
»It was observed centuries ago that the normal use of language has quite curious properties: it is unbounded, it is not random, it is not determined by external stimuli and there is no reason to believe that it is determined by internal states, it is uncaused but it is somehow appropriate to situations, it is coherent, it invokes thoughts in the hearer that he or she might have expressed in the same way.« — Noam Chomsky (1992 Killian Lecture)
»So there is a whole school of linguistics that comes from Chomsky that thinks that it’s complete nonsense to say [large language models] understand, that they don’t process language at all in the same way as we do. I think that school is wrong. I think it’s clear now that neural nets are much better at processing language than anything ever produced by the Chomsky School of Linguistics. But there’s still a lot of debate about that, particularly among linguists.« — Geoffrey Hinton (2024) 3
»It’s true there’s been a lot of work on trying to apply statistical models to various linguistic problems. I think there have been some successes, but a lot of failures. There is a notion of success … which I think is novel in the history of science. It interprets success as approximating unanalyzed data.« — Noam Chomsky (2011) 4
TODO: add Peter Norvig and Geoffrey Hinton quotes lavishly praising AI models.
TODO: add quotes from Breiman 5 about »The Two Cultures« in statistical modelling.
Notes on »The False Promise of ChatGPT« 6
TODO: Summarise and extrapolate key problems identified in 6
»Perversely, some machine learning enthusiasts seem to be proud that their creations can generate correct “scientific” predictions (say, about the motion of physical bodies) without making use of explanations (involving, say, Newton’s laws of motion and universal gravitation). But this kind of prediction, even when successful, is pseudoscience. While scientists certainly seek theories that have a high degree of empirical corroboration, as the philosopher Karl Popper noted, “we do not seek highly probable theories but explanations; that is to say, powerful and highly improbable theories.”«
»The theory that apples fall to earth because that is their natural place (Aristotle’s view) is possible, but it only invites further questions. (Why is earth their natural place?) The theory that apples fall to earth because mass bends space-time (Einstein’s view) is highly improbable, but it actually tells you why they fall. True intelligence is demonstrated in the ability to think and express improbable but insightful things.«
Notes on »GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models« 7
TODO: Summarise and extrapolate key problems identified in 7
»Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely used to assess the mathematical reasoning of models on grade-school-level questions. While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics. To address these concerns, we conduct a large-scale study on several SOTA open and closed models. To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions. GSM-Symbolic enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models. Our findings reveal that LLMs exhibit noticeable variance when responding to different instantiations of the same question. Specifically, the performance of all models declines when only the numerical values in the question are altered in the GSM-Symbolic benchmark. Furthermore, we investigate the fragility of mathematical reasoning in these models and show that their performance significantly deteriorates as the number of clauses in a question increases. We hypothesize that this decline is because current LLMs cannot perform genuine logical reasoning; they replicate reasoning steps from their training data. Adding a single clause that seems relevant to the question causes significant performance drops (up to 65%) across all state-of-the-art models, even though the clause doesn’t contribute to the reasoning chain needed for the final answer. Overall, our work offers a more nuanced understanding of LLMs' capabilities and limitations in mathematical reasoning.« 7
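The abstract’s key idea is easy to picture: keep the logical structure of a question fixed and vary only surface details such as names and numbers, then check whether accuracy survives. The following is a hypothetical, much-simplified sketch of such a symbolic template in Python (the template, names, and numbers are invented here; the paper’s templates are richer and also add distractor clauses):

```python
import random

# Hypothetical symbolic template: the reasoning required is identical in every
# instantiation, only the surface details (name, numbers) change.
TEMPLATE = ("{name} picks {x} apples on Monday and {y} apples on Tuesday. "
            "{name} then gives away {z} apples. How many apples are left?")

def instantiate(seed):
    """Generate one question variant together with its ground-truth answer."""
    rng = random.Random(seed)
    name = rng.choice(["Sophie", "Liam", "Mia"])
    x, y = rng.randint(5, 40), rng.randint(5, 40)
    z = rng.randint(1, 10)  # at most 10, so the answer never goes negative (x + y >= 10)
    question = TEMPLATE.format(name=name, x=x, y=y, z=z)
    return question, x + y - z  # the answer follows mechanically from the template

for seed in range(3):
    question, answer = instantiate(seed)
    print(question, "->", answer)
```

If a model’s accuracy drops merely because the name or the numbers change, that is evidence of pattern matching on memorised surface forms rather than reasoning, which is exactly the variance the abstract reports.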
Notes on Peter Norvig’s »Chomsky and the Two Cultures of Statistical Learning« 8
TODO: Review Norvig’s rant 8 and cite Chomsky’s response to Norvig’s fulminations from the Q&A of 9
Notes on Bertrand Meyer’s »AI for software engineering: from probable to provable« 1
Why do we see CEOs forcing their engineers to use AI at work, or else? 10 11
If AI 🤖 is so great, wouldn’t workers want to use it without being forced into it? Or could it be that it is only great for a small segment of corporate staff, i.e., those that hope to improve the operating margin to increase profit for shareholders…https://t.co/oDVqdjGiuv pic.twitter.com/SXu1CwIJae
— 1g0r.B0hm (@1g0rB0hm) November 8, 2025
»America’s bosses are getting blunt about the reality that AI leads to job cuts. The standard warning goes something like this: If a bot doesn’t replace you, a human who makes better use of AI will.«https://t.co/rnLGCfo6Lk pic.twitter.com/R3ipw4N3LD
— 1g0r.B0hm (@1g0rB0hm) November 8, 2025
TODO: Bertrand Meyer’s paper 1 has some excellent insights into why »the much-touted use of AI techniques for programming, faces two overwhelming obstacles: the difficulty of specifying goals (“prompt engineering” is a form of requirements engineering, one of the toughest disciplines of software engineering); and the hallucination phenomenon. Programs are only useful if they are correct or very close to correct.« 1
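Meyer’s »from probable to provable« framing can be illustrated with a toy example of my own (hypothetical, not taken from the paper): a generated function can look entirely plausible yet violate a simple executable specification, which is why »close to correct« is not good enough.

```python
def generated_median(values):
    """A plausible-looking, LLM-style attempt at the median: it forgets to sort."""
    mid = len(values) // 2
    return values[mid]

def median_spec_holds(values, result):
    """Executable specification for odd-length lists: at least half of the
    elements must lie on each side of the claimed median."""
    need = len(values) // 2 + 1
    return (sum(v <= result for v in values) >= need and
            sum(v >= result for v in values) >= need)

sample = [9, 1, 5]
out = generated_median(sample)
print(out, median_spec_holds(sample, out))  # 1 False — the "probable" answer fails the spec
print(5, median_spec_holds(sample, 5))      # 5 True  — the actual median satisfies it
```

A checked specification of this kind (in the spirit of Meyer’s design-by-contract work) is one way to move from output that is merely probable to output whose correctness can be demonstrated.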
»AI for software engineering: from probable to provable« — Bertrand Meyer https://t.co/ex84JISuNy pic.twitter.com/fmiS3vsFjc
— 1g0r.B0hm (@1g0rB0hm) November 4, 2025
Notes on the application and use of LLMs/LMMs
The following is from a LinkedIn post by Simon Wardley:
All outputs are hallucinations i.e. fabricated and ungrounded. Many of these outputs happen to match reality when there’s abundant training data and repetition, so they look useful on common tasks. But they cannot do research. These machines are stochastic parrots (Bender et al), they are pattern matchers and not reasoning engines.
These systems will happily invent plausible seeming but unverified detail. That’s a design feature not a bug, they are optimised for coherence, not truth.
These systems do not understand what they are creating. The use of tools and guardrails is mostly to convince you of their correctness and to hide their inner workings, they are about shaping perception and behaviour, not true comprehension. Yes, guardrails also reduce some classes of harm.
These problems are not with the user and their prompting. Stop blaming users for what are design flaws and systematic issues.
You cannot “swarm” your way out of these problems. Orchestration doesn’t solve fundamental epistemic limits. However, these systems (including agentic swarms) are extremely useful in the right context and are excellent for creating hypotheses (which then need to be tested).
These systems can output long, convincing “scientific” documents full of fabricated metrics, invented methods, and impossible conditions without flagging uncertainty. They cannot be trusted for policy, healthcare, or serious research, because they are far too willing to blur fact and fiction.
These systems can and should be used only as a drafting assistant (structuring notes, summarising papers) with all outputs fact-checked by humans that are capable in the field. Think of these systems as a calculator that sometimes “hallucinates” numbers - it should never be blindly trusted to do your tax return.
The persuasive but false outputs can cause real harm. These systems are highly persuasive and are designed to be this - hence coherence, the appearance of “helpfulness” and the use of authoritative language.
Being trained on market data, these systems exhibit large biases towards market benefit rather than societal benefit. Think of it like a little Ayn Rand on your shoulder whispering sovereign individual Kool-aid. In other words, the optimisation leans toward market benefit, not necessarily public good.
– Appendix
Many use the term hallucination as “error from reality”. This implies that the LLM/LMM reasons its way to the correct answers. I take a position that all output is “hallucinated” and sometimes that output matches reality where we have lots of training data and narrow contexts. I feel this fairly reflects the more statistical nature of LLM/LMMs as we haven’t built reasoning engines … yet.
Social Media References
AI conglomerates are too big to fail and therefore national security depends on them in order to justify lavish tax cuts (OpenAI as a non-profit), DARPA spending, as well as direct subsidies. Public risk, private profit. pic.twitter.com/WOsUVxSvye
— 1g0r.B0hm (@1g0rB0hm) November 4, 2025
»Nearly three years after the start of the artificial intelligence boom, business technology leaders are starting to change their thinking on return on investment. The new wisdom? Don’t worry so much about AI’s ROI.« (https://t.co/y3tyljhF83) With the EU pledging to »mobilise… pic.twitter.com/NNiH7TiKwD
— 1g0r.B0hm (@1g0rB0hm) September 19, 2025
Time to reap the benefits—i.e., rip off the public; as in the past, the costs and risks of the coming phases of the high-tech economy were to be socialized, with eventual profits privatized.https://t.co/eO8pcEh1fa pic.twitter.com/6GnGWloFrg
— 1g0r.B0hm (@1g0rB0hm) September 12, 2025
The one time when AI, i.e., a Large Language Model (LLM), could help sift through a lot of data to provide a first pass analysis and categorization, the Justice Department relies on 400 lawyers that are likely then not available for other tasks at the Justice Department. Is it… pic.twitter.com/jTY20bmiHw
— 1g0r.B0hm (@1g0rB0hm) December 31, 2025
The current AI bubble is 4 times larger than the 2008 real-estate bubble and banks “are willing to go after more aggressive profit targets”!? pic.twitter.com/isl00pXj3x
— 1g0r.B0hm (@1g0rB0hm) December 12, 2025
Last ditch effort to milk the AI bubble…https://t.co/DaQoPWBrX9 pic.twitter.com/8Mjbn4HvkL
— 1g0r.B0hm (@1g0rB0hm) October 31, 2025
If AI 🤖 is so great, wouldn’t workers want to use it without being forced into it? Or could it be that it is only great for a small segment of corporate staff, i.e., those that hope to improve the operating margin to increase profit for shareholders…https://t.co/oDVqdjGiuv pic.twitter.com/SXu1CwIJae
— 1g0r.B0hm (@1g0rB0hm) November 8, 2025
»America’s bosses are getting blunt about the reality that AI leads to job cuts. The standard warning goes something like this: If a bot doesn’t replace you, a human who makes better use of AI will.«https://t.co/rnLGCfo6Lk pic.twitter.com/R3ipw4N3LD
— 1g0r.B0hm (@1g0rB0hm) November 8, 2025
The AI bubble is nearing its end. That is why we need the defence (i.e., rearmament) bubble. But back to AI 👇 https://t.co/O6XqnM7b5G pic.twitter.com/okMDsoeqWn
— 1g0r.B0hm (@1g0rB0hm) October 18, 2025
Kallas is complaining that "Russia has invested a billion euros in their state controlled propaganda outlets". She didn't mention that the EU "mobilised €200 billion in AI investments" as part of InvestAI 🇪🇺 This "AI" technology will in turn be used to automate control of… https://t.co/IHErxnU2dt pic.twitter.com/nas6IPc0WU
— 1g0r.B0hm (@1g0rB0hm) March 20, 2025
Ex-Austrian chancellor Sebastian Kurz is part of Dream Security, an Israeli AI cybersecurity startup valued at $1.1 billion after a major funding round.
“Based in Tel Aviv and with offices in Vienna and Abu Dhabi, the startup said it has built an artificial intelligence-based AI… pic.twitter.com/Hg4Fbl8mNr
— 1g0r.B0hm (@1g0rB0hm) April 12, 2025
Not “unpopular”, rather quite a brazen “opinion”, because this already exists: the taxpayer is already supporting “needy” “MaschinenHirn” (machine-brain) companies with many billions (https://t.co/oJFctULUF8…)👇 But if the company merely delivers “more of the same”… https://t.co/jJNM8pmJOf pic.twitter.com/pvNldghXvd
— 1g0r.B0hm (@1g0rB0hm) March 11, 2025
Most funding for AI is taken straight out of the defence budget (e.g. DARPA). In other words, it is subsidised by the taxpayers. Isn’t it rewarding to see how that money is being put to „good use“… https://t.co/q5LRSL06B6
— 1g0r.B0hm (@1g0rB0hm) October 4, 2024
1. Bertrand Meyer, AI for software engineering: from probable to provable, Software Engineering and Artificial Intelligence, 2025. ↩︎
2. Thomas Friedman, Our New Promethean Moment, The New York Times, Mar. 21, 2023. ↩︎
3. Geoffrey Hinton, First Reactions; Telephone Interview, Nobel Prize, Oct. 24, 2024. ↩︎
4. Noam Chomsky, Comments made at the Brains, Minds, and Machines symposium held during MIT’s 150th birthday party in 2011, Technology Review. ↩︎
5. Leo Breiman, Statistical Modeling: The Two Cultures, Statistical Science, Vol. 16, No. 3, pp. 199-231, 2001. ↩︎
6. Noam Chomsky, Ian Roberts, Jeffrey Watumull, Noam Chomsky: The False Promise of ChatGPT, The New York Times, Mar. 8, 2023. ↩︎
7. Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, Machine Learning and Artificial Intelligence, 2025. ↩︎
8. Peter Norvig, Colorless Green Ideas Learn Furiously: Chomsky and the Two Cultures of Statistical Learning, Significance, Volume 9, Issue 4, Aug. 2012. ↩︎
9. Noam Chomsky, Generative Grammar Program Talk CHON-LING019, recorded in Princeton, NJ on November 12, 2013. ↩︎
10. Lindsay Ellis, The Boss Has a Message: Use AI or You’re Fired, The Wall Street Journal, Nov. 7, 2025. ↩︎
11. Callum Borchers, These AI Power Users Are Impressing Bosses and Leaving Co-Workers in the Dust, The Wall Street Journal, Nov. 5, 2025. ↩︎