Like, it literally creates answers out of thin air, then sells them as if they're correct. It doesn't even try to get it right. What sort of redundancy is there in analyzing if the answer is correct before spewing it out? I thought LLMs were supposed to discern what the best answer is given what was said to them based on their training, yet they'll give answers that don't exist in any training data. It's not like it learned the wrong answer from a Reddit post and just posted what Reddit said. It legit is making up wrong answers then citing correct answers. It just outright gets it wrong almost on purpose.
Anyone understand why LLMs fail so much?
I understand they run correlations but how does it determine a wrong answer is the most correlated to the correct response given the prompt instead of the actual correct answer...
AI Director and AI Independent Researcher here.
Because an LLM is fundamentally trained to predict the most probable next token, it does not actually “know” whether a statement is true or false. Its objective during training is not factual correctness; it is statistical likelihood given the text it has seen.
When the model generates an answer, it is essentially estimating:
Probability of (next token given previous tokens)
This means it will produce text that looks plausible within the patterns of language it learned, even if the information is incorrect. There are a few reasons this leads to hallucinations:
The model optimizes for what words tend to follow other words, not whether the statement is factually correct. If a pattern appears believable in language, the model may generate it even if it is wrong.
Training data is finite and frozen at a certain time. If the model has limited examples of a topic, it may interpolate from related patterns and generate something that sounds reasonable but isn’t accurate.
The model is designed to keep continuing the sequence until it emits a stop token or hits a length limit. If it does not actually know the answer, it may still generate one because predicting something is part of its objective.
Transformers are extremely good at combining patterns. Sometimes they merge multiple partially related concepts into a response that is grammatically correct but factually incorrect.
A language model doesn’t retrieve facts the way a database does. It generates text that statistically fits the context, which is why it can sometimes produce convincing but incorrect information.
There is no built-in concept of "correct", just which token should follow the tokens before it.
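Here's a rough sketch of the "probability of next token given previous tokens" idea. To be clear, this is a toy bigram counter, not a transformer: it only looks at the single previous word, and the corpus and function names are made up for illustration. The point is just that the model never saw "germany", yet it still produces a fluent, confident completion because picking a likely next token is all it does.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real model trains on vastly more text.
corpus = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of spain is madrid ."
).split()

# Count which token follows which token (a bigram model).
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1


def next_token_probs(prev):
    """Estimate P(next token | previous token) from raw counts."""
    counts = follow_counts.get(prev, Counter())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def complete(prompt, steps=1):
    """Greedily append the most probable next token, `steps` times."""
    tokens = prompt.split()
    for _ in range(steps):
        probs = next_token_probs(tokens[-1])
        if not probs:  # last token never seen in the corpus
            break
        tokens.append(max(probs, key=probs.get))
    return " ".join(tokens)


print(next_token_probs("is"))  # paris, rome, madrid each get 1/3
print(complete("the capital of germany is"))
# the capital of germany is paris
```

Nothing in there checks whether the output is true. "paris" wins only because it is one of the tokens that tends to follow "is" in the training text.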
Thanks. Great response.
Explain how they are able to do math and spatial reasoning.
Pattern recognition: LLMs can often reproduce arithmetic or algebraic manipulations if they appear frequently in the training data. For example, they can compute 2 + 3 = 5 or symbolically solve simple linear equations.
When prompted to “show your work,” LLMs can sometimes emulate a logical sequence of steps in a calculation, mimicking the kind of reasoning a human might write down.
LLMs can recall formulas, rules, and common mathematical facts that they were trained on.
But LLMs do not “compute” numbers in the way a calculator does. They generate numbers based on patterns, so mistakes accumulate with larger numbers or complex operations. For example, asking it to compute 234 * 567 may result in a wrong number because the model predicts what looks plausible rather than calculating precisely.
If it tries to break the problem down into a multi-step process, the steps are immensely error-prone, as the model doesn’t track intermediate results reliably.
They will struggle with anything that requires abstraction, such as proofs, higher-dimensional algebra, and precise symbolic manipulations.
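For contrast, here is what the 234 * 567 example looks like when the intermediate results are tracked exactly. This is a minimal sketch of ordinary long multiplication in code (the `long_multiply` helper is my own name, and it assumes non-negative integers), not a description of anything happening inside an LLM.

```python
def long_multiply(a, b):
    """Multiply via explicit partial products, one per digit of b.

    Assumes a and b are non-negative integers.
    """
    partials = []
    for place, digit in enumerate(reversed(str(b))):
        # Each partial product is stored exactly, so nothing drifts.
        partials.append(a * int(digit) * 10 ** place)
    return partials, sum(partials)


partials, total = long_multiply(234, 567)
print(partials)  # [1638, 14040, 117000]
print(total)     # 132678
```

Every intermediate value here is held as an exact number. A model generating those same steps as text has to get every digit of every partial product right by prediction, so one slip carries through to the final answer.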
This is because LLMs encode statistical correlations between tokens. They don’t internally maintain the concept of a number as a manipulable object; they only know how numbers “look” in context.
It doesn't know that 2 is 2. It just knows that 2 comes after 1, turns 10 into 12, and so on.
Spatial reasoning often requires a continuous, structured mental model (such as a 3D coordinate system). LLMs operate in a discrete token space, which is poorly suited for inherently geometric problems. They can simulate reasoning through learned text patterns but cannot “visualize” in the human sense.
They can understand and generate text describing spatial relationships, like “The cup is on the table, and the book is next to it.” Relations such as “left of,” “right of,” “above,” and “below” can be tracked in sequences reasonably well.
LLMs lack an internal geometric or visual model of space. They cannot mentally rotate objects, imagine perspectives, or simulate physics accurately. They also cannot reliably generate or manipulate grids, matrices, or plots without structured guidance.
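To make "an internal geometric model" concrete, here's a small sketch of what an explicit one looks like, using the cup, table, and book from above. The coordinates, the `Obj` class, and the helper names are all assumptions I made up for illustration. With actual coordinates, relations like "left of" are computed rather than guessed, and a rotation is a real operation on the scene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    name: str
    x: float  # left/right
    y: float  # front/back
    z: float  # height


def left_of(a, b):
    return a.x < b.x


def above(a, b):
    return a.z > b.z


def rotate_90_about_z(o):
    """Rotate 90 degrees counterclockwise about the vertical axis:
    (x, y) -> (-y, x), height unchanged."""
    return Obj(o.name, -o.y, o.x, o.z)


# Hypothetical scene: cup on the table, book next to the cup.
table = Obj("table", 0, 0, 0)
cup = Obj("cup", 0, 0, 1)
book = Obj("book", 1, 0, 1)

print(above(cup, table))   # True
print(left_of(cup, book))  # True
# After rotating the scene, the book is behind the cup, not to its right.
print(left_of(rotate_90_about_z(cup), rotate_90_about_z(book)))  # False
```

A text-only model has to infer that last result from how similar sentences usually read, which is where it tends to go wrong.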