Like, it literally creates answers out of thin air then sells them as if they're correct. It doesn't even try to get it right. What sort of redundancy is there for checking whether the answer is correct before spewing it out? I thought LLMs were supposed to discern what the best answer is, given what was said to them, based on their training, yet they'll give answers that don't exist in any training. It's not like it learned the wrong answer from a Reddit post and just posted what Reddit said. It legit is making up wrong answers then citing correct answers. It just outright gets it wrong almost on purpose.
Anyone understand why LLMs fail so much?
I understand they run correlations, but how does an LLM determine that a wrong answer is the most correlated response to the prompt, instead of the actual correct answer?
Because it's not answering your question. It's deploying a million calculations to determine what a grammatically correct answer to your question would look like, with relevant keywords.
It's like most of the language paradoxes in English. "Can God make a rock so big he can't lift it?" isn't a testament against God, it's an observation that English can construct a grammatically correct sentence that has no meaning. In much the same way, "horse magnets are correct" isn't meaningful, despite being a valid sentence.
Computers work by using 10 trillion switches. The switches activate in the situations where their trigger is present. The switches only have "yes" and "no". Think of the switches as descriptors: "blue", "vegetable", "French". Given a large enough calculation, you can represent anything with enough "it isn't" phrases.
The difficulty you have is that as a human you have heuristics and intuition that make that many calculations unnecessary for you. You can see an apple is an apple because you can see it. A computer has to define all the space in the universe that ISN'T "apple" to describe it.
So think of the staggering number of calculations required to do that, and apply it to language. That's what an LLM is. It gives you the computer equivalent of infinite monkeys typing for nearly infinite time, and then returns what it recognizes as the most likely response to your question.
If you asked "what color is the sky?", it individually calculates the likelihood that, when that text occurs in that order, the text that follows would say "green", "banana", "embezzlement", "blue", "lkjh", "asdf", or "2,437", and then picks the one with the highest score.
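To make the sky example concrete, here's a rough sketch of the "score every candidate, pick the highest" step. The scores are numbers I made up for illustration, and `scores`, `softmax`, `probs` and `best` are just names for this toy, not anything from a real model.

```python
import math

# Hypothetical raw scores a model might give each candidate next word
# after the prompt "what color is the sky?". These numbers are invented.
scores = {
    "green": 2.0,
    "banana": -1.0,
    "embezzlement": -3.0,
    "blue": 5.0,
    "lkjh": -6.0,
    "asdf": -6.0,
    "2,437": -4.0,
}

def softmax(raw):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(raw.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in raw.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

probs = softmax(scores)
best = max(probs, key=probs.get)  # greedy pick: the highest-probability word
print(best)
```

Notice that nothing in there checks whether the winner is true. It only checks which candidate scored highest. Real systems also often sample from the top few candidates instead of always taking the single best one, which is one more way a lower-scoring word can come out.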
Every word in its database is weighted on a graph with 10,000 axes, indicating what kind of word it is and where it should fit in text. The values in that matrix correspond to those switches we brought up earlier. Thus what it produces is just an arbitrarily large number of calculations, run through a random letter generator, picking the option that most closely resembles its training data.
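Here's a toy version of the "graph with many axes" idea, assuming just 3 axes instead of 10,000. The vectors are invented for illustration, and `vectors`, `cosine` and `nearest` are hypothetical helper names, not part of any real library.

```python
import math

# Made-up positions for three words on a 3-axis "graph".
vectors = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.8, 0.2, 0.1],
    "embezzlement": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(word):
    """Return the other word whose vector points most nearly the same way."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

print(nearest("apple"))
```

Words that sit close together on the graph get treated as similar. That's a statement about where they show up in text, not about whether a sentence built from them is correct.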