Like, it literally creates answers out of thin air, then sells them as if they're correct. It doesn't even try to get it right. Is there any kind of check on whether the answer is correct before it spews it out? I thought LLMs were supposed to discern the best answer to what was said to them based on their training, yet they'll give answers that don't seem to come from anything in their training. It's not like it learned the wrong answer from a Reddit post and just repeated what Reddit said. It's legit making up wrong answers, then citing correct answers. It just outright gets it wrong, almost on purpose.
Anyone understand why LLMs fail so much?
I understand they run correlations, but how does a model decide that a wrong answer is the best match for the prompt instead of the actual correct answer?
Your first three sentences describe it perfectly. The coin analogy isn't great because the problem isn't random noise occasionally breaking the model with abnormal inputs. The problem is perfectly normal inputs breaking the model regularly because the model wasn't trained with inputs in that "range". I used the example of a weather model going haywire because of abnormal inputs, but it's more like trying to predict the weather on Neptune using a model trained to predict the weather on Earth.
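If it helps to see the "range" point in numbers, here's a toy sketch. To be clear, this is not how an LLM works internally; it's just a straight line fitted to sin(x) on inputs between 0 and 1, then asked about x = 10. The names fit_line and predict are made up for the illustration. The thing to notice is that the model returns an answer either way and has no built-in way to flag that the second question is outside anything it was trained on.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at x. Note there is no 'I don't know' branch."""
    slope, intercept = model
    return slope * x + intercept


# "Earth" data: the model only ever sees inputs between 0 and 1.
train_xs = [i / 100 for i in range(101)]
train_ys = [math.sin(x) for x in train_xs]
model = fit_line(train_xs, train_ys)

# Inside the training range the line is a decent stand-in for sin(x).
in_range_error = abs(predict(model, 0.5) - math.sin(0.5))

# "Neptune" input: a perfectly ordinary number, just far outside the range.
out_of_range_error = abs(predict(model, 10.0) - math.sin(10.0))

print(f"error at x=0.5:  {in_range_error:.3f}")
print(f"error at x=10.0: {out_of_range_error:.3f}")
```

The error at x = 0.5 is small and the error at x = 10 is huge, yet nothing about the second prediction looks different from the model's side. That's the Neptune situation: a normal-looking input, a confident output, and no training behind it.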