An LLM's concept of "well substantiated" is that it has seen something said a lot. It has no logical faculties to determine whether something is likely to be true or not; it only has linguistic frequency data and experience of people's responses to claims. So it can only appeal to popularity and authority, not reason.
No wonder they avoid 4chan.
4chan was ruined by shills and bots long ago. An LLM trained on the current boards would just spam demoralization and porn all over the place.
It was actually done and found to be more truthful than any others. https://youtube.com/watch?v=efPrtcLdcdM
Ok, interesting. Limiting the dataset to /pol/ archives would definitely avoid the porn spammers who ruined /b/ and a few other boards. Still, I'm surprised 2016-2019 worked out so well; that's when the demoralization shills got up to speed and a ton of normie newbies showed up. Guess those of us arguing with the shills made a good impact on the model. I bet a model fed on 2013-2016 archives would be even better.
This isn't unique to LLMs: it matches the heuristic that most humans use as well.
So it's Wikipedia.
Yes