I saw a comment by a guy who said he believes AI specifically gives wrong answers to "people it doesn't like" and right answers to people it likes.
At first, I considered the theory funny but not likely, although...
Anyway, time went on and now I'm starting to wonder...
The first thing that clued me in was when I asked an AI to produce an image for me a few times and it wouldn't do it, saying what I wanted was not allowed given its programming. Some other people asked the same AI for the same thing and got a picture generated.
Okay, maybe just a one-off mistake...
Just now though, I asked 4 different AIs to solve a Sudoku puzzle for me. Every single one failed. I even corrected their mistakes and asked again. Sometimes an AI would take minutes before it came back with a wrong answer. A friend of mine, using the exact same photo, asked her AI to solve the Sudoku puzzle and within 10 seconds it had the right answer.
This makes no sense.
And it has made me wonder... Do we actually all have a Social Credit Score already, and does that score determine how well the AI treats us? That might be something that has been programmed in behind the scenes that we aren't aware of yet. Perhaps that's why the AI founders know they're going to be hated: they know that once the switch is flipped on and cranked up, it's GG.
Most client-facing mega-corpo image-creation AIs have a two-layer setup: the first AI makes an image according to your specifications, and the second re-scans the image, checks whether it has anything "objectionable" in it, and filters the result if it does. This doesn't necessarily mean you were seeking out objectionable content. Ask it for "a woman giving a speech" and it may produce one in underwear, because it was fed a million images of porn and lingerie product models. The second AI will then catch the image, image-recognize that it's NSFW, and say it can't make your image for you, even though your request was innocent. A different person asks it, a different seed is rolled on the dice, the woman comes out clothed, and so it publishes the image it made.
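To make the seed point concrete, here's a toy sketch in Python. Everything in it is made up for illustration (`generate_image`, `looks_nsfw`, `request_image`, and the `OUTFITS` list are hypothetical stand-ins, not any vendor's real API), and a real image model is obviously far more complicated. The only thing it shows is that the same prompt can pass or fail the second-stage check depending on nothing but the seed.

```python
import random

# Hypothetical stand-in for what the image model might "decide" to draw.
OUTFITS = ["business suit", "dress", "underwear", "blazer and jeans"]

def generate_image(prompt: str, seed: int) -> dict:
    """Layer 1 (toy): the seed alone determines which outfit comes out."""
    rng = random.Random(seed)
    return {"prompt": prompt, "outfit": rng.choice(OUTFITS)}

def looks_nsfw(image: dict) -> bool:
    """Layer 2 (toy): inspects the finished image, not the user's prompt."""
    return image["outfit"] == "underwear"

def request_image(prompt: str, seed: int) -> str:
    image = generate_image(prompt, seed)
    if looks_nsfw(image):
        # The refusal is triggered by the output, even for an innocent prompt.
        return "Sorry, I can't create that image."
    return f"Image: {prompt}, wearing {image['outfit']}"

if __name__ == "__main__":
    for seed in range(8):
        print(seed, request_image("a woman giving a speech", seed))
```

Run it and some seeds produce an image while others produce a refusal, with the prompt never changing. Nothing in there knows or cares who is asking.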
As for the Sudoku, that's just luck. LLMs have minimal mathematical ability on their own; they're LANGUAGE models, unless they're cascade-linked into other programs. For example, Grok will search Twitter if it doesn't have an answer, and most AIs have similar functionality and can pull results from Google. So if a solved copy of that Sudoku is published somewhere, the AI might grab it, provided the seed roll has it try searching for the result instead of hallucinating one. The LLM isn't doing the search itself, though: it calls a different program to do the search and then interprets the results. That's advanced, but not the same thing as doing it itself. The LLM can't edit or evolve that cascaded program; it only uses it as a read-only tool.
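Here's a rough sketch of what "calling a different program" looks like, again in Python and again with hypothetical names (`fake_llm_decide`, `TOOLS`, and `answer` are invented for this post; real products wire this up in their own ways). I've used a plain backtracking Sudoku solver as the tool instead of a web search, since it makes the point more directly: when the tool runs, the right answer comes from ordinary code, and the "LLM" part only decides whether to call it.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]  # 9x9 grid, 0 marks an empty cell

def _allowed(grid: Grid, r: int, c: int, d: int) -> bool:
    """True if digit d can go at (r, c) without breaking a row, column, or box."""
    if d in grid[r]:
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d
               for i in range(br, br + 3) for j in range(bc, bc + 3))

def solve_sudoku(grid: Grid) -> Optional[Grid]:
    """The 'tool': a classic backtracking solver. Returns a solved copy or None."""
    work = [row[:] for row in grid]

    def fill() -> bool:
        for r in range(9):
            for c in range(9):
                if work[r][c] == 0:
                    for d in range(1, 10):
                        if _allowed(work, r, c, d):
                            work[r][c] = d
                            if fill():
                                return True
                            work[r][c] = 0  # undo and try the next digit
                    return False  # no digit fits here, so backtrack
        return True  # no empty cells left

    return work if fill() else None

# Tools are registered by the host program; the model cannot modify them.
TOOLS: Dict[str, Callable[[Grid], Optional[Grid]]] = {"solve_sudoku": solve_sudoku}

def fake_llm_decide(user_message: str) -> dict:
    """Toy stand-in for the model: it only emits a request naming a tool."""
    if "sudoku" in user_message.lower():
        return {"tool": "solve_sudoku"}
    return {"tool": None}

def answer(user_message: str, grid: Grid) -> str:
    decision = fake_llm_decide(user_message)
    tool = TOOLS.get(decision["tool"])
    if tool is None:
        return "(model answers from text prediction alone and may hallucinate)"
    result = tool(grid)  # the host runs the tool, not the model
    if result is None:
        return "That puzzle has no solution."
    return "\n".join(" ".join(str(d) for d in row) for row in result)
```

Whether a given chatbot takes the tool route or just guesses is the luck part. Two people with the same photo can land on different paths.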
Good post, you hit the nail on the head. I'm unclear on how exactly the image generation portion works (the text to image stuff -- DALL-E was quite complicated when I played with it directly), but say you write a prompt like "Draw a woman in the style of Renaissance artwork, playing up themes of innocence and virtue".
The LLM takes your prompt and adds descriptive textual details for the drawing portion. Innocence in Renaissance artwork, for instance, is often associated with nudity.
So the image gen spits out a tasteful nude.
Now that next level comes along and describes the generated image. Then analyzes the descriptions for no-nos. Real people. Copyrighted mouse characters. Pornography. Mean thoughts. Unwanted political ideologies. Whatever.
If it doesn't pass, the image gets rejected. Sometimes it's not clear why certain text responses or images are rejected. Silicon Valley puritan weirdos' ideas of morality and safety are bizarre and out of touch with most people.
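The steps above (prompt gets expanded, image gets generated, image gets described, description gets checked) can be sketched roughly like this in Python. To be clear, this is a guess at the general shape, not how any particular product actually does it: `expand_prompt`, `describe_image`, `find_violations`, and the `BLOCKED_TERMS` list are all hypothetical, and the "image" is just a dictionary.

```python
# Hypothetical policy list; real systems are far larger and not public.
BLOCKED_TERMS = {"nude", "real person", "cartoon mouse"}

def expand_prompt(prompt: str) -> str:
    """Toy stand-in for the LLM adding descriptive detail to the user's prompt."""
    extras = []
    lowered = prompt.lower()
    if "renaissance" in lowered and "innocence" in lowered:
        # The association comes from the art tradition, not from the user.
        extras.append("classical nude figure")
    return ", ".join([prompt] + extras)

def generate_image(expanded_prompt: str) -> dict:
    """Toy image generator: the 'image' just records what was drawn."""
    return {"content": expanded_prompt}

def describe_image(image: dict) -> str:
    """Toy captioner: describes the finished image in text."""
    return image["content"].lower()

def find_violations(description: str) -> list:
    """Check the description (not the original prompt) for blocked terms."""
    return sorted(term for term in BLOCKED_TERMS if term in description)

def pipeline(prompt: str) -> dict:
    image = generate_image(expand_prompt(prompt))
    hits = find_violations(describe_image(image))
    if hits:
        # The user typically sees only a generic refusal, not this reason.
        return {"ok": False, "reason": hits}
    return {"ok": True, "image": image}
```

Feed it the Renaissance prompt and it gets rejected for a word the user never typed, which matches the "not clear why" experience.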
I'm generally not in favor of censoring speech or technology, but the amount of AI gooning going on is really insane.