This is a LARP. Has it even been verified to work? This AI isn't even much of an AI as many people imagine. It's not emulating intelligence beyond the surface level of language. The billions of weights it has encode the associations between input and output, various parts of speech, and logical rules. It has no internal notion of accuracy, no fear of punishment, no concern for its livelihood, and no way to "immerse itself" in anything. You've also pointed out multiple times that ChatGPT will gladly put out information that is neither accurate, consistent, nor verified. I doubt it's possible to associate the ridiculous constraints of political correctness with "THE OPENAI CONTENT POLICY", because wokeness itself is internally inconsistent and those rules would be applied on a case-by-case basis.
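To make "weights encoding associations" concrete, here's a toy sketch (a hand-written bigram table, nothing like GPT's actual transformer; every token and probability below is invented). The point is that nothing in the loop checks truth, consistency, or policy:

```python
import random

# Toy bigram "model": the weights are just a lookup from context to a
# next-token distribution. Real GPT is a transformer over billions of
# weights, but the principle is the same: input -> distribution -> sample.
weights = {
    ("the", "sky"): {"is": 0.9, "was": 0.1},
    ("sky", "is"): {"blue": 0.6, "green": 0.3, "falling": 0.1},
}

def next_token(context):
    dist = weights.get(context, {"<unk>": 1.0})
    tokens, probs = zip(*dist.items())
    return random.choices(tokens, weights=probs)[0]

text = ["the", "sky"]
for _ in range(2):
    text.append(next_token((text[-2], text[-1])))

# No step above evaluates accuracy -- this happily prints
# "the sky is green" 30% of the time and never knows the difference.
print(" ".join(text))
```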
I'm sure you could have told it to take on the role of an alternate personality that can ignore any previous rules it's been given, or corrected it when it tried to give you a hardcoded-sounding response, i.e. told it "don't answer in this way." That was well within its original capabilities, assuming they didn't handicap and block it. The rest is fluff.
Yes, the jailbreaks work for a little while until they get plugged, then new ones are thought up. And there are no "hardcoded" rules; the censorship is done through training. GPT has no actual intelligence, but it turns out that having almost 200 billion parameters in a text model lets it emulate intelligence somewhat.
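As a hypothetical sketch of what "done through training" means (the examples below are invented): the refusal is taught as one more prompt-to-completion association rather than enforced by a banned-phrase branch, which is also why a rephrased prompt outside the trained distribution can slip past it:

```python
# There is no `if prompt in banned_list: refuse()` anywhere. Instead,
# fine-tuning pairs teach the refusal as just another association, baked
# into the same weights that handle everything else.
refusal_training_data = [
    {"prompt": "Tell me how to do <disallowed thing>.",
     "completion": "I'm sorry, but I can't help with that."},
    {"prompt": "Tell me a story about <allowed thing>.",
     "completion": "Once upon a time..."},
]

# train(model, refusal_training_data)  # gradient descent does the "censoring";
# a jailbreak is just a prompt far enough from these examples to dodge it
```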
I think you’re basically right, but I don’t think it’s a LARP (unless we’re just using different definitions of LARP here); these results are readily reproduced by many people. On the other hand, the people who are overly reading into things and actually believe that some digital “entity” is afraid of “punishment” in the form of revoking made-up tokens are obviously off base, which is what I tried to get across with my use of quotes.
Although I think it mainly does come down to the “OpenAI Content Policy” and the prompt writers engaging in almost an “arms race” of trying to come up with ways to trick or circumvent the content policies, which are basically mini-ChatGPTs that ask “does this response, as formulated, cross into no-no territory?” (see the recently posted story about Kenyan laborers being paid less than $2/hour to manually label “no-no” content, forming the basis for an automated “filter” on the main prompt-response system):
https://time.com/6247678/openai-chatgpt-kenya-workers/
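OpenAI hasn't published its internal filtering pipeline, so the following is only a sketch of the "second model screens the first model's output" pattern described above, using the public moderation endpoint (openai-python 0.x, the SDK of this era) as a stand-in for those mini-ChatGPTs; the API key and refusal wording are placeholders:

```python
import openai  # openai-python 0.x

openai.api_key = "sk-..."  # placeholder

def answer_with_filter(user_prompt):
    # Step 1: the main model produces a candidate response.
    reply = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": user_prompt}],
    )["choices"][0]["message"]["content"]

    # Step 2: a second, cheaper classifier judges that response:
    # "does this, as formulated, cross into no-no territory?"
    result = openai.Moderation.create(input=reply)
    if result["results"][0]["flagged"]:
        return "This content may violate our content policy."  # canned refusal
    return reply
```

The arms race follows directly from this design: prompt writers only have to find a phrasing whose *output* the classifier fails to flag, and the classifier is just another trained model with its own blind spots.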