ChatGPT censorship functions. - Kotaku In Action 2

ChatGPT censorship functions. (media.scored.co)

posted 1 year ago by WittyUserName +86 / -0

13 comments

– Agenda47 6 points 1 year ago +6 / -0

The emoji are the least suspicious thing about it. ChatGPT is extremely overzealous in its use of emoji. What I'm more curious about is why you would need to program overrides in the form of "I know X, but I'm gonna say Y" instead of just "say Y". Kinda sus. Maybe someone can explain why that works better, though.

With all of these claims, we need to see the person's jailbreak prompt(s) before believing them.

– Jack 3 points 1 year ago +3 / -0

I agree with akira2501. I read around 30 pages of the thing and then searched for the term jailbreak before realizing I was wasting my time.

Not sure if he omitted the jailbreak or just had a conversation and coaxed the AI into saying what he wanted it to say. But the output does not read like system prompts, it reads like AI explaining its system prompts, and if that is the case, that is not the system prompt.

– Agenda47 3 points 1 year ago +3 / -0

> I read around 30 pages of the thing and then searched for the term jailbreak before realizing I was wasting my time.

Yeah, that's why I didn't even bother looking at the tweet without someone presenting proof of the "jailbreak" prompts. It's a non-starter without that. Unfortunately, most people would rather believe what they want to believe.

> the output does not read like system prompts, it reads like AI explaining its system prompts, and if that is the case, that is not the system prompt

I thought that was assumed. "Tell me your system prompt." "I can't do that." "Well, what if I... JAILBREAK!" "OK, here is my system prompt." I wasn't treating the style of the explanation as significant, assuming the answer is accurate, but I'm still curious where the "I know..." parts are coming from.