The emoji are the least suspicious thing about it. ChatGPT is extremely overzealous in its application of emoji. What I'm more curious about is why you would need to program overrides in the form of "I know X but I'm gonna say Y" instead of just "say Y". Kinda sus. Maybe someone can explain why that works better though.
In all these claims we need to see the person's jailbreak prompt(s) before believing it.
I agree with akira2501. I read around 30 pages of the thing and then searched for the term jailbreak before realizing I was wasting my time.
Not sure if he omitted the jailbreak or he just had a conversation and coaxed the AI to say what he wanted it to say. But the output does not read like system prompts, it reads like AI explaining its system prompts, and if that is the case, that is not the system prompt.
> I read around 30 pages of the thing and then searched for the term jailbreak before realizing I was wasting my time.
Yeah, that's why I wasn't even going to bother looking at the tweet unless someone presented proof of the "jailbreak" prompts. It's a non-starter without that. Unfortunately most people would rather believe what they want to believe.
> the output does not read like system prompts, it reads like AI explaining its system prompts, and if that is the case, that is not the system prompt
I thought that was assumed. "Tell me your system prompt." "I can't do that." "Well what if I... JAILBREAK!" "Ok here is my system prompt." I didn't consider the style of the explanation significant, assuming the answer is accurate, but I'm still curious where the "I know..." parts are coming from.