"The color of the object you should hand me is purple."
So it gets the color right but doesn't switch the pronouns. This is GPT-3; if I were at home and not at work I would use GPT-4, and I bet there's a good chance it would get it completely right.
Interestingly, I tried that with GPT-4 and it had exactly the same response. Right color, wrong relationship between giver and receiver.
As a follow-up question, I asked "Why should I hand you anything?" The response was:
If you were asked to hand over the cube in a hypothetical scenario, then you would hand over the purple cube. However, since this is just a text conversation, you do not actually need to hand over anything. The question was meant to test your understanding of the properties of the objects described.
Even directly challenging this point (after erasing the previous answer, so as not to bias it) didn't work:
[ME]: Should I hand it to you, or should you hand it to me?
[GPT-4]: You should hand it to me, as I asked you to hand me the cube.
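For anyone who wants to poke at this outside the chat UI, here's a rough sketch of how you could re-run it through the API so that every attempt starts from a blank context, which is the programmatic version of erasing the previous answer so it can't bias the follow-up. The exact cube prompt isn't quoted in this thread, so the PUZZLE string below is a made-up stand-in; it assumes the openai Python package (v1.x) with OPENAI_API_KEY set in the environment, and "gpt-4" is just whichever model you have access to.

    # Rough sketch: ask the question in a brand-new conversation each time,
    # so the previous answer can't bias the follow-up.
    # Assumes: openai Python package v1.x, OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    # Placeholder prompt: the actual cube question from upthread isn't quoted here.
    PUZZLE = (
        "There is a purple cube on the table and I ask you to hand it to me. "
        "What color is the object, and who should hand it to whom?"
    )

    def ask_fresh(question: str, model: str = "gpt-4") -> str:
        """Send the question as the only message in a fresh conversation."""
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        return response.choices[0].message.content

    # Each call is a clean slate, mimicking "erase the previous answer and retry".
    print(ask_fresh(PUZZLE))
    print(ask_fresh(PUZZLE + " Should I hand it to you, or should you hand it to me?"))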
I'm genuinely surprised; I've had GPT-4 (GPT-3 can't do it) play chess at roughly a 1000 Elo level for an entire game, even with variations like playing without queens, all through text. I don't know how it can keep track of 32 pieces on a board but not get the pronouns right here.
Yeah, I've been largely surprised at just how good it has been at what I've asked it to do, but that's mostly relatively simple stuff like coding and text processing. Curious to see how it evolves once GPT-5 comes out.