Because, as you guessed, it's not really an AI... or perhaps this is all AI is. What happens is that the chatbot is given a set of basic instructions along the lines of "here's what you're able to do for the user... you can search the web for related pictures, you can generate images with GPT-image-1, you can do the following edits to user-uploaded pictures...". Behind that it has something like an internal API that links to the set of available tools, with instructions on how to call each one, and most of those are commands to the latest image generation model. Unless the toolset includes a "literally just copy/paste the picture" tool, it's not going to be able to produce that response. In this case GPT-image-1 interpreted the prompt as a request for some kind of inpainting, so that's what it did. My guess is that it ran with hallucination at whatever the default setting is, so it pulled its style guidance from its biased model.
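To make the "it can only do what's in its toolset" point concrete, here's a rough sketch of that kind of dispatch. Everything in it is made up for illustration: the tool names, the `TOOLS` registry, and `call_tool` are hypothetical, and the tool bodies are stubs rather than real calls to any image model. I have no idea what the actual internal API looks like.

```python
from typing import Callable, Dict

# Hypothetical stub tools. Real ones would call a search or image service.
def search_web_images(query: str) -> str:
    return f"search results for: {query}"

def generate_image(prompt: str) -> str:
    return f"new image generated from: {prompt}"

def edit_image(prompt: str) -> str:
    # Assumption: an "edit" always goes through the image model,
    # so the result is regenerated rather than copied.
    return f"uploaded image regenerated with edit: {prompt}"

# The toolset the chatbot is told about. There is no "return_unchanged" entry.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_web_images": search_web_images,
    "generate_image": generate_image,
    "edit_image": edit_image,
}

def call_tool(name: str, argument: str) -> str:
    """Dispatch a tool call; anything outside the registry cannot be done."""
    if name not in TOOLS:
        raise KeyError(f"no such tool: {name}")
    return TOOLS[name](argument)
```

With a setup like this, a request to hand the picture back untouched has nowhere to go except the closest available tool, which here is `edit_image`, and that always passes through the generator.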