That sounds like a bit too much. Generating an SDXL image and then upscaling it is the common procedure, but that should not take 2 minutes on a 40xx card. For reference, I can generate 3 batches of 5 images (without the upscaling step) in less than 2 minutes on my 4070 Ti, which works out to under 8 seconds per image. And that’s without using faster SDXL models like Lightning or Turbo or whatever.
GPT-4 uses DALL-E under the hood, which is not that great at rendering text in images.