ChatGPT’s new Images 2.0 model is surprisingly good at generating text

Technology, 21 Apr 2026, 19:00

OpenAI’s new ChatGPT Images 2.0 model demonstrates major improvements in rendering readable text within AI-generated images. The model can create detailed visuals, marketing assets, and even multi-paneled comics with stronger instruction-following and multilingual text support.


It used to be easy to distinguish between human-made and AI-generated imagery. Just two years ago, if you asked an image model to create a menu for a Mexican restaurant, it would invent new culinary delights like “enchuita,” “churiros,” “burrto,” and “margartas.”

Now, when asked for a menu of Mexican food, OpenAI’s new ChatGPT Images 2.0 model produces something that could immediately be used in a restaurant without customers noticing anything unusual. (Though ceviche priced at $13.50 might raise questions about the fish.)

ChatGPT Images 2.0 generated Mexican restaurant menu

Image Credits: ChatGPT Images 2.0

For comparison, here’s the result DALL-E 3 generated two years ago, before ChatGPT had native image generation:

DALL-E 3 generated Mexican restaurant menu with spelling errors

Image Credits: Microsoft Designer (DALL-E 3)

Why AI Image Models Struggled With Text

AI image generators have historically struggled to spell because they generally relied on diffusion models, which reconstruct images from noise.

“The diffusion models […] are reconstructing a given input,” Asmelash Teka Hadgu, founder and CEO of Lesan AI, told TechCrunch in 2024. “We can assume writings on an image are a very, very tiny part, so the image generator learns the patterns that cover more of these pixels.”
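Hadgu’s point can be made concrete with some rough arithmetic. The sketch below is not a model; it assumes a hypothetical 1024×1024 menu image in which one line of lettering fills a 300×40-pixel box, pixel values in [0, 1], and a plain per-pixel mean-squared-error loss averaged over the whole image (a simplification, since real diffusion training objectives differ in detail). The image size, box size, and error values are all illustrative assumptions.

```python
# Toy arithmetic, not a model: assumes pixel values in [0, 1] and a plain
# per-pixel mean-squared-error loss averaged over the whole image.
IMAGE_W, IMAGE_H = 1024, 1024  # hypothetical image size
TEXT_W, TEXT_H = 300, 40       # hypothetical box holding one line of text


def text_pixel_fraction(img_w, img_h, text_w, text_h):
    """Fraction of the image's pixels that fall inside the text box."""
    return (text_w * text_h) / (img_w * img_h)


def mean_loss(frac_text, err_text, err_rest):
    """Image-wide mean squared error, given per-pixel errors inside and outside the text box."""
    return frac_text * err_text + (1.0 - frac_text) * err_rest


frac = text_pixel_fraction(IMAGE_W, IMAGE_H, TEXT_W, TEXT_H)

# Case A: everything is perfect except the text, which is completely wrong
# (squared error 1.0, the maximum for pixel values in [0, 1]).
garbled_text = mean_loss(frac, err_text=1.0, err_rest=0.0)

# Case B: text and background are both slightly off (squared error 0.02 everywhere).
slight_blur = mean_loss(frac, err_text=0.02, err_rest=0.02)

print(f"text covers {frac:.2%} of pixels")
print(f"loss with fully garbled text: {garbled_text:.4f}")
print(f"loss with a slight overall blur: {slight_blur:.4f}")
```

Under these assumptions the text covers about 1.1% of the pixels, so completely garbled lettering raises the average loss by roughly 0.011, less than a faint blur across the whole image (0.02). A model trained to minimize such a loss gains more by getting the large regions right than by spelling correctly.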

Researchers have since explored other mechanisms for image generation, such as autoregressive models, which build an image piece by piece, predicting each next part from the ones already generated, much as large language models (LLMs) predict the next word in a sentence.

OpenAI declined to specify during a press briefing what kind of model powers ChatGPT Images 2.0.

New Capabilities in Images 2.0

OpenAI says the new model has “thinking capabilities,” enabling it to search the web, generate multiple images from a single prompt, and double-check its outputs. This allows Images 2.0 to create marketing assets in various sizes and even multi-paneled comic strips.

The company also says the model has a stronger understanding of non-Latin text rendering in languages such as Japanese, Korean, Hindi, and Bengali. Its knowledge cutoff is December 2025, which may affect how accurately it responds to prompts involving recent events.

“Images 2.0 brings an unprecedented level of specificity and fidelity to image creation. It can not only conceptualize more sophisticated images, but it actually brings that vision to life effectively, able to follow instructions, preserve requested details, and render the fine-grained elements that often break image models: small text, iconography, UI elements, dense compositions, and subtle stylistic constraints, all at up to 2K resolution,” OpenAI said in a press release.

Image generation isn’t as fast as getting a text answer from ChatGPT, but even something complex like a multi-paneled comic strip takes only a few minutes.

All ChatGPT and Codex users will have access to Images 2.0 starting Tuesday, with paid users able to generate more advanced outputs. OpenAI will also make the model available through its API as gpt-image-2, with pricing based on the quality and resolution of outputs.