Qwen Image 2: Unified Text to Image & Image Editing

Qwen Image 2 is a unified image generation and editing model from Alibaba's Qwen team. A single model handles both text-to-image generation and image editing, with a focus on reliable text rendering and high-fidelity photorealism.

The model pairs an 8B Qwen3-VL encoder with a 7B diffusion decoder and generates images at native 2K resolution (up to 2048×2048). It currently holds the #1 spot on AI Arena's blind evaluation leaderboard for both generation and editing.

What it's good at

Text rendering. The model can render readable text in images — titles, labels, signs, posters, infographics. It supports prompts up to 1,000 tokens and is especially strong with Chinese text.

Tokyo travel poster with accurate text rendering

Photorealism. The model produces detailed, realistic images across common categories: people (skin, hair, clothing texture), nature (foliage, water, atmosphere), and architecture (materials, geometry, lighting). Fine detail in natural materials and lighting is a standout.

Dewdrop on rose petals — photorealistic detail

Image editing. Pass a reference image along with a text prompt to edit, restyle, or transform it. The same model you use for generation also handles style transfer, adding or removing elements, lighting changes, and cross-domain edits. Use match_input_image to keep the output at the same resolution and aspect ratio as your input.
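
The following minimal Python sketch shows which aspect ratio an output is expected to have under the match_input_image rule. The helper effective_aspect_ratio is hypothetical and not part of any official client. It encodes only the documented behavior: when match_input_image is true and an image is provided, the output follows the input image; otherwise aspect_ratio applies. The helper returns a reduced ratio rather than pixel sizes, because the exact dimensions the model picks for each option are not documented here.

```python
from math import gcd
from typing import Optional, Tuple


def effective_aspect_ratio(
    image_size: Optional[Tuple[int, int]],
    match_input_image: bool,
    aspect_ratio: str = "1:1",
) -> str:
    """Hypothetical helper: the aspect ratio the output should have.

    Mirrors the documented rule only; it does not call the model.
    """
    if match_input_image and image_size is not None:
        width, height = image_size
        if width <= 0 or height <= 0:
            raise ValueError("image dimensions must be positive")
        # The output follows the input image (ratio and resolution),
        # so reduce width:height to lowest terms for display.
        divisor = gcd(width, height)
        return f"{width // divisor}:{height // divisor}"
    # No image, or match_input_image is false: aspect_ratio applies.
    return aspect_ratio
```

For example, a 3000×2000 input with match_input_image set to true yields "3:2" regardless of aspect_ratio. The same call without an image falls back to the aspect_ratio value.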

Inputs

prompt — What you want to generate or how you want to edit the image. For best results, describe structure before style.
image — An optional reference image for editing or style transfer.
match_input_image — When true and an image is provided, the output matches the input image's aspect ratio and resolution instead of using the aspect_ratio parameter.
aspect_ratio — The shape of the output image. Options: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 2:1, 1:2. Default is 1:1.
enable_prompt_expansion — Automatically expands and optimizes your prompt. On by default.
negative_prompt — Describe what you don't want in the image.
seed — For reproducible results. Range: 0–2147483647.
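
The inputs map onto a flat set of named fields. Below is a minimal sketch, assuming you send them as a Python dict. build_input is a hypothetical helper, not an official client function, and the image field is assumed to be a URL string. The helper copies the aspect_ratio options and seed range listed above and rejects out-of-range values before any request is made. How the dict is actually submitted (client library, endpoint, authentication) is outside its scope.

```python
from typing import Any, Dict, Optional

# Values copied from the Inputs list above.
ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3", "2:1", "1:2"}
SEED_MIN, SEED_MAX = 0, 2_147_483_647


def build_input(
    prompt: str,
    image: Optional[str] = None,  # assumed: a URL to the reference image
    match_input_image: bool = False,
    aspect_ratio: str = "1:1",
    enable_prompt_expansion: bool = True,
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble and sanity-check an input dict."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if seed is not None and not SEED_MIN <= seed <= SEED_MAX:
        raise ValueError(f"seed must be in [{SEED_MIN}, {SEED_MAX}]")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "enable_prompt_expansion": enable_prompt_expansion,
    }
    if image is not None:
        payload["image"] = image
        # match_input_image only has an effect when an image is provided.
        payload["match_input_image"] = match_input_image
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    return payload
```

For example, build_input("a lighthouse at dusk", aspect_ratio="16:9", seed=42) returns the prompt, aspect ratio, and seed, with enable_prompt_expansion left at its default of True. Passing seed=-1 or aspect_ratio="5:4" raises ValueError. This sketch leaves match_input_image out when no image is given, because the option only has an effect alongside an image.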

Tips

Write structure before style. Describe the layout first ("big title at top, lone figure center frame, cityscape below"), then add aesthetic direction ("cinematic lighting, muted color palette").
Be specific about text. Include exact strings, language, casing, and alignment. The model handles Chinese text particularly well, but specificity helps across all languages.
For photorealism, hint at camera and lighting details. Light technical hints such as "50mm lens", "soft daylight", and "medium format" improve realism without over-constraining the image.
For editing, state constraints explicitly. "Do not change the background" or "keep lighting realistic" — the model follows explicit constraints better than implied ones.
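
Taken together, these tips suggest an ordering for the parts of a prompt. Layout comes first, then any exact text, then style and light camera hints, and editing constraints go last, stated outright. Below is a minimal Python sketch of a hypothetical compose_prompt helper that assembles a prompt in that order. It is plain string formatting. The "Text:" label and the choice to place the text clause before style are assumptions of this sketch, not syntax the model requires.

```python
from typing import Iterable, List, Optional


def _sentence(fragment: str) -> str:
    """Trim a fragment and make it end with exactly one period."""
    return fragment.strip().rstrip(".") + "."


def compose_prompt(
    structure: str,
    style: str = "",
    exact_text: Optional[str] = None,
    text_details: str = "",
    camera_hints: Iterable[str] = (),
    constraints: Iterable[str] = (),
) -> str:
    """Hypothetical helper: assemble a prompt following the tips' ordering."""
    if not structure.strip():
        raise ValueError("describe the layout (structure) first")

    # 1. Layout / structure always comes first.
    parts: List[str] = [_sentence(structure)]

    # 2. Exact text, quoted so spelling and casing are unambiguous.
    #    Placing it before style is an assumption of this sketch.
    if exact_text:
        clause = f'Text: "{exact_text}"'
        if text_details.strip():
            clause += f", {text_details.strip()}"
        parts.append(_sentence(clause))

    # 3. Aesthetic direction after the structure.
    if style.strip():
        parts.append(_sentence(style))

    # 4. Light camera/lighting hints, joined into one sentence.
    hints = [h.strip() for h in camera_hints if h.strip()]
    if hints:
        parts.append(_sentence(", ".join(hints)))

    # 5. Explicit editing constraints, one sentence each.
    parts.extend(_sentence(c) for c in constraints if c.strip())

    return " ".join(parts)
```

For example, compose_prompt("big title at top, lone figure center frame, cityscape below", style="cinematic lighting, muted color palette", exact_text="TOKYO", text_details="English, all caps, centered") produces: big title at top, lone figure center frame, cityscape below. Text: "TOKYO", English, all caps, centered. cinematic lighting, muted color palette.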

Standard vs Pro

Qwen Image 2 is the standard version of the model. If you need stronger realism, more accurate text rendering, and better adherence to complex prompts, try Qwen Image 2 Pro. The Pro version takes slightly longer to generate but produces higher-quality output.

Qwen Image 2 is licensed under Apache 2.0. You can read more about the model in the Qwen team's blog post and the API documentation.
