
Qwen Image 2: Unified Text to Image & Image Editing
Qwen Image 2 is Alibaba's unified image generation and editing model. Built on a 7B diffusion decoder, it offers reliable text rendering, high-fidelity photorealism, and native 2K resolution output.
Qwen Image 2 is a unified image generation and editing model from Alibaba's Qwen team. It handles both text-to-image and image editing in a single model, with a focus on reliable text rendering and high-fidelity photorealism.
The model pairs an 8B Qwen3-VL encoder with a 7B diffusion decoder and generates images at native 2K resolution (up to 2048×2048). At the time of writing, it holds the #1 spot on AI Arena's blind evaluation leaderboard for both generation and editing.
What it's good at
Text rendering. The model can render readable text in images — titles, labels, signs, posters, infographics. It supports prompts up to 1,000 tokens and is especially strong with Chinese text.

Photorealism. The model produces detailed, realistic images across common categories: people (skin, hair, clothing texture), nature (foliage, water, atmosphere), and architecture (materials, geometry, lighting). Fine detail in natural materials and lighting is a standout.

Image editing. Pass a reference image along with a text prompt to edit, restyle, or transform it. The same model you use for generation also handles style transfer, element addition and removal, lighting changes, and cross-domain edits. Set match_input_image to keep the output at the same resolution and aspect ratio as your input.
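The rule for match_input_image is easy to get backwards. It only takes effect when a reference image is actually supplied. The short Python sketch below spells out that precedence. The function name resolve_output_shape and its return format are hypothetical illustrations, not part of the model's API.

```python
from fractions import Fraction
from typing import Optional, Tuple


def resolve_output_shape(
    image_size: Optional[Tuple[int, int]],
    match_input_image: bool,
    aspect_ratio: str = "1:1",
) -> Tuple[str, str]:
    """Report which setting governs the output shape (hypothetical helper).

    Follows the documented rule: if match_input_image is true AND an image
    is provided, the output follows the input image's size and aspect ratio;
    otherwise the aspect_ratio parameter applies.
    """
    if match_input_image and image_size is not None:
        width, height = image_size
        ratio = Fraction(width, height)  # reduced to lowest terms, e.g. 16/9
        return ("input_image", f"{ratio.numerator}:{ratio.denominator} at {width}x{height}")
    # No image, or matching disabled: fall back to the aspect_ratio setting.
    return ("aspect_ratio", aspect_ratio)
```

For example, a 1920×1080 reference with match_input_image set to true resolves to the input's own 16:9 shape. The same call without an image falls back to whatever aspect_ratio you passed.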
Inputs
- prompt: What you want to generate, or how you want to edit the image. For best results, describe structure before style.
- image: An optional reference image for editing or style transfer.
- match_input_image: When true and an image is provided, the output matches the input image's aspect ratio and resolution instead of using the aspect_ratio parameter.
- aspect_ratio: The shape of the output image. Options: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 2:1, 1:2. Default is 1:1.
- enable_prompt_expansion: Automatically expands and optimizes your prompt. On by default.
- negative_prompt: Describe what you don't want in the image.
- seed: Set a seed for reproducible results. Range: 0–2147483647.
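You can check these inputs on the client side before sending a request. Below is a minimal Python sketch that validates them and builds a dictionary keyed by the documented parameter names. The helper build_input is hypothetical, and so is its default of False for match_input_image, since the docs do not state one. The 1:1 aspect ratio and the enabled prompt expansion match the documented defaults. The sketch does not cover how the dictionary is submitted, including the client, endpoint, and image encoding.

```python
from typing import Any, Dict, Optional

# Allowed values and ranges copied from the Inputs list.
ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3", "2:1", "1:2"}
SEED_MAX = 2_147_483_647


def build_input(
    prompt: str,
    image: Optional[str] = None,
    match_input_image: bool = False,  # assumed default; not stated in the docs
    aspect_ratio: str = "1:1",
    enable_prompt_expansion: bool = True,
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate arguments and return an input dict (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if seed is not None and not 0 <= seed <= SEED_MAX:
        raise ValueError(f"seed must be in 0..{SEED_MAX}")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "enable_prompt_expansion": enable_prompt_expansion,
    }
    if image is not None:
        # match_input_image only matters when an image is provided.
        payload["image"] = image
        payload["match_input_image"] = match_input_image
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    return payload
```

Calling build_input("a red bicycle", seed=42) returns only the prompt, the default aspect ratio and expansion flag, and the seed. An unsupported ratio such as "5:4", or a seed of 2**31, raises ValueError before any request is made.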
Tips
- Write structure before style. Describe the layout first ("big title at top, lone figure center frame, cityscape below"), then add aesthetic direction ("cinematic lighting, muted color palette").
- Be specific about text. Include exact strings, language, casing, and alignment. The model handles Chinese text particularly well, but specificity helps across all languages.
- For photorealism, hint at camera settings. "50mm lens", "soft daylight", "medium format" — light technical hints improve realism without over-constraining.
- For editing, state constraints explicitly. "Do not change the background" or "keep lighting realistic" — the model follows explicit constraints better than implied ones.
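The tips above amount to an ordering rule for prompts. Layout comes first, then exact text, then aesthetics and light camera hints, and, for edits, explicit constraints. The Python sketch below assembles a prompt in that order. The function compose_prompt and the phrase "Text reads exactly:" are illustrative assumptions. They are not a prompt syntax the model requires.

```python
from typing import List, Optional


def _sentence(fragment: str) -> str:
    """Trim, capitalize the first character, and end with a single period."""
    fragment = fragment.strip().rstrip(".")
    return fragment[:1].upper() + fragment[1:] + "."


def compose_prompt(
    structure: str,
    style: str,
    exact_text: Optional[List[str]] = None,
    camera_hints: Optional[List[str]] = None,
    constraints: Optional[List[str]] = None,
) -> str:
    """Build a prompt with structure before style (hypothetical helper)."""
    parts = [_sentence(structure)]
    if exact_text:
        # Quote each string verbatim so wording and casing are unambiguous.
        quoted = ", ".join(f'"{t}"' for t in exact_text)
        parts.append(f"Text reads exactly: {quoted}.")
    parts.append(_sentence(style))
    if camera_hints:
        parts.append(_sentence(", ".join(camera_hints)))
    if constraints:
        # Explicit edit constraints, e.g. "do not change the background".
        parts.extend(_sentence(c) for c in constraints)
    return " ".join(parts)
```

With the poster example from the first tip, compose_prompt("Big title at top, lone figure center frame, cityscape below", "cinematic lighting, muted color palette", exact_text=["NEON CITY"], camera_hints=["50mm lens", "soft daylight"]) places the layout sentence first and the camera hints last. For an edit, pass the constraints list so each restriction becomes its own sentence.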
Standard vs Pro
Qwen Image 2 is the standard version of the model. If you need stronger realism, more accurate text rendering, and better adherence to complex prompts, try Qwen Image 2 Pro. The Pro version takes slightly longer to generate but produces higher-quality output.
Qwen Image 2 is licensed under Apache 2.0. You can read more about the model in the Qwen team's blog post and the API documentation.