
Qwen Image Edit Plus: Multi-Image Editing with ControlNet
Qwen Image Edit Plus is Alibaba's 20B-parameter image editing model with improved multi-image editing, person consistency, product poster generation, and native ControlNet support.
Qwen Image Edit Plus is the latest iteration of Alibaba's Qwen-Image editing model. It takes one or more images and a text prompt, then edits the images according to your instructions. Unlike models that make broad, unpredictable changes, this one gives you granular control, whether you're making subtle tweaks or major transformations.
The model has 20 billion parameters. It builds on Qwen-Image-Edit and was further trained for multi-image editing, and it works best with 1 to 3 input images.
Key features
Multi-image editing. Combine multiple images in creative ways. You can merge people with other people, place people in new scenes, or add products to different contexts. The model understands prompts like "person + person," "person + product," and "person + scene."
For example, you can take separate photos of two people and merge them into a single scene with a prompt like "Both people standing together in a park." The model maintains the identity of each person while creating a natural-looking composite.
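As a rough sketch of how such a multi-image request might be assembled, the helper below pairs a prompt with a list of image paths and notes when the input count exceeds the 1 to 3 images the model works best with. The function name `build_edit_request` and the `prompt`, `images`, and `notes` fields are hypothetical and are not the model's actual API; adapt them to whichever client you use.

```python
from typing import Dict, List

# The article says the model works best with 1 to 3 input images.
RECOMMENDED_MAX_IMAGES = 3


def build_edit_request(prompt: str, image_paths: List[str]) -> Dict[str, object]:
    """Assemble a hypothetical payload for a multi-image edit.

    The field names are illustrative only, not a documented schema.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not image_paths:
        raise ValueError("at least one input image is required")

    notes: List[str] = []
    if len(image_paths) > RECOMMENDED_MAX_IMAGES:
        # More inputs are not rejected, only flagged as likely to hurt consistency.
        notes.append(
            f"{len(image_paths)} images supplied; "
            f"1 to {RECOMMENDED_MAX_IMAGES} typically work best"
        )

    return {"prompt": prompt.strip(), "images": list(image_paths), "notes": notes}


request = build_edit_request(
    "Both people standing together in a park",
    ["person_a.jpg", "person_b.jpg"],  # placeholder file names
)
print(request["images"])
```

Flagging rather than rejecting extra images mirrors the wording above: 1 to 3 inputs is a recommendation, not a stated hard limit.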

Person consistency. The model preserves facial identity across different portrait styles and pose transformations. This makes it great for creating consistent character images, restoring old photos, or generating memes while keeping the person recognizable.
You can transform someone into various artistic styles, such as anime, oil painting, or Studio Ghibli, and the person remains identifiable. Change poses, add text overlays, or restore damaged photographs, all while preserving the person's identity.

Product poster generation. Turn plain product photos into professional posters. The model maintains product identity while adding creative backgrounds and compositions. Take a product shot with a white background and transform it into a polished marketing poster with environmental context, lighting effects, and branding elements.
Text editing in both Chinese and English. Edit text directly in images while preserving the original font, size, and style. You can modify content, change fonts, adjust colors, or apply different materials to text. Examples include changing "Summer Sale" to "Winter Sale" on a poster while keeping the exact same font and layout, or editing Chinese characters on signage while maintaining the calligraphic style.
ControlNet support. The model works with common ControlNet conditions like depth maps, edge maps, and keypoint maps. You can use pose keypoints to change someone's body position, depth maps to maintain spatial relationships, or edge maps to preserve structural boundaries while making edits.
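To make the idea of an edge map concrete, here is a minimal sketch that computes a Sobel gradient magnitude over a tiny grayscale image stored as nested lists. It only illustrates what an edge-map condition encodes. How the model expects conditioning images to be prepared is not covered in this post, and `sobel_edge_map` is a hypothetical helper, not part of any Qwen tooling. In practice you would more likely use an image library's edge detector.

```python
import math
from typing import List

Gray = List[List[float]]  # rows of grayscale values, e.g. 0 to 255


def sobel_edge_map(image: Gray) -> Gray:
    """Return the Sobel gradient magnitude; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    edges = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (
                image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1]
            )
            # Vertical gradient: bottom row minus top row.
            gy = (
                image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1]
            )
            edges[y][x] = math.hypot(gx, gy)
    return edges


# A 5x5 image with a dark left side and a bright right side: one vertical edge.
image = [[0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edge_map(image)
print(edge_map[2])
```

The response is strong only in the columns next to the dark-to-bright boundary and zero in the flat regions, which is the structural information an edge-map condition carries.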
How it works
The model uses two types of editing:
Appearance editing: Add, remove, or modify specific elements while keeping everything else pixel-perfect. This is for when you want precise, localized changes.
Semantic editing: Make broader creative transformations like style transfers, pose changes, or IP (intellectual property) character creation. The model can update pixels across the whole image as long as it preserves the core meaning and content.
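One way to tell the two kinds of edit apart in an output is to check where pixels actually changed. The sketch below is a hypothetical verification helper, not a description of the model's internals: it treats images as nested lists of pixel values and reports whether every changed pixel falls inside a given box, which is what you would expect from an appearance edit but not from a semantic one.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def changed_pixels(before: Image, after: Image) -> List[Tuple[int, int]]:
    """List the (x, y) coordinates whose values differ between the two images."""
    return [
        (x, y)
        for y, (row_before, row_after) in enumerate(zip(before, after))
        for x, (b, a) in enumerate(zip(row_before, row_after))
        if a != b
    ]


def is_localized(before: Image, after: Image, box: Box) -> bool:
    """True if every changed pixel lies inside the box."""
    left, top, right, bottom = box
    return all(
        left <= x < right and top <= y < bottom
        for x, y in changed_pixels(before, after)
    )


before = [[0] * 4 for _ in range(4)]

# Appearance-style edit: a single pixel inside the target region changes.
local_edit = [row[:] for row in before]
local_edit[1][1] = 9

# Semantic-style edit: every pixel changes, as in a style transfer.
global_edit = [[value + 1 for value in row] for row in before]

print(is_localized(before, local_edit, (1, 1, 3, 3)))   # True
print(is_localized(before, global_edit, (1, 1, 3, 3)))  # False
```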
Example use cases
- Creating character combinations — Take separate images of a person's face, a specific outfit, and a desired pose, then combine them into a single cohesive image in whatever style you want.
- Restoring old photographs — Bring damaged or faded photos back to life while maintaining the person's identity and the original character of the image.
- Making memes — Add text to photos while preserving the person's identity and the image's overall composition.
- Editing posters and marketing materials — Modify both text and images in existing posters while maintaining visual consistency and style.
Tips for best results
- For multi-image editing, 1 to 3 input images typically work best. More than that and the model may struggle to maintain consistency across all elements.
- Be specific in your prompts about what should change and what should stay the same. For example, "change the background to a beach but keep the person's clothing exactly as is."
- For text editing, mention if you want to preserve or change the font style. The model can either match existing typography or apply new styles based on your description.
- When working with people, reference specific characteristics you want to maintain like "preserve the facial features" or "keep the same hairstyle and expression."
- Use clear, descriptive language in your prompts. Instead of "make it better," try "enhance the lighting and add warm sunset tones."
- For product posters, describe both the product positioning and the desired background or context. The more specific you are, the better the result.
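The advice about stating what should change and what should stay the same can be turned into a small prompt template. This is a minimal sketch with a hypothetical helper, `build_prompt`; the phrasing it produces follows the example prompt above and is not a format the model requires.

```python
from typing import Sequence


def build_prompt(change: str, keep: Sequence[str] = ()) -> str:
    """Combine the requested change with the things that must stay the same."""
    prompt = change.strip().rstrip(".")
    kept = [item.strip() for item in keep if item.strip()]
    if kept:
        prompt += ", but keep " + " and ".join(kept) + " exactly as is"
    return prompt + "."


print(build_prompt(
    "Change the background to a beach",
    ["the person's clothing", "the facial features"],
))
# Change the background to a beach, but keep the person's clothing and the facial features exactly as is.
```

Keeping the "change" and "keep" parts as separate arguments makes it harder to forget the preservation clause when generating many prompts.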
More information
For technical details and the full research paper, check out the Qwen Image Edit documentation. Qwen Image Edit Plus is available for commercial use.