Qwen-Image: Prompt & Parameter Guide
Get started with Qwen-image, the most powerful image generation model out there.

Qwen-Image is the latest image model from Alibaba that is competing with the likes of GPT-Image model. What sets it apart is that this model is Open source and has an Apache 2.0 license making it the go to choice for companies and startups for image use cases. Segmind will have the fine-tuning for Qwen-image soon, opening the door to customize the model for your use case.
We wrote a practical guide to leverage this model effectively and get the best results every time you use it. We will keep this simple, so you don't need to go over the technical paper to understand how to use the model.
Prompt Basics
The model performs best when your prompt is:
- Simple and Clear: State the subject, style, and mood in simple plain language.
- Detailed but not overloaded: 1 to 3 sentences work well.
- Order matters: Describe the main subject first, then environment, then finer details.
- Text rendering: Put the exact words you want in double quotes.
Example:"Grand Opening"
in glowing gold letters on a neon billboard.
The final prompt template looks like this:
[Main subject], [visual style/medium], [environment & background details], [lighting], [extra effects], ["exact text if any"]
For Example:
A futuristic sports car, photorealistic style, parked under neon city lights, reflections on wet streets, cinematic lighting, "Night Racer" in metallic chrome text on the hood
Key Parameters
These parameters control the processing required and the output quality of Qwen-Image model:
Steps
Keep it around 20-30 for quick renders and 50 for final renders. The processing time and costs are directly proportional to the number of steps.



L to R: 50 steps, 30 steps and 20 steps
Guidance scale
Guidance or cfg_scale is the parameter that dictates how closely the image generation follows your prompt. Lower values allow more creativity where as larger values follow the prompt strictly. Ideal number is between 4 to 5. We created three images with guidance values 2.5, 4.5 and 10 so you can see the effect yourself.
Prompt Used: A mystical dragon hovering above a sparkling waterfall under a starry sky.



L to R; Guidance 10, 4.5 and 2.5
Seed
Use seed to control the output. Same seed + same prompt will produce identical output. This is helpful white iterating over the image with different parameters.
Prompt Tips for Best Results
- For text in images: Use short, clear phrases in quotes, specify font style and color if important.
- For people: Add details like ethnicity, age, clothing, and facial expression.
- For complex scenes: Break into main subject + background + secondary objects.
Samples
Text rendering
We know that a picture is worth a thousand words, but generating a few words or sentences in the right place can elevate the meaning of an image. Qwen-image's ability to blend precise letters and alphabets with imagery make it very powerful.
Detailed long text example 1
A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection.
The text reads:
(left)
"Transfer between Modalities:
Suppose we directly model
p(text, pixels, sound) [equation]
with one big autoregressive transformer.
Pros:
- image generation augmented with vast world knowledge
- next-level text rendering
- native in-context learning
- unified post-training stack
Cons:
- varying bit-rate across modalities
- compute not adaptive"
(Right)
"Fixes:
- model compressed representations
- compose autoregressive prior with a powerful decoder"
On the bottom right of the board, she draws a diagram:
"tokens -> [transformer] -> [diffusion] -> pixels"

Detailed long text example 2
Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.
Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)\n Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.
Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.
Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

Invitation card
photo of a delightful wedding invitation on a tasteful wooden desk. The card is hefty, with eggshell textures, and beautiful embossings, with elegant decorations abstractly representing the couple tastefully integrated into the designs. Iconography is used, but sparingly and in a minimalist way. perfect typesetting.
"You are cordially invited
to the long-awaited union of
Image
and
Text
After years of flirting and collaboration
they are finally becoming One.
Together at last, in Qwen-Image,
they now speak the same language —
where a whisper becomes a masterpiece,
and a prompt becomes a picture.
Please join us in celebrating
this magical multimodal matrimony
where imagination knows no bounds.
Date: March 25, 2025
Location: segmind.com
Dress Code: Pixels or Prose
With love,
Segmind"
perfect typesetting.
