Prompt Guide for Stable Diffusion XL (SDXL 1.0)

This guide simplifies the text-to-image prompt process, helping you create prompts with SDXL 1.0 that produce the best visual results.

Images generated using Stable Diffusion XL (SDXL 1.0)

Diving into the realm of Stable Diffusion XL (SDXL 1.0), one quickly realizes that the key to unlocking its vast potential lies in the art of crafting the perfect prompt. Much like a writer staring at a blank page or a sculptor facing a block of marble, the initial step can often be the most daunting. Still, developing a systematic approach to building effective prompts is a challenge every user must take on.

This article is a culmination of countless hours of experimentation, trials, errors, and invaluable insights gathered from a diverse community of Stable Diffusion users. It's a distillation of collective wisdom, aiming to shed light on the nuances and intricacies of prompt creation. Whether you're a novice just starting your journey or a seasoned user looking for refined strategies, this guide seeks to offer clarity.

Our objective is to consolidate everything there is to know about prompts within the Stable Diffusion ecosystem. Instead of scouring through fragmented sources or learning through hit-and-miss attempts, this article aspires to be your comprehensive go-to resource. By the end, you'll be equipped with a deeper understanding, ready to harness the full power of Stable Diffusion with confidence and creativity.

Writing text-to-image prompts

Imagine teaching a young child to tie their shoelaces. You wouldn't just say, "Tie your shoes." Instead, you'd break it down step by step, ensuring they grasp each part of the process. The devil is in the details. The more specific and clear our instructions, the better Stable Diffusion can execute them.

Writing text-to-image prompts is a delicate dance of precision and creativity. It's an art form that calls for imagination, intuition, and sometimes a bit of trial and error. It's about crafting a set of instructions that not only conveys what we want but is also phrased in a way the model can interpret reliably.

Prompting is a lot like cooking

Crafting a prompt for AI is akin to the artistry a chef displays in the kitchen. Just as every dish is a symphony of carefully chosen ingredients, meticulously followed recipes, and the nuanced calibration of temperature and timing, crafting an AI prompt follows a similar rhythm. In the AI world, our ingredients are the Prompt Elements—those foundational words and phrases that set the tone. The recipe becomes the Prompt Structure, a roadmap that steers the AI's thought process, ensuring it captures the essence of our vision. And the subtle art of temperature and timing? That translates to Prompt Parameters in AI, allowing us to fine-tune the image output, making sure it's just right.

Anatomy of a Prompt

💡
[1] Subject, [2] Detailed Imagery, [3] Environment Description, [4] Mood/Atmosphere Description, [5] Style, [6] Style Execution

Subject: The Core of Your Vision

At the heart of every image lies its subject, the focal point that captures attention and conveys the primary message. It's the element you want the AI to center its creativity on, and it can be any of the following:

  • Character: Think of living entities, be it a person with a specific persona, an animal in its natural habitat, or any other animate being that can be the star of your image.
  • Object: This encompasses all inanimate items, ranging from everyday objects like a pen or a book to grander concepts like a spaceship or a historic artifact.
  • Scene: It's the broader setting or environment, whether it's a serene beach at sunset, a bustling city square, or a quiet village lane.
  • Action: Dynamic movements or events that bring energy to the image, such as a couple dancing in the rain or a dramatic explosion in a movie scene.
  • Emotion: The underlying feelings that the image evokes, from the euphoria of happiness to the depths of sorrow.
  • Position: The spatial arrangement, indicating where the subject is placed in relation to other elements, like 'hovering above' or 'nestled beside'.

Detailed Imagery: Adding Depth and Nuance

Once you've defined the subject, it's time to delve into the specifics that add layers of depth and richness. A few examples:

  • Clothing: Beyond just garments, it's about patterns, styles, cultural significance, and accessories that define a character.
  • Expression: The subtle (or sometimes not-so-subtle) facial cues that convey a myriad of emotions and reactions.
  • Color: The palette choices that set the mood. Are they vibrant and lively, soft pastel tones, or stark monochrome contrasts?
  • Texture: The tactile quality, whether it's the smoothness of silk, the roughness of bark, or the scaliness of a reptile.
  • Proportions: The relative sizes of elements, ensuring harmony and balance in the image.
  • Perspective: The vantage point, be it a bird's eye view from above or a worm's eye view from below, that dictates how the scene unfolds.
  • Reflection and Shadows: These elements play with light, adding realism, depth, and dimension to the image.
  • Interaction: How different elements relate to and engage with each other, creating a dynamic interplay.

Environment Description: Setting the Stage

The backdrop against which your subject shines. Some examples:

  • Indoor/Outdoor: Defines the primary setting, be it a cozy room, a sprawling garden, or the vastness of outer space.
  • Landscape: The broader geographical context, from towering mountains and deep valleys to the urban jungle of skyscrapers.
  • Weather: Elements like sunshine, rain, or snow that can dramatically alter the mood of the image.
  • Time of Day: The difference between a golden sunrise, the starkness of midday, or the soft hues of twilight can be profound.
  • Background and Foreground: These layers add depth, helping to focus on the subject while also providing context.
  • Terrain: The type of ground or surface, be it rocky outcrops, sandy beaches, or watery expanses.
  • Architecture: Man-made structures that can add historical, cultural, or futuristic contexts.
  • Natural Elements: The touch of nature, from towering trees and flowing rivers to fluffy clouds in the sky.

Mood/Atmosphere: The Soul of the Image

The intangible elements that evoke feelings. A few examples:

  • Emotion: The dominant feeling, whether it's the joy of a festival or the melancholy of a rainy day.
  • Energy: The intensity, ranging from the calm stillness of a pond to the chaotic energy of a marketplace.
  • Tension and Serenity: Elements that either add suspense and anticipation or bring a sense of peace and tranquility.
  • Warmth/Coldness, Brightness/Darkness: These elements play with temperature and light to set the overall tone.

Artistic Style: The Aesthetic Choice

Your preferred visual genre. A few examples:

  • Anime to Photographic: Whether you're looking for the exaggerated features of Japanese animation, the stark realism of a photograph, or anything in between, the style sets the visual language of the image. Other styles include Comic Book, Fantasy Art, Low Poly, Pixel Art, Watercolor, and Line Art.

Style Execution: Bringing the Vision to Life

The tools and techniques used to realize the chosen style. A few examples:

  • Illustration Technique: The method, be it hand-drawn sketches, digital designs, or mixed-media blends.
  • Rendering Engine: Naming 3D rendering software, e.g. Blender, nudges the output toward that software's characteristic rendered look.
  • Camera Model/Settings: For those aiming for a photographic touch, these settings can make all the difference.
  • Materials: From the brushes and paints of a traditional artist to the digital tablets of modern creators.
  • Resolution, Lighting, and Color Types: The final touches that determine the clarity, illumination, and color palette of the image.
Note: You can combine two or more examples in each of the above elements to generate an image that closely matches your vision.
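Camera settings are the most formulaic part of style execution, and they are easy to mistype by hand. As a sketch, a small helper can format them in the same order the photographic examples in this guide use. The name `camera_clause` and its parameters are hypothetical, chosen for illustration; they are not part of any tool.

```python
def camera_clause(body: str, focal_mm: int, aperture: float, iso: int,
                  lighting: str = "") -> str:
    """Format camera settings as a comma-separated prompt fragment.

    Hypothetical helper: the order (body, lens, aperture, ISO, lighting)
    mirrors the example prompts in this guide, not a required syntax.
    """
    parts = [body, f"{focal_mm}mm lens", f"f/{aperture:g} aperture", f"ISO {iso}"]
    if lighting:  # lighting is optional, e.g. "natural daylight"
        parts.append(lighting)
    return ", ".join(parts)


print(camera_clause("Mirrorless", 28, 2.5, 400, "natural daylight"))
# Mirrorless, 28mm lens, f/2.5 aperture, ISO 400, natural daylight
```

The `:g` format keeps apertures like 2.5 or 1.8 compact without trailing zeros.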
💡
Example of Prompt Structure

[1] Subject: A bustling futuristic city filled with towering skyscrapers. (Scene)
[2] Detailed Imagery: The skyscrapers have sleek, metallic surfaces and neon accents. (Color + texture)
[3] Environment Description: Cars zoom between the buildings. (Foreground)
[4] Mood/Atmosphere Description: The atmosphere is electric and full of innovation and excitement. (Energy)
[5] Style: Created in Neon Punk style. (Fantasy art)
[6] Style Execution: Utilizing vibrant neon colors and sharp contrasts to highlight the futuristic theme. (Color types + Lighting style)
Image generated using Stable Diffusion XL (SDXL 1.0)
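If you assemble prompts in a script, the six-part anatomy maps naturally onto a small helper. The sketch below is hypothetical (`ELEMENT_ORDER` and `build_prompt` are our names): it joins the elements in anatomy order, skips any you leave out, and lets each element hold several items, in line with the note about combining examples within an element.

```python
from typing import Dict, List

# The six prompt elements, in the order this guide presents them.
ELEMENT_ORDER = [
    "subject",
    "detailed_imagery",
    "environment",
    "mood",
    "style",
    "style_execution",
]


def build_prompt(parts: Dict[str, List[str]]) -> str:
    """Join prompt elements in anatomy order, skipping missing ones.

    Items within one element are joined with commas; elements are joined
    as sentences. Trailing periods on items are dropped to avoid "..".
    """
    fragments = []
    for name in ELEMENT_ORDER:
        items = [i.strip().rstrip(".") for i in parts.get(name, []) if i.strip()]
        if items:
            fragments.append(", ".join(items))
    return ". ".join(fragments) + "." if fragments else ""


print(build_prompt({
    "subject": ["A bustling futuristic city filled with towering skyscrapers"],
    "detailed_imagery": ["The skyscrapers have sleek, metallic surfaces and neon accents"],
    "environment": ["Cars zoom between the buildings"],
    "mood": ["The atmosphere is electric and full of innovation and excitement"],
    "style": ["Created in Neon Punk style"],
    "style_execution": ["Vibrant neon colors", "sharp contrasts"],
}))
```

Keeping the order fixed in one place makes it easy to experiment with dropping or swapping a single element while holding the rest constant.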

Fine-Tuning Your Image Outputs: A Deep Dive into Prompt Parameters

We also have a more technical guide covering the optimal settings for each prompt parameter.

Creating with AI is not just about telling the model what you want; it's also about guiding its process and refining its outputs. This is where prompt parameters come into play, acting as the dials and switches that fine-tune the AI's performance. Let's delve deeper into these crucial components:

Negative Prompt:

Think of the negative prompt as a protective fence that keeps certain elements out of your creative garden. It's a way to explicitly tell the AI what you don't want. For instance, if you're envisioning a serene nature scene without any modern elements, a negative prompt like "buildings, vehicles" steers the AI away from introducing skyscrapers or cars into your tranquil landscape. List the unwanted elements themselves rather than writing "no buildings": the negative prompt already means "avoid this", so a negation word adds nothing and may even be read as part of the concept.
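Negative prompts tend to grow by copy and paste, and duplicates creep in (the example negative prompts later in this guide list "out of frame" twice). As a sketch, a hypothetical tidy-up helper (`clean_negative_prompt` is our name, not a library function) can strip a leading "no " from each term and drop repeats before you paste the result into the Negative Prompt field:

```python
def clean_negative_prompt(raw: str) -> str:
    """Normalize a comma-separated negative prompt.

    - strips a leading "no " from each term (the field already means "avoid")
    - drops empty entries and case-insensitive duplicates, keeping the first
    """
    seen = set()
    terms = []
    for term in raw.split(","):
        term = term.strip()
        if term.lower().startswith("no "):
            term = term[3:].strip()
        key = term.lower()
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    return ", ".join(terms)


print(clean_negative_prompt("no buildings, no vehicles, blurry, out of frame, Out of frame"))
# buildings, vehicles, blurry, out of frame
```

Only a whole-word "no " prefix is removed, so terms such as "noise" pass through unchanged.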

Scheduler:

The scheduler (often called the sampler) is the algorithm that decides how noise is removed over the course of generation: which noise levels are visited, and how the image is updated at each step. It's akin to a conductor guiding an orchestra, ensuring each part plays at the right time and intensity. Different schedulers, such as Euler or DPM++, can influence the quality and style of the generated image and how many steps it needs to look finished. It's a behind-the-scenes maestro that can make the difference between a good output and a great one. By selecting the right scheduler, you're matching the AI's internal process to your desired outcome.
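One part of a scheduler's job is picking which noise levels (timesteps) to visit when you ask for far fewer steps than the model was trained on. The sketch below shows only that part, with simple even spacing; it assumes 1,000 training timesteps (a common default) and is not the spacing rule of any particular scheduler. Real schedulers such as Euler or DPM++ also define how the image is updated at each visited timestep, which this sketch leaves out.

```python
from typing import List


def timestep_schedule(num_train_timesteps: int = 1000, num_steps: int = 27) -> List[int]:
    """Return `num_steps` evenly spaced timesteps, noisiest first.

    Simplified illustration: integer spacing from the last training
    timestep down to 0, with no scheduler-specific offsets.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    last = num_train_timesteps - 1
    if num_steps == 1:
        return [last]
    ascending = [last * i // (num_steps - 1) for i in range(num_steps)]
    return ascending[::-1]  # generation runs from high noise to low noise


print(timestep_schedule(1000, 5))  # [999, 749, 499, 249, 0]
```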

Steps:

Imagine sculpting a piece of art. Each chisel mark, each refinement brings you closer to your envisioned masterpiece. In the AI world, 'steps' are the denoising iterations: at each one, the model removes a little more noise from the image. The more steps you allow, the more the model refines and polishes the output, although the gains shrink past a certain point. Like any intricate process, more steps also mean more time and computational power. It's a balance between precision and efficiency.
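The time cost of steps is easy to estimate: each step runs the denoising network once, or twice when classifier-free guidance is active (one prediction for your prompt, one for the negative or empty prompt). The sketch below assumes guidance is active whenever the guidance scale exceeds 1, which matches some common implementations but is an assumption here; `denoiser_calls` is a hypothetical name.

```python
def denoiser_calls(steps: int, guidance_scale: float) -> int:
    """Estimate how many denoiser predictions one image needs.

    Assumes classifier-free guidance runs only when guidance_scale > 1,
    costing a conditional and an unconditional prediction per step
    (often batched together, but still twice the work).
    """
    predictions_per_step = 2 if guidance_scale > 1 else 1
    return steps * predictions_per_step


for steps in (20, 27, 50):
    print(steps, denoiser_calls(steps, guidance_scale=7))
# 20 40
# 27 54
# 50 100
```

Cost grows linearly with steps, while visible improvement does not, which is why moderate values such as the 27 steps used in the examples below are a common compromise.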

Guidance Scale:

If you've ever used a GPS, you know the importance of clear directions. The guidance scale is like setting the strictness of your GPS. A higher value makes the AI stick more closely to your prompt; a lower value gives it more freedom to interpret and add its own flair. Push it too high, however, and images tend to look oversaturated or distorted. It's about deciding how tightly you want to hold the reins.
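Under the hood, the guidance scale is the weight in classifier-free guidance. At each step the model makes two noise predictions, one for your prompt and one "unconditional" prediction (for the negative prompt when one is set, otherwise an empty prompt), and combines them as `uncond + scale * (cond - uncond)`. The sketch below applies that formula to plain lists; real pipelines apply it to large tensors, and `apply_guidance` is a hypothetical name.

```python
from typing import List


def apply_guidance(uncond: List[float], cond: List[float],
                   guidance_scale: float) -> List[float]:
    """Combine two noise predictions with classifier-free guidance.

    uncond: prediction for the negative (or empty) prompt
    cond:   prediction for your prompt
    A scale of 1 returns cond; larger scales push further past it,
    away from uncond.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


print(apply_guidance([0.0, 1.0], [1.0, 1.5], 1.0))  # [1.0, 1.5]
print(apply_guidance([0.0, 1.0], [1.0, 1.5], 7.0))  # [7.0, 4.5]
```

The formula also explains why very high scales distort images: the difference term is amplified so much that predictions overshoot plausible values.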

Seed:

In the unpredictable world of AI, the seed is your anchor of consistency. It sets the random starting noise that the model gradually refines into an image, much like a save file in a video game restores a specific state. By using the same seed with the same inputs and parameters, the model will produce the same output every time, as long as the software and hardware setup stays the same. This is invaluable when you want to reproduce a particular result or share your process with others.

Note: Uncheck the randomized seed option to lock your seed value, so that the same image is generated for the prompt and attributes you input.
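The reason a fixed seed reproduces an image is that it fixes the random numbers used for the starting noise. The sketch below uses Python's standard `random` module as a stand-in for the tensor random generator a real pipeline would use (in PyTorch, for instance, `torch.Generator().manual_seed(seed)`); `initial_noise` is a hypothetical name and the four-value "image" is purely illustrative.

```python
import random
from typing import List


def initial_noise(seed: int, size: int = 4) -> List[float]:
    """Draw starting Gaussian noise from a seeded, independent generator."""
    rng = random.Random(seed)  # does not touch the global random state
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


a = initial_noise(68420)
b = initial_noise(68420)
c = initial_noise(56893)
print(a == b)  # True: same seed, same starting noise
print(a == c)  # False: different seed, different starting noise
```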

Styles:

With Stable Diffusion XL, you have a rich palette of over 90 styles to choose from, allowing you to dictate the visual language of your output. Whether you're aiming for the sharp realism of photography, the playful and exaggerated features of a cartoon, the defined and minimalist strokes of line art, or the geometric simplicity of low poly, the choice is yours. Each style brings its own flavor, transforming the same prompt into vastly different visual experiences. It's like choosing between oil paints, watercolors, charcoal, or pastels for a piece of art. Your chosen style can dramatically influence the mood, tone, and impact of the final image.
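Style menus in front ends are often implemented as prompt templates that wrap your text with style-specific keywords, sometimes alongside a matching negative prompt. Assuming that mechanism, the sketch below shows the idea; the template wording and the `apply_style` helper are hypothetical and are not the actual text any SDXL front end uses.

```python
# Hypothetical style templates; "{prompt}" marks where your prompt goes.
STYLE_TEMPLATES = {
    "Photographic": "photograph of {prompt}, natural lighting, sharp focus",
    "Line Art": "line art drawing of {prompt}, clean minimalist lines",
    "Low Poly": "low poly render of {prompt}, simple geometric shapes",
}


def apply_style(style: str, prompt: str) -> str:
    """Wrap a prompt in the chosen style template; unknown styles pass through."""
    template = STYLE_TEMPLATES.get(style)
    # str.replace avoids surprises if the prompt itself contains braces
    return template.replace("{prompt}", prompt) if template else prompt


print(apply_style("Low Poly", "a fox in a forest"))
# low poly render of a fox in a forest, simple geometric shapes
```

This is why the same prompt can look so different across styles: each style effectively rewrites the prompt before the model sees it.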

In essence, prompt parameters are your toolkit for precision. They help you shape, refine, and perfect the AI's outputs so that the final image aligns with your vision.

Some Examples

💡
Prompt: "Model in layered street style, standing against a vibrant graffiti wall, Vivid colors, Mirrorless, 28mm lens, f/2.5 aperture, ISO 400, natural daylight"
Style: Photographic
Negative Prompt: out of frame, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature.
Steps: 27
Guidance Scale: 7
Strength: 1
Seed: 68420

💡
Prompt: "Model in trendy streetwear, City street with neon signs and pedestrians, Cinematic, Close up shot, Mirrorless, 35mm lens, f/1.8 aperture, ISO 400, slight color grading"
Style: Photographic
Negative Prompt: out of frame, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature.
Steps: 27
Guidance Scale: 9
Strength: 1
Seed: 56893
💡
Prompt: "Model in modern attire with metallic accessories, in an old factory setting, Metallic sheen, Full-frame mirrorless, 35mm lens, f/2.8 aperture, ISO 500, off-camera flash"
Style: Photographic
Negative Prompt: out of frame, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature.
Steps: 27
Guidance Scale: 7
Strength: 1
Seed: 68436
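To reproduce these examples in code, the settings map onto a handful of keyword arguments. The sketch below assumes Hugging Face diffusers' `StableDiffusionXLPipeline` as the target and only builds the arguments, so it runs without diffusers installed. Style is left out because it is a front-end preset rather than a pipeline argument, and Strength is left out because, as far as we know, it belongs to image-to-image pipelines rather than text-to-image. `GenerationSettings` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class GenerationSettings:
    prompt: str
    negative_prompt: str
    steps: int
    guidance_scale: float
    seed: int

    def pipeline_kwargs(self) -> dict:
        """Map settings onto keyword names used by the diffusers SDXL pipeline.

        The seed is not included: diffusers takes a seeded torch.Generator
        via its `generator` argument instead of an integer.
        """
        return {
            "prompt": self.prompt,
            "negative_prompt": self.negative_prompt,
            "num_inference_steps": self.steps,
            "guidance_scale": self.guidance_scale,
        }


first = GenerationSettings(
    prompt=("Model in layered street style, standing against a vibrant graffiti wall, "
            "Vivid colors, Mirrorless, 28mm lens, f/2.5 aperture, ISO 400, natural daylight"),
    negative_prompt="out of frame, lowres, text, error, cropped, worst quality",  # shortened
    steps=27,
    guidance_scale=7,
    seed=68420,
)
print(first.pipeline_kwargs()["num_inference_steps"])  # 27

# With torch and diffusers installed, and `pipe` a loaded StableDiffusionXLPipeline,
# the call would look roughly like:
#   generator = torch.Generator("cuda").manual_seed(first.seed)
#   image = pipe(**first.pipeline_kwargs(), generator=generator).images[0]
```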

Conclusion:

This comprehensive guide serves as a roadmap, whether you're a novice just dipping your toes into the AI imagery waters or a seasoned pro looking to refine your skills. We've dissected the anatomy of a prompt and its elements, and explored the prompt parameters that bring precision and consistency to AI outputs. By understanding these components, you're equipped to communicate more effectively with AI models like Stable Diffusion XL, guiding them to produce visuals that align with your creative vision.