
A Comprehensive Guide to Stable Diffusion Parameters for Image Generation

This guide gives a quick overview of parameters influencing Stable Diffusion image generation, building on our previous discussions about individual parameters. It aims to provide a holistic understanding of these parameters.

Rohit Rao, Shanmukha Karthik

20 Mar 2024 • 6 min read

In our previous blog posts, we explored individual parameters for image generation, their purposes, and how they can be adjusted to enhance output quality. This guide, however, aims to provide a comprehensive explanation of all the parameters that influence the image generation process. By understanding the impact of these parameters, you can fine-tune the output to better align with your preferences and desired results. Let’s get started.

Negative Prompt

Negative prompts act like filters, helping you get exactly what you want from the generated images. They let you specify what you do not want to see in the final result, which helps remove blurry or distorted output. Simply put, they give you more control by removing obstacles, so you can focus on creating the image you have in mind.

A few general negative prompts are:

out of frame, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature

Let us generate a few images for specific situations, with and without negative prompts, to show their effect on the generated images.

A few negative prompts for landscapes:

blurry, close up, distorted, foggy, grains, low quality, low contrast, surreal, underexposed

For our initial image generation, we aim to create a minimalist landscape photograph. Our objective is for the resulting image to convey a sense of simplicity without looking dull or barren.

Prompt: High-resolution photograph of a landscape against a minimalist background, focusing on detail and texture.

Image generated without and with negative prompts

A few negative prompts for portraits of people:

deformed, ugly, mutilated, disfigured, text, extra limbs, face cut, head cut, extra fingers, extra arms, poorly drawn face, cropped, cut off, missing parts

Let us now try to generate an image of a beautiful woman with white hair and minimal jewellery, modelling for an editorial magazine.

Prompt: A beautiful woman with white hair and minimal jewellery, capturing attention with her attitude, has modeled for an editorial magazine

The impact of negative prompts is evident in the generated images. The second image, created using a negative prompt, aligns well with the desired outcome and adheres to the prompt specifications. In contrast, the first image, generated without a negative prompt, fails to accurately reflect the intended prompt, resulting in an undesirable output. This not only wastes time but also leads to aimless experimentation without a clear direction. Consequently, negative prompts play a crucial role in refining the final generated image to match our specific needs and vision, ensuring a more efficient and focused creative process.
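The term lists above can be combined programmatically before they are sent to the model. The following is a minimal sketch, not part of any Stable Diffusion library: the helper name `build_negative_prompt` and the two shortened term groups are hypothetical, chosen only to illustrate merging a general list with a subject-specific one while dropping repeated terms.

```python
def build_negative_prompt(*term_groups):
    """Join groups of negative-prompt terms into one comma-separated string.

    Terms are stripped of surrounding whitespace, and repeated terms
    (compared case-insensitively) are kept only once, in first-seen order.
    """
    seen = set()
    merged = []
    for group in term_groups:
        for term in group:
            cleaned = term.strip()
            key = cleaned.lower()
            if cleaned and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return ", ".join(merged)


# Hypothetical, shortened groups based on the example lists in this guide.
GENERAL = ["out of frame", "lowres", "text", "watermark", "out of frame"]
PORTRAIT = ["deformed", "extra fingers", "text ", "cropped"]

negative_prompt = build_negative_prompt(GENERAL, PORTRAIT)
print(negative_prompt)
```

The resulting string can then be passed as the negative prompt of whichever tool or API you use for generation.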

Seed

The seed parameter is a numerical value that initializes the image generation process. If left unspecified, it is randomly assigned. However, by controlling the seed parameter, you gain the ability to reproduce the same image consistently when using the same set of parameters and prompts. This feature is particularly valuable for experimentation and iteration.

A fixed seed also lets you change specific features of the subject while the rest of the image stays largely the same, which makes experimentation easier. Let us try to generate images using the prompt below, changing only the emotion each time.

Prompt: Candid Photo journalistic Shot of a beautiful woman feeling <respective emotion to be filled here> , stunning outfit , Colorful Gardens, Nature Celebration, Shot on Medium Format Film school

Minor changes in the subject by changing the prompt

The above images illustrate how a fixed seed anchors the composition and the surrounding elements of the generated image. Although each change to the prompt introduces subtle variations in the output, keeping the seed fixed streamlines experimentation once we are satisfied with the type, quality, and nature of the image being generated. With the seed held constant, we can iterate on and fine-tune the other parameters against a consistent baseline.
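The reproducibility described above comes from the fact that the seed fixes the random noise from which generation starts. The sketch below illustrates only that idea: it uses Python's standard `random` module as a stand-in for the real latent noise, and the function name `initial_noise` and the seed values are hypothetical.

```python
import random


def initial_noise(seed, size=4):
    """Return a short list of Gaussian noise values determined by the seed.

    This is a stand-in for the latent noise a diffusion model starts from;
    real pipelines draw a much larger tensor, but the principle is the same.
    """
    rng = random.Random(seed)  # independent generator, seeded explicitly
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


a = initial_noise(1234)
b = initial_noise(1234)  # same seed: identical starting noise
c = initial_noise(4321)  # different seed: different starting noise

print("same seed, same noise:", a == b)
print("different seed, same noise:", a == c)
```

Because the starting noise is identical for a given seed, any difference between two runs can be attributed to the parameter or prompt you changed.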

Sampler

Samplers are the algorithms that carry out the de-noising process within the Stable Diffusion pipeline. Starting from random noise, a sampler applies de-noising steps iteratively, removing part of the predicted noise in each cycle (some samplers also add a small amount of fresh noise back at each step). This iterative method gradually enhances image quality, resulting in visibly clearer and cleaner outputs.

The choice of sampler depends on your requirements, but UniPC and Euler are generally preferred.

Prompt: vibrant illustration of a girl holding a balloon, vibrant colors, detailed, sunny day, attention to detail, 8k

Images generated by the samplers Heun, UniPC, Euler, and LMS, respectively
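To make the iterative de-noising loop more concrete, here is a toy, one-dimensional sketch in the style of an Euler sampler. It is an illustration under strong assumptions, not the code of any real pipeline: `toy_denoiser` is a hypothetical stand-in for the trained network (it always predicts the same clean value), and the noise schedule `sigmas` is made up for the example.

```python
def toy_denoiser(x, sigma, target=2.0):
    """Stand-in for the trained network: always predicts the same clean value."""
    return target


def euler_sample(x, sigmas, denoiser=toy_denoiser):
    """Run Euler steps over a decreasing noise schedule that ends at 0.

    Returns the list of intermediate values, starting with the noisy input.
    """
    trajectory = [x]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoiser(x, sigma)
        d = (x - denoised) / sigma        # slope of x with respect to the noise level
        x = x + (sigma_next - sigma) * d  # move to the next, lower noise level
        trajectory.append(x)
    return trajectory


# Hypothetical schedule: noise level shrinks from 10 down to 0.
sigmas = [10.0, 5.0, 2.0, 1.0, 0.0]
start = 2.0 + 10.0 * 0.7  # "clean value + sigma * noise" at the first noise level

path = euler_sample(start, sigmas)
for step, value in enumerate(path):
    print(f"step {step}: {value:.3f}")
```

Each step moves the value closer to the clean prediction, which is the one-dimensional analogue of an image becoming progressively cleaner. Other samplers differ mainly in how they compute each update.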

Guidance Scale

The Guidance Scale, or Classifier-Free Guidance (CFG) scale, influences the degree to which Stable Diffusion adheres to the provided text prompt during image generation.

A higher value on the Guidance Scale indicates stricter adherence to the input text. However, it also limits creative liberty, potentially yielding less diverse images. Conversely, a lower Guidance Scale value grants the AI greater creative freedom to interpret the text prompt, fostering more diverse and unexpected outcomes, which may be desirable in certain contexts.

A CFG value in the range of 7 to 9 is recommended; increasing the value can bring out further fine details. Values above 15 are not recommended unless the prompt is well defined, as they can also reduce the coherence of the image.

For example, let us try to generate images of a cute little puppy armed with a gun and experiment with various CFG values to understand its impact on the image generation process.

Prompt: A Pomeranian puppy in a soldier costume and armed with gun, shot on hasselblad, muted colors

Images generated using guidance scale values 3, 5, 9, 12, and 13
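The effect of the guidance scale can be sketched with the standard classifier-free guidance formula, in which the final noise prediction is pushed away from the unconditional prediction and toward the prompt-conditioned one. The snippet below is a minimal illustration on plain Python lists; the function name `apply_guidance` and the toy values are hypothetical, and real pipelines apply the same arithmetic to large tensors. In common implementations, when a negative prompt is supplied, its prediction takes the place of the unconditional one.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Combine two noise predictions using classifier-free guidance.

    result = uncond + guidance_scale * (cond - uncond), element by element.
    """
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_cond)
    ]


# Toy predictions, made up for the example.
uncond = [0.0, 1.0, -1.0]  # prediction without the prompt
cond = [1.0, 1.0, 0.0]     # prediction with the prompt

for scale in (0.0, 1.0, 7.5):
    print(scale, apply_guidance(uncond, cond, scale))
```

With a scale of 0 the prompt is ignored, with a scale of 1 the result equals the prompt-conditioned prediction, and larger values exaggerate the difference between the two, which is why very high values can hurt coherence.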

Strength

The strength parameter applies when generating from an input image (image-to-image). It dictates how much noise is added to the input image before de-noising begins, and therefore the level of randomness. A higher strength setting introduces more noise, resulting in more random variation and less consistency with the original image.

Strength values below 0.4 tend to preserve more of the original image’s features, so the output mostly resembles the input image, while values above 0.6 introduce more randomness, making the picture more creative. The optimal values for the strength parameter are in the range of 0.4 to 0.6.

Let us try to convert the below image of a little dog eating a roll into a cute little bear eating a burger.

Prompt: A cute baby bear eating a big burger at home. The expression has to be cute and happy.

Images generated for strength values 0.2, 0.5, and 0.8, respectively

Thus, we can see the impact of the strength parameter: the image generated with a high value adheres to the specified prompt, not only altering the subject of the input image but also making subtle changes to the surroundings.
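One way to reason about strength is in terms of how much of the de-noising schedule is actually run on the input image. The sketch below assumes the approach used by common image-to-image implementations, such as the one in Hugging Face diffusers, where roughly `strength × steps` de-noising steps are executed and the rest are skipped. The function name `img2img_steps` and the total of 50 steps are hypothetical choices for the example.

```python
def img2img_steps(num_inference_steps, strength):
    """Return (steps_skipped, steps_run) for an image-to-image generation.

    A higher strength adds more noise to the input image, so more of the
    de-noising schedule has to be run to clean it up again.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    steps_skipped = num_inference_steps - steps_run
    return steps_skipped, steps_run


# The strength values used for the images above, with an assumed 50 steps.
for s in (0.2, 0.5, 0.8):
    skipped, run = img2img_steps(50, s)
    print(f"strength={s}: skip {skipped} steps, run {run}")
```

At a strength of 0.2 only a small tail of the schedule is run, so the input image survives almost intact, while at 0.8 most of the schedule is run and the prompt has far more room to reshape the image.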

Conclusion

This guide serves as a roadmap for generating images, whether you are a novice or a seasoned pro. We've dissected the parameters covered here and how they can be altered. By understanding these components, you're equipped to communicate more effectively with models, guiding them to produce visuals that align with your vision.