What is a scheduler in Stable Diffusion?
Before we begin talking about schedulers, you should have a good understanding of the components of the Stable Diffusion pipeline. I recommend reading our post on the anatomy of Stable Diffusion, or The Illustrated Stable Diffusion, to go deeper into the pipeline.
Schedulers are algorithms that are used alongside the UNet component of the Stable Diffusion pipeline. They play a key role in the denoising process and run iteratively over multiple steps to create a clean image from a completely random noisy image. The underlying diffusion models work by progressively perturbing data with increasing random noise (called the “diffusion” process) and then successively removing that noise to generate new data samples. The scheduler governs the removal side at inference time. Schedulers are sometimes also referred to as samplers.
In summary, the scheduling side controls how noise levels progress during the diffusion process, which affects overall image quality, while the sampling side determines how each denoising update is applied, including any random perturbations that add variation and diversity to the generated outputs. Both play crucial roles in shaping the characteristics and aesthetics of the images a diffusion model produces.
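To make the "intensifying random noise" idea concrete, here is a minimal sketch of the forward noising process. It assumes a linear noise schedule with the commonly cited DDPM values (1000 steps, betas from 1e-4 to 0.02). The function names and the three-element `clean` list are illustrative stand-ins, not part of any library.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that grow linearly (assumed schedule values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """alpha_bar[t]: share of the original signal variance left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to step t: shrink the clean data and mix in Gaussian noise."""
    keep = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [keep * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


alpha_bars = cumulative_signal(linear_betas())
rng = random.Random(0)
clean = [1.0, -0.5, 0.25]  # hypothetical stand-in for pixels or latents
slightly_noisy = add_noise(clean, 0, alpha_bars, rng)
almost_pure_noise = add_noise(clean, 999, alpha_bars, rng)
```

At step 0 the data is barely perturbed, while by the last step almost none of the original signal remains. That near-pure noise is what the scheduler starts from when it runs the process in reverse.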
Under the hood, these algorithms are quite complex, frequently necessitating a delicate balance between denoising speed and denoising quality. Diffusion models achieve high-quality image generation without any adversarial training, and the schedulers themselves are handcrafted algorithms with no trainable parameters.
It can be challenging to objectively compare the image quality these schedulers produce. The Stable Diffusion pipeline is modular, so you can swap out parts, including the scheduler, and experiment to see which one works best for your use case.
There are more than 10 types of schedulers for the denoising loop available as of today. Selecting the right type of scheduler impacts the image quality and the generation time of the image. In this post, we help you learn more about the most popular schedulers and their pros and cons.
Denoising Diffusion Implicit Models (DDIM)
DDIM was one of the first schedulers to be designed, and it shipped with the first version of Stable Diffusion. It is based on the Denoising Diffusion Implicit Models paper published by Stanford University researchers in 2021. It improves on Denoising Diffusion Probabilistic Models (DDPMs) by using a more efficient class of iterative implicit probabilistic models that share the same training procedure. The paper reports sampling 10x to 50x faster while still producing high-quality images.
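The speed-up comes from visiting only a small subset of the training timesteps. Below is a minimal sketch of the deterministic DDIM update (eta = 0) on a single scalar. It assumes a linear beta schedule and a simple even stride. `oracle_noise` is a hypothetical stand-in for the trained UNet: it knows the target value, so the result can be checked.

```python
import math


def make_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for a linear beta schedule (assumed values)."""
    step = (beta_end - beta_start) / (num_train_steps - 1)
    alpha_bars, running = [], 1.0
    for i in range(num_train_steps):
        running *= 1.0 - (beta_start + i * step)
        alpha_bars.append(running)
    return alpha_bars


def ddim_sample(x, predict_noise, alpha_bars, num_inference_steps=20):
    """Deterministic DDIM (eta = 0) over an evenly strided subset of timesteps."""
    stride = len(alpha_bars) // num_inference_steps
    for t in range(len(alpha_bars) - 1, -1, -stride):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - stride] if t - stride >= 0 else 1.0
        eps = predict_noise(x, t)
        # Estimate the clean sample implied by the current noise prediction.
        x0_pred = (x - math.sqrt(1.0 - a_t) * eps) / math.sqrt(a_t)
        # Move that estimate to the earlier timestep, reusing the same eps.
        x = math.sqrt(a_prev) * x0_pred + math.sqrt(1.0 - a_prev) * eps
    return x


alpha_bars = make_alpha_bars()
target = 0.7  # the "clean image" the toy oracle is steering toward
calls = []


def oracle_noise(x, t):
    """Hypothetical perfect noise predictor; a real pipeline calls the UNet here."""
    calls.append(t)
    return (x - math.sqrt(alpha_bars[t]) * target) / math.sqrt(1.0 - alpha_bars[t])


result = ddim_sample(x=1.3, predict_noise=oracle_noise, alpha_bars=alpha_bars)
```

In this toy setup, 20 model calls stand in for the 1000 training timesteps, which is where a speed-up of the reported magnitude would come from.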
The Euler scheduler is based on the Nvidia paper Elucidating the Design Space of Diffusion-Based Generative Models. The most commonly used implementation is k-diffusion, built on PyTorch. It is considered a fairly fast scheduler and often needs only 20-30 steps to produce a good output.
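A minimal sketch of an Euler-style sampler in the spirit of k-diffusion, written in plain Python on a single scalar. The noise-level schedule follows the formula from the Nvidia paper with rho = 7. The `sigma_min` and `sigma_max` values and the constant `toy_denoiser` are assumptions chosen for illustration.

```python
def karras_sigmas(n, sigma_min=0.1, sigma_max=10.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, followed by a final 0."""
    lo, hi = sigma_min ** (1.0 / rho), sigma_max ** (1.0 / rho)
    sigmas = [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]
    return sigmas + [0.0]


def sample_euler(x, denoise, sigmas):
    """One model call per step, then a straight-line move to the next noise level."""
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        denoised = denoise(x, sigma)
        d = (x - denoised) / sigma          # slope of the sampling ODE at (x, sigma)
        x = x + d * (sigma_next - sigma)    # Euler step
    return x


target = 0.7


def toy_denoiser(x, sigma):
    """Hypothetical denoiser for a dataset containing a single value."""
    return target


sigmas = karras_sigmas(25)
result = sample_euler(x=sigmas[0] * 1.5, denoise=toy_denoiser, sigmas=sigmas)
```

The schedule takes large steps at high noise levels and smaller ones near the end. This may be one reason a few dozen steps can be enough in practice.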
Ancestral sampling techniques are designed to foster exploration and diversity in the generated outputs. Where a deterministic sampler follows a single path from noise to image, an ancestral sampler injects fresh random noise after each denoising step, so the result keeps changing as the step count changes rather than converging to one image. Euler Ancestral applies this idea on top of the Euler method: the function iterates through the diffusion steps, sizing each step from the sigma (noise level) values, which keeps generation efficient at moderate step counts.
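The ancestral idea can be sketched as follows, modeled on how k-diffusion splits each step into a deterministic move and a dose of fresh noise. This is a scalar toy under stated assumptions: the short sigma list and the Gaussian `toy_denoiser` are hypothetical, and `eta` scales how much noise is re-injected.

```python
import random


def ancestral_split(sigma_from, sigma_to, eta=1.0):
    """Split the move to sigma_to into a deterministic target and fresh noise."""
    if eta == 0.0:
        return sigma_to, 0.0
    sigma_up = min(
        sigma_to,
        eta * (sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2) ** 0.5,
    )
    sigma_down = (sigma_to**2 - sigma_up**2) ** 0.5
    return sigma_down, sigma_up


def sample_euler_ancestral(x, denoise, sigmas, rng, eta=1.0):
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        sigma_down, sigma_up = ancestral_split(sigma, sigma_next, eta)
        d = (x - denoise(x, sigma)) / sigma
        x = x + d * (sigma_down - sigma)             # Euler step down to sigma_down
        if sigma_next > 0:
            x = x + rng.gauss(0.0, 1.0) * sigma_up   # re-inject fresh noise
    return x


def toy_denoiser(x, sigma):
    """Hypothetical ideal denoiser for data drawn from a unit Gaussian."""
    return x / (1.0 + sigma**2)


sigmas = [10.0, 5.0, 2.5, 1.0, 0.5, 0.1, 0.0]
run_a = sample_euler_ancestral(8.0, toy_denoiser, sigmas, random.Random(1))
run_b = sample_euler_ancestral(8.0, toy_denoiser, sigmas, random.Random(2))
det_a = sample_euler_ancestral(8.0, toy_denoiser, sigmas, random.Random(1), eta=0.0)
det_b = sample_euler_ancestral(8.0, toy_denoiser, sigmas, random.Random(2), eta=0.0)
```

With the same starting noise, the two ancestral runs end in different places because each step draws new randomness. Setting `eta=0.0` removes the injected noise and reduces the loop to plain Euler.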
DPM (Single & Multi)
DPM-Solver treats sampling as solving a differential equation, and it comes in single-step and multi-step variants. Both are numerical methods for approximating the solution of that equation. They differ in whether each update uses only information gathered within the current step or also reuses model outputs from earlier steps.
Either variant can be used to improve image sample quality. The choice depends on factors such as the complexity of the problem, desired accuracy, stability requirements, and available computational resources. The order of the solver (e.g., 1, 2, 3) indicates its order of accuracy. In the multi-step solver it also determines how many recent time steps are combined in each update, with higher orders typically offering improved accuracy.
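The single-step versus multi-step distinction is easiest to see on a generic ODE. The sketch below is not DPM-Solver itself: it uses two textbook second-order methods (midpoint and Adams-Bashforth) on the toy equation dy/dt = -y, purely to show how a multi-step method reaches similar accuracy with fewer function evaluations by reusing the previous one.

```python
import math


def midpoint_single_step(f, y, t0, t1, n):
    """Second-order single-step method: two fresh evaluations of f per step."""
    h, t, evals = (t1 - t0) / n, t0, 0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        evals += 2
        y, t = y + h * k2, t + h
    return y, evals


def adams_bashforth_multistep(f, y, t0, t1, n):
    """Second-order multistep method: one new evaluation, reuse the previous one."""
    h, t, evals, f_prev = (t1 - t0) / n, t0, 0, None
    for _ in range(n):
        f_cur = f(t, y)
        evals += 1
        if f_prev is None:
            y = y + h * f_cur                          # first step: plain Euler
        else:
            y = y + h * (1.5 * f_cur - 0.5 * f_prev)   # combine current and previous
        f_prev, t = f_cur, t + h
    return y, evals


def decay(t, y):
    """Toy ODE with the known solution y(t) = exp(-t)."""
    return -y


exact = math.exp(-1.0)
y_single, evals_single = midpoint_single_step(decay, 1.0, 0.0, 1.0, 10)
y_multi, evals_multi = adams_bashforth_multistep(decay, 1.0, 0.0, 1.0, 10)
```

In a diffusion sampler each function evaluation is a full UNet forward pass, so halving the evaluations per step roughly halves the generation time for the same number of steps.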
Heun sampling takes its name from Heun's method, a second-order numerical integration technique used to approximate solutions of ordinary differential equations. At each diffusion step it first takes a trial Euler step, then re-evaluates the model at the predicted point and averages the two slopes to correct the update. This costs roughly two model evaluations per step, so Heun sampling trades some speed for a more accurate estimate of the diffusion process at a given step count. With its ability to preserve fine details and enhance image fidelity, it suits applications that need high-quality, visually compelling image generation, and researchers continue to refine and explore the technique.
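Here is a minimal sketch of Heun's method as a sampler, side by side with plain Euler on a scalar toy problem. It assumes data drawn from a unit Gaussian, for which the ideal denoiser and the exact end point are known in closed form. The geometric sigma schedule and all names are illustrative.

```python
import math


def sample(x, denoise, sigmas, use_heun):
    """Euler or Heun sampling; returns the sample and the number of model calls."""
    calls = 0
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        dt = sigma_next - sigma
        d = (x - denoise(x, sigma)) / sigma
        calls += 1
        if not use_heun or sigma_next == 0:
            x = x + d * dt                       # plain Euler step
        else:
            x_pred = x + d * dt                  # predictor: trial Euler step
            d_next = (x_pred - denoise(x_pred, sigma_next)) / sigma_next
            calls += 1
            x = x + 0.5 * (d + d_next) * dt      # corrector: average both slopes
    return x, calls


def gaussian_denoiser(x, sigma):
    """Hypothetical ideal denoiser for data drawn from a unit Gaussian."""
    return x / (1.0 + sigma**2)


steps = 10
sigmas = [10.0 * 0.01 ** (i / (steps - 1)) for i in range(steps)] + [0.0]
x_start = 10.0
# Closed-form end point of the sampling ODE for this toy denoiser.
exact = x_start / math.sqrt(1.0 + sigmas[0] ** 2)
x_euler, calls_euler = sample(x_start, gaussian_denoiser, sigmas, use_heun=False)
x_heun, calls_heun = sample(x_start, gaussian_denoiser, sigmas, use_heun=True)
```

On this toy problem Heun lands closer to the exact answer than Euler with the same schedule, at the price of nearly twice as many model calls.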
Fine Control over Image Variation: With DPM2 Karras sampling, users have fine-grained control over the generated image's appearance. They can influence various factors such as color distribution, style, or texture preservation, allowing for customizable outputs.
DPM2 Ancestral Karras
DPM2 Karras sampling excels in producing high-quality images while allowing fine control over their characteristics, while DPM2 Ancestral sampling focuses on enhancing diversity and exploring unseen image spaces, leading to more novel and unique outputs. The choice between these techniques depends on the specific goals and preferences of the user.
UniPC is a unified predictor-corrector framework proposed for fast sampling of Diffusion Probabilistic Models (DPMs). It includes two components: a unified predictor (UniP) and a unified corrector (UniC). The unique aspect of UniPC is that it can support arbitrary order and enhance the order of accuracy without requiring additional model evaluations.
This framework allows a significant improvement in the quality of sampling, making the process faster and more efficient. In the context of image synthesis, it has shown promising results in both unconditional and conditional sampling tasks using pixel-space and latent-space pre-trained DPMs. UniPC represents a versatile and practical solution for tasks requiring rapid, high-quality sampling.
Denoising Diffusion Probabilistic Models (DDPM)
DDPMs are powerful generative models that can produce high-quality data samples, such as images or audio, but they typically require hundreds to thousands of iterations to produce these final samples. This can make them time-consuming and computationally expensive to use.
This algorithm uses latent variable models inspired by non-equilibrium thermodynamics, for image synthesis. A unique approach connecting diffusion probabilistic models and denoising score matching with Langevin dynamics (an approach to the mathematical modeling of the dynamics of molecular systems) was used to achieve high-quality images. The researchers' methods innovatively allow a progressive lossy decompression scheme, which can be seen as a generalization of autoregressive decoding.
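To show why DDPM sampling is slow, here is a minimal sketch of its reverse loop on a single scalar: one model call for every training timestep, with fresh noise added after each step except the last. It assumes the linear beta schedule from the DDPM paper and the simple variance choice sigma_t^2 = beta_t. `oracle_noise` is a hypothetical stand-in for the trained network.

```python
import math
import random


def ddpm_sample(x, predict_noise, betas, rng):
    """Ancestral DDPM sampling: one model call for every training timestep."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t, alpha_bars)
        # Posterior mean: remove this step's share of the predicted noise.
        noise_share = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = (x - noise_share * eps) / math.sqrt(1.0 - betas[t])
        # Add fresh noise on every step except the last one.
        x = mean + (math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) if t > 0 else 0.0)
    return x


target = 0.7
calls = []


def oracle_noise(x, t, alpha_bars):
    """Hypothetical perfect noise predictor; a real pipeline calls the UNet here."""
    calls.append(t)
    return (x - math.sqrt(alpha_bars[t]) * target) / math.sqrt(1.0 - alpha_bars[t])


num_steps = 1000
betas = [1e-4 + i * (0.02 - 1e-4) / (num_steps - 1) for i in range(num_steps)]
result = ddpm_sample(1.3, oracle_noise, betas, random.Random(0))
```

The loop makes 1000 model calls to produce one sample, which is the cost that faster schedulers aim to cut.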
Pseudo Numerical methods for Diffusion Models (PNDM)
Pseudo Numerical Methods for Diffusion Models (PNDMs) are techniques proposed to accelerate the inference process in Denoising Diffusion Probabilistic Models (DDPMs) without compromising the quality of the generated samples.
In the paper, the authors treat DDPMs as if they were solving differential equations on manifolds. A "manifold" is a mathematical space that, on a small scale, looks like Euclidean space of a certain dimension. The authors are essentially saying that the process of generating samples using DDPMs can be thought of as navigating such a space, and PNDMs offer a more efficient way to do that. This perspective leads them to propose PNDMs, a new way to accelerate the process. PNDMs modify classical numerical methods to solve these differential equations more efficiently.
According to the paper, PNDMs can generate higher-quality synthetic images in only 50 steps than Denoising Diffusion Implicit Models (DDIMs) running for 1000 steps. This represents a 20x speedup without any loss in the quality of the generated images.
If you'd like to experience the power of the latest Stable Diffusion models firsthand, give our free models a spin and experience the magic yourself.