Stable Diffusion XL 1.0 API Guide

A guide for developers and hobbyists for accessing the text-to-image generation model SDXL 1.0 API.

Rohit Rao

13 Oct 2023 • 8 min read

Overview:

Stable Diffusion XL 1.0 (SDXL 1.0), created by Stability AI, is a latent diffusion model for text-to-image generation and a significant advance over earlier Stable Diffusion releases.

With regard to its technical structure, SDXL uses a larger UNet backbone with more attention blocks and a larger cross-attention context, made possible by its second text encoder. SDXL implements an ensemble-of-experts pipeline for latent diffusion: the base model first produces noisy latents, and a refiner model then processes them during the final denoising steps.

This guide walks through the Text-to-Image SDXL API provided by Segmind, which is powered by Stable Diffusion XL 1.0. It is written to help developers and hobbyists integrate the model into their applications.

Two ways to explore the API:

Code: Call the API from your own code to explore its full range of options.
Playground: Generate images in the browser without writing any code.

Getting started:

1. Generating Authentication keys

Sign up and log in at https://www.segmind.com/
Head over to the console and select 'Create New API Key'.

Upon confirmation, the API authentication keys will be issued. Remember to store the key in a secure location before advancing to the subsequent step.
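
One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named SEGMIND_API_KEY; that name and the two helper functions are chosen for illustration only and are not prescribed by Segmind. The 'x-api-key' header is the one used in the requests later in this guide.

```python
import os

# Hypothetical variable name, chosen for this example only.
API_KEY_ENV_VAR = "SEGMIND_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {API_KEY_ENV_VAR} environment variable to your API key."
        )
    return key


def auth_headers(api_key):
    """Build the request headers with the key in the 'x-api-key' header."""
    return {"x-api-key": api_key}
```

With this in place, the key never has to appear in a script: headers can be built with auth_headers(load_api_key()).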

2. Generating the Images:

To initiate image creation using the Text-to-Image SDXL 1.0 API from Segmind, follow this step-by-step workflow:

POST Generate API: Use this API to submit a request for image generation.

Start by sending an image generation request, as shown below.

POST Generate API request:

In Bash:

curl -X POST \
     -H "x-api-key: YOUR-API-KEY" \
     -H "Content-Type: application/json" \
     -d '{"prompt":"panda wearing a blue spacesuit, sitting in a bar","negative_prompt":"None","style":"base","samples":1,"scheduler":"UniPC","num_inference_steps":25,"guidance_scale":7.5,"strength":0.2,"high_noise_fraction":0.8,"seed":-1,"img_width":1024,"img_height":1024,"refiner":true,"base64":false}' \
     "https://api.segmind.com/v1/sdxl1.0-txt2img"

In Python:

import requests


api_key = "YOUR-API-KEY"
url = "https://api.segmind.com/v1/sdxl1.0-txt2img"

# Request payload
data = {
  "prompt": "cinematic film still, 4k, realistic, ((cinematic photo:1.3)) of panda wearing a blue spacesuit, sitting in a bar, Fujifilm XT3, long shot, ((low light:1.4)), ((looking straight at the camera:1.3)), upper body shot, somber, shallow depth of field, vignette, highly detailed, high budget Hollywood movie, bokeh, cinemascope, moody, epic, gorgeous, film grain, grainy",
  "negative_prompt": "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft",
  "style": "base",
  "samples": 1,
  "scheduler": "UniPC",
  "num_inference_steps": 25,
  "guidance_scale": 8,
  "strength": 0.2,
  "seed": 468685,
  "img_width": 1024,
  "img_height": 1024,
  "refiner": True,
  "high_noise_fraction": 0.8,
  "base64": False
}

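# Send the request; the API key goes in the x-api-key header
response = requests.post(url, json=data, headers={'x-api-key': api_key})
# Print the response object, which shows the HTTP status code
print(response)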

Post-GENERATE request example

Running this code prints the response object; a successful request shows status code 200:

<Response [200]>

The different HTTP status codes that may be encountered include:

200 - OK: Image generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: The server had some issues with processing
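
A script would normally check the status code before using the response body. The sketch below maps the codes listed above to readable messages and writes the body to disk only on success. It assumes that, with "base64" set to False, the body of a 200 response is the raw image data; the helper names and the default file name (including its extension) are hypothetical.

```python
from pathlib import Path

# Status codes and meanings as listed in this guide.
STATUS_MESSAGES = {
    200: "OK: image generated",
    401: "Unauthorized: user authentication failed",
    404: "Not Found: the requested URL does not exist",
    405: "Method Not Allowed: the requested HTTP method is not allowed",
    406: "Not Acceptable: not enough credits",
    500: "Server Error: the server had some issues with processing",
}


def describe_status(status_code):
    """Return a readable message for a status code from the list above."""
    return STATUS_MESSAGES.get(
        status_code, f"Unexpected status code: {status_code}"
    )


def save_if_ok(status_code, content, path="sdxl_output.jpg"):
    """Write the response body to disk on 200; raise with a message otherwise.

    Assumption: `content` holds raw image bytes (request sent with base64=False).
    """
    if status_code != 200:
        raise RuntimeError(describe_status(status_code))
    out = Path(path)
    out.write_bytes(content)
    return out
```

With the requests example from this guide, the call would be save_if_ok(response.status_code, response.content).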

The following attributes can be adjusted to improve the output:

prompt: Prompt to render
Serves as the instructions given to the model for generating the desired visual content.

negative_prompt: Prompts to exclude, e.g. 'bad anatomy, bad hands, missing fingers'.
Lists the elements the model should keep out of the generated image.
Guides the generation towards more accurate and well-represented content.

style: Styles for Stable Diffusion
Styles encompass a variety of artistic and design elements used to maintain consistency and coherence throughout the diffusion process.
They help ensure that the image generated for a particular prompt stays in line with the desired aesthetic.

samples: Number of samples to generate.
Refers to the quantity of images the model would produce.

scheduler: Type of scheduler.
Schedulers drive the denoising process, running over multiple steps to turn a completely random image into a clean one.

num_inference_steps: Number of denoising steps.
The number of iterations the denoising process runs to remove noise from the image.
Affects the detail and quality of the images.

guidance_scale: How closely the model should adhere to your prompt.
Higher values make the output follow the prompt more strictly; lower values give the model more freedom.

strength: How much to transform the reference image
Determines the degree or extent of alterations or modifications applied to a given reference image.

high_noise_fraction: Fraction of the inference steps to be run on each expert
Determines how the denoising steps are divided between the base model and the refiner, which influences the complexity of the final image.

seed: Seed for image generation.
A numerical value used to initialize the image generation process.
Helps reproduce the same image output each time.

img_width: Can only be 1024 for SDXL

img_height: Can only be 1024 for SDXL

refiner: Whether to run the refiner model.
Default: True
Enabling it enhances the overall quality and fidelity of the output.
Note: Does not work when high_noise_fraction is 1.

base64: Base64 encoding of the output image.
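
The interplay between num_inference_steps, high_noise_fraction and refiner can be illustrated with a little arithmetic. The sketch below assumes the convention used by open-source SDXL base-plus-refiner pipelines, where the base model runs the first high_noise_fraction share of the steps and the refiner runs the rest. Whether the Segmind API rounds the split in exactly this way is an assumption, and split_steps is a hypothetical helper.

```python
def split_steps(num_inference_steps, high_noise_fraction):
    """Return (base_steps, refiner_steps) for a given step count and fraction.

    Assumption: the base model takes the first `high_noise_fraction` share of
    the steps and the refiner takes whatever remains.
    """
    if not 0.0 <= high_noise_fraction <= 1.0:
        raise ValueError("high_noise_fraction must be between 0 and 1")
    base_steps = round(num_inference_steps * high_noise_fraction)
    refiner_steps = num_inference_steps - base_steps
    return base_steps, refiner_steps


# Values used in this guide: 25 steps with a fraction of 0.8.
print(split_steps(25, 0.8))  # (20, 5)

# With a fraction of 1 no steps are left for the refiner.
print(split_steps(25, 1.0))  # (25, 0)
```

Under this reading, the note on the refiner follows directly: when the fraction is 1, the refiner has nothing left to do.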

Choose from more than 40 styles to guide the model towards your desired artistic expression.

Note: The 'style' attribute primarily influences simple prompts, while its impact on complex prompts is relatively minimal.
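
The attributes described above can be assembled into a request payload with a small helper that applies defaults and catches obvious mistakes before a request is sent. This is a sketch rather than an official client: build_payload is a hypothetical function, the defaults are copied from the Python example in this guide, and the checks only encode constraints stated here (1024 x 1024 images, and no refiner when high_noise_fraction is 1).

```python
# Defaults copied from the Python example in this guide.
DEFAULTS = {
    "negative_prompt": "",
    "style": "base",
    "samples": 1,
    "scheduler": "UniPC",
    "num_inference_steps": 25,
    "guidance_scale": 8,
    "strength": 0.2,
    "seed": 468685,
    "img_width": 1024,
    "img_height": 1024,
    "refiner": True,
    "high_noise_fraction": 0.8,
    "base64": False,
}


def build_payload(prompt, **overrides):
    """Return a request payload: defaults, then overrides, then validation."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    payload = {"prompt": prompt, **DEFAULTS, **overrides}
    if payload["img_width"] != 1024 or payload["img_height"] != 1024:
        raise ValueError("img_width and img_height can only be 1024 for SDXL")
    if payload["refiner"] and payload["high_noise_fraction"] >= 1:
        raise ValueError("the refiner does not work when high_noise_fraction is 1")
    return payload
```

A payload built this way can be passed as the json argument of requests.post, for example build_payload("a panda in a spacesuit", style="fantasy art").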

Stable Diffusion XL 1.0 Playground:

If you would rather not write code but still want images quickly, the Segmind SDXL 1.0 Playground offers a code-free way to generate them.

The SDXL 1.0 Playground is crafted for effortless experimentation, presenting itself in the following manner:

Below is a showcase of images generated with the Segmind SDXL 1.0 API. Browse the gallery and draw on these prompts to craft your own creations.

Prompt: {"prompt": "A high-resolution image capturing a moment of people lifting someone up, showcasing the collective effort and the power of community. Emphasize the emotions, the strength of the group, and the gratitude of the individual being lifted, cinematic",
  "negative_prompt": "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, the draft",
  "style": "fantasy art",
  "samples": 1,
  "scheduler": "UniPC",
  "num_inference_steps": 25,
  "guidance_scale": 8,
  "strength": 0.2,
  "seed": 468685,
  "img_width": 1024,
  "img_height": 1024,
  "refiner": True,
  "high_noise_fraction": 0.8,
  "base64": False
}

Prompt: {"prompt": "male model walking through, exploring, Mumbai, shot on camera such as Nikon Z7, with a 50mm f/1.8 lens to ensure sharpness and clarity. utilize dim and eerie lightning to create a chilling atmosphere . render the final image to 4k resolution to provide crispness , realistic skin , natural features , cinematic scene, ultrarealistic",
  "negative_prompt": "poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark",
  "style": "base",
  "samples": 1,
  "scheduler": "UniPC",
  "num_inference_steps": 35,
  "guidance_scale": 8,
  "strength": 0.2,
  "seed": 5923789677,
  "img_width": 1024,
  "img_height": 1024,
  "refiner": True,
  "high_noise_fraction": 0.8,
  "base64": False
}

Prompt: {"prompt": "Red Dead Redemption painted by Casper David Friedrich",
  "negative_prompt": "poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark",
  "style": "base",
  "samples": 1,
  "scheduler": "UniPC",
  "num_inference_steps": 35,
  "guidance_scale": 8,
  "strength": 0.2,
  "seed": 3631838799,
  "img_width": 1024,
  "img_height": 1024,
  "refiner": True,
  "high_noise_fraction": 0.8,
  "base64": False
}

Prompt: {"prompt": "colorful illustration in the style of Killian Eng , low angle with a vanishing point perspective, looking out across mushroom tree of life with lord Shiva from clouds of mist and spray, carved of stone and covered in fiery shiny gold and other precious metals, covered in centuries of leaves and vines it rises from the water guarding the waterfalls, flowering plants and vines, greenery, trees, crepuscular rays, bioluminescence,kantele rim lighting, atmospheric lighting, light reflected off water, uplighting, exquisite detail, ultra-photorealism, breathtaking, glorious, beautiful, vibrant intense Night Vast landscape, Award-winning concept art, a highly detailed vast field of poppies with a distant waterfall and canyons, spire of redrocks and trails at sunset, rich lush vegitation, unique trees, caves, supercell clouds, storms, stars, coatl, floating lights all around, nebula sky, hyperrealism, luminism, wide angle lens, 24mm, fine ultra-detailed realistic + ultra colonnade photorealistic + Hasselblad H6D + high definition + 64k + cinematic + color grading + depth of field + photo-realistic + film lighting + rim lighting",
  "negative_prompt": "poorly drawn hands, poorly drawn face, poorly drawn feet, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark",
  "style": "base",
  "samples": 1,
  "scheduler": "UniPC",
  "num_inference_steps": 35,
  "guidance_scale": 8,
  "strength": 0.2
}
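
The gallery entries differ mainly in prompt, style, step count and seed. Because a fixed seed helps reproduce the same output, one way to compare styles is to hold the seed and every other attribute constant and change only the style. The sketch below builds such a set of payloads; style_variants is a hypothetical helper, and 'base' and 'fantasy art' are the only style names taken from this guide.

```python
import copy


def style_variants(payload, styles):
    """Return one copy of `payload` per style, leaving the original untouched."""
    variants = []
    for style in styles:
        variant = copy.deepcopy(payload)
        variant["style"] = style
        variants.append(variant)
    return variants


# Example payload using values from the gallery above.
base_payload = {
    "prompt": "Red Dead Redemption painted by Casper David Friedrich",
    "style": "base",
    "seed": 3631838799,
    "num_inference_steps": 35,
}

for variant in style_variants(base_payload, ["base", "fantasy art"]):
    # Each variant keeps the same seed, so only the style differs.
    print(variant["style"], variant["seed"])
```

Each resulting payload can then be sent in its own request, and any differences between the images can be attributed to the style alone.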

In Summary:

Segmind API stands as an excellent solution for businesses and developers seeking streamlined and effective image generation.

With the convenience of auto-scaling APIs and the elimination of intricate management, you can dedicate your efforts to crafting compelling visuals without being burdened by technical intricacies.

Take the leap today and raise the quality of your image generation, refining your results with each attempt. Sign up at Segmind and start creating!

References:

SDXL 1.0 Playground: https://www.segmind.com/models/sdxl1.0-txt2img