Stable Diffusion in Keras - A Simple Tutorial (2023)

Released earlier this year, Stable Diffusion brings powerful text-to-image capabilities to the world. Many different projects have been spun out of it since its release, making it easier than ever to create images like the one below with just a few simple words.

[Image: an example image generated by Stable Diffusion]

Stable Diffusion has been integrated into Keras, allowing users to create novel images in just a few seconds with three lines of code. Recently, the ability to modify images via inpainting has also been integrated into the Keras implementation of Stable Diffusion.

In this article, we're going to look at how to generate and inpaint images with Stable Diffusion in Keras. We provide a companion Colab notebook so you can jump straight into a GPU runtime environment. We'll also look at how XLA can be used to significantly increase the efficiency of Stable Diffusion in Keras. Let's dive in!


If you don't want to install anything on your computer, click the button below to open the related Colab notebook and follow from there.

Colab notebook

To set up Stable Diffusion in Keras locally on your computer, follow the steps below. Python 3.8 was used for this article.

Step 1 - Clone the project repository

Open a terminal and run the following command to clone the project repository with Git, then navigate to the project directory.

git clone stable-diffusion-keras
cd stable-diffusion-keras

Step 2 - Create a virtual environment

If you want to keep all dependencies for this project isolated on your system, create and activate a virtual environment:

python -m venv venv

# Activate (macOS/Linux)
source venv/bin/activate

# Activate (Windows)
.\venv\Scripts\activate.bat

You may need to use python3 instead of python if you have both Python 2 and Python 3 installed on your computer.

Step 3 - Install dependencies

Finally, install all required dependencies by running the following command:

pip install -r requirements.txt

How to Use Stable Diffusion in Keras - Basic

We can use Stable Diffusion in just three lines of code:

from keras_cv.models import StableDiffusion
model = StableDiffusion()
img = model.text_to_image("Iron Man makes breakfast")

We first import the StableDiffusion class from KerasCV and then create an instance of it, model. We then use the text_to_image() method of this model to generate an image and store it in the img variable.


If we additionally want to save the image, we can import and use Pillow:

from keras_cv.models import StableDiffusion
from PIL import Image

model = StableDiffusion()
img = model.text_to_image("Iron Man makes breakfast")
Image.fromarray(img[0]).save("simple.png")

We select the first (and only) image from the batch as img[0] and then convert it into a Pillow image with fromarray(). Finally, we save the image to the file path ./simple.png with the save() method.
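To see what fromarray() expects here, we can use a toy stand-in for the model output - a batch of one small RGB image of uint8 values, mirroring the batched shape that text_to_image() returns (the 64 x 64 size is made up for illustration):

```python
import numpy as np
from PIL import Image

# Stand-in for the model output: a batch of one 64x64 RGB image of uint8 values.
batch = np.zeros((1, 64, 64, 3), dtype=np.uint8)

im = Image.fromarray(batch[0])  # select the first image from the batch
print(im.size, im.mode)  # (64, 64) RGB
```

Pillow infers the "RGB" mode from the (H, W, 3) uint8 shape, which is why no mode argument is needed.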

With a terminal open in the project directory, you can run the above script, stored in simple.py, by typing the following command:

python simple.py
Again, you may need to use python3 instead of python. The following image of Iron Man making breakfast is generated and saved to ./simple.png:

[Image: generated image of Iron Man making breakfast]

That's all you need to start using Stable Diffusion in Keras! In the next section we will look at more advanced uses like inpainting. Alternatively, jump down to the JIT compilation via XLA section to see how XLA can increase the speed of Stable Diffusion in Keras.

How to Use Stable Diffusion in Keras - Advanced

Let's now take a look at more advanced uses, for both image generation and inpainting. The Colab notebook linked below makes it easy to, for example, change the inpainting area with sliders, so feel free to follow along there if you want:

Colab notebook

All of the advanced image generation and inpainting code can be found in the project repository.

Image generation

When we instantiate the Stable Diffusion model, we have the option to pass some arguments. Below, we specify both the image height and the width as 512 pixels. Each of these values must be a multiple of 128 and will be rounded to the nearest multiple if not. In addition, we also state that we do not want to compile the model just-in-time with XLA (more details in the JIT compilation via XLA section).

model = StableDiffusion(img_height=512, img_width=512, jit_compile=False)
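As a quick sanity check of the multiple-of-128 rounding mentioned above, a small helper shows how sizes snap to the nearest multiple (round_to_multiple is our own name for illustration, not part of KerasCV):

```python
def round_to_multiple(x, base=128):
    # Round x to the nearest multiple of `base`, mirroring how the model
    # snaps img_height/img_width to multiples of 128.
    return base * round(x / base)

print(round_to_multiple(512))  # 512 (already a multiple, unchanged)
print(round_to_multiple(500))  # 512
print(round_to_multiple(600))  # 640
```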

Next we create a dictionary of arguments to pass to the text_to_image() method. The arguments are:

  • prompt - a description of the scene you want a picture of
  • batch_size - the number of images to generate in one inference pass (limited by memory)
  • num_steps - the number of steps to use in the diffusion process
  • unconditional_guidance_scale - the guidance weight for classifier-free guidance
  • seed - a random seed to use
options = dict(
    prompt="An alien riding a skateboard in space, vaporwave aesthetic, trending on ArtStation",
    batch_size=1,
    num_steps=25,
    unconditional_guidance_scale=7,
    seed=119,
)

From here the process is very similar to the above - we run the inference and then save the output as generated.png.


img = model.text_to_image(**options)
Image.fromarray(img[0]).save("generated.png")

Note that inference can be run on both the CPU and the GPU. With an i5-11300H it takes approximately 5 minutes to create an image with the above settings. With a GPU, it should only take approximately 30 seconds.
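If you want to measure these times on your own hardware, a small stopwatch helper (our own sketch, not part of KerasCV) can wrap the call:

```python
import time

def timed(fn, *args, **kwargs):
    # Return a function's result together with its wall-clock runtime in seconds.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# With the model above, usage would be: img, seconds = timed(model.text_to_image, **options)
# Cheap stand-in call so the helper can be demonstrated without a GPU:
result, seconds = timed(sum, range(1000))
print(result)  # 499500
```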


Inpainting

Now let's see how to inpaint with Stable Diffusion in Keras. First, we download an image to modify, man-on-skateboard.jpg, using the requests package:

import requests

file_URL = ""
r = requests.get(file_URL)
with open("man-on-skateboard.jpg", "wb") as f:
    f.write(r.content)

This is the resulting downloaded image:

[Image: downloaded photo of a man on a skateboard]

This image has a size of 910 x 607 pixels. Before we continue, let's crop it to 512 x 512 pixels. We define the lower-left corner of the crop region as (x_start, y_start) and set the crop region to be 512 pixels wide and 512 pixels high.

x_start = 80   # Start x coordinate from the left side of the image
width = 512
y_start = 0    # Start y coordinate from the BOTTOM of the image
height = 512

If you're following along in Colab, you can adjust these values with the sliders:

[Image: Colab sliders for adjusting the crop coordinates]

Then we open the original image and convert it to a NumPy array so we can modify it:

import numpy as np

im = Image.open("man-on-skateboard.jpg")
img = np.array(im)

We perform the crop, where the unusual arithmetic in the y direction comes from the fact that we defined our crop with an origin at the bottom left of the image, while NumPy treats the top-left corner of the image as the origin. We then save the cropped image as man-on-skateboard-cropped.png.

img = img[im.height - height - y_start : im.height - y_start, x_start : x_start + width]
new_filename = "man-on-skateboard-cropped.png"
Image.fromarray(img).save(new_filename)
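To convince yourself of that bottom-origin arithmetic, here is a toy version of the same crop on a small NumPy array (the 6x6 array and coordinates are made up for illustration):

```python
import numpy as np

arr = np.arange(36).reshape(6, 6)   # row 0 is the TOP of the "image" in NumPy
height, y_start = 3, 1              # crop 3 rows, starting 1 row up from the bottom
im_height = arr.shape[0]

# Same index arithmetic as the crop above: flip from bottom-origin to top-origin.
cropped = arr[im_height - height - y_start : im_height - y_start, :]
print(cropped.shape)  # (3, 6)
print(cropped[0, 0])  # 12: the crop starts at row 2 counted from the top
```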

Now it's time to create the inpainting mask. The inpainting mask defines the area of the image that you want Stable Diffusion to modify. We define the mask coordinates here:

x_start = 134
x_end = 374
y_start = 0
y_end = 369

Again, if you follow the Colab notebook, you can use the sliders to adjust this area.

[Image: Colab sliders for adjusting the inpainting mask]

We open the cropped image as an array as before, and then create a mask with the same shape as the array, with every value initialized to 1. Then we set the area defined by the inpainting mask to 0, indicating to the model that this is the region we want to inpaint.


im = Image.open("man-on-skateboard-cropped.png")
img = np.array(im)
# Initialize mask
mask = np.ones(img.shape[:2])
# Apply mask
mask[img.shape[0] - y_start - y_end : img.shape[0] - y_start, x_start:x_end] = 0
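A toy version of this masking logic on a small array makes the bottom-origin y coordinates easier to check (the 8x8 size and coordinates are made up for illustration):

```python
import numpy as np

H, W = 8, 8
x_start, x_end = 2, 6   # 4 columns masked
y_start, y_end = 1, 5   # measured from the BOTTOM, as in the crop above

mask = np.ones((H, W))
mask[H - y_start - y_end : H - y_start, x_start:x_end] = 0

print(int(mask.sum()))  # 44: 64 pixels minus the 5x4 zeroed region
```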

Next, we expand the dimensions of both the mask and image arrays, since the model expects a batch dimension.

mask = np.expand_dims(mask, axis=0)
img = np.expand_dims(img, axis=0)
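A quick shape check shows the batch dimension being added (512 x 512 toy arrays stand in for the real image and mask):

```python
import numpy as np

# Toy arrays with the same shapes as the mask and cropped image above.
mask = np.ones((512, 512))
img = np.zeros((512, 512, 3), dtype=np.uint8)

mask = np.expand_dims(mask, axis=0)
img = np.expand_dims(img, axis=0)

print(mask.shape)  # (1, 512, 512)
print(img.shape)   # (1, 512, 512, 3)
```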

Now it's time to define our inpainting options. We pass the image array to the image argument and the mask array to the mask argument. Other than that, all arguments are the same as above, except for the following:

  • num_resamples - how many times the inpainting is resampled. Increasing this number improves the semantic fit at the expense of more computation
  • diffusion_noise - optional custom noise to seed the diffusion process. Either seed or diffusion_noise must be provided, but not both
  • verbose - a boolean that defines whether to print a progress bar
inpaint_options = dict(
    prompt="A golden retriever on a skateboard",
    image=img,    # Tensor of RGB values in [0, 255]. Shape (batch_size, H, W, 3)
    mask=mask,    # Mask of binary values of 0 or 1
    num_resamples=5,
    batch_size=1,
    num_steps=25,
    unconditional_guidance_scale=8.5,
    diffusion_noise=None,
    seed=SEED,
    verbose=True,
)

Finally, we instantiate the model again, run the inference, and save the resulting array as above. The image is saved to ./inpainted.png.

inpainting_model = StableDiffusion(img_height=img.shape[1], img_width=img.shape[2], jit_compile=False)
inpainted = inpainting_model.inpaint(**inpaint_options)
Image.fromarray(inpainted[0]).save("inpainted.png")

Below we see a GIF of the original cropped image, the inpainted area, and the resulting image generated by Stable Diffusion.

[GIF: the original cropped image, the inpainted area, and the resulting inpainted image]

It is again possible to run this inference on both CPU and GPU. On an i5-11300H it takes about 22 minutes to inpaint with the above settings. With a GPU, it should take just a few minutes.

JIT compilation via XLA

Languages like C++ are traditionally compiled ahead of time (AOT), which means that the source code is compiled into machine code, and that machine code is then executed by the processor. Python, on the other hand, is generally interpreted: the source code is not precompiled, but is interpreted at runtime. While no compilation step is required, interpretation is slower than running an executable.

More details

Note that the above description is a simplification for the sake of brevity. In reality, the process is more complicated. For example, C++ is generally compiled into object code. Multiple object files may then be joined together by a linker to create the final executable, which is run directly by the processor.


Similarly, Python (or more specifically its most common implementation, CPython) is compiled into bytecode, which is then interpreted by Python's virtual machine.
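You can inspect this bytecode for yourself with the standard library's dis module:

```python
import dis

def add(a, b):
    return a + b

# Disassemble the function into the bytecode instructions the CPython VM interprets.
dis.dis(add)
```

The exact instruction names vary by Python version (e.g. BINARY_ADD in older versions, BINARY_OP in 3.11+), but the listing always shows the intermediate representation that the virtual machine executes.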

These details are not essential to understanding JIT compilation; we include them here only for completeness.

Just-in-time (JIT) compilation is the process of compiling code at runtime. While there is some overhead in compiling a function, once it is compiled it can be executed much faster than an interpreted equivalent. This means that functions that are called repeatedly benefit most from JIT compilation.

XLA, or Accelerated Linear Algebra, is a domain-specific compiler designed specifically for linear algebra. Stable Diffusion in Keras supports JIT compilation via XLA. This means we can compile Stable Diffusion into an XLA-compiled version that has the potential to run much faster than other implementations of Stable Diffusion.


We can see in the graph below that Keras' implementation of Stable Diffusion runs significantly faster than the Hugging Face implementation in the diffusers library:

[Chart: generation time of the Keras implementation vs. the Hugging Face diffusers implementation]

Note that these numbers reflect warm-start generation - Keras is actually slower on a cold start. This is to be expected, since the compilation step adds time to cold-start generation. As noted here, this is not a big problem, since compilation in a production environment would be a one-time cost amortized over the (hopefully) many, many inferences the model would perform.

Combining XLA and mixed precision enables the fastest execution speeds. Below we see the runtimes for all combinations of with/without XLA and mixed precision:

[Chart: runtimes for each combination of XLA and mixed precision]

You can run these experiments yourself on Colab here, or look at some additional metrics like cold-start times here.


That's all you need to get started with Stable Diffusion with Keras! Keras' Stable Diffusion is a high-performance implementation that requires only a few lines of code and is a great choice for a variety of applications.

If you have additional questions about text-to-image models, check out some of the following resources for more information:


  1. How do I create a text-to-image model?
  2. What is classifier-free guidance?
  3. What is prompt engineering?
  4. How does DALL-E 2 work?
  5. How does Imagen work?

Alternatively, you can also follow our YouTube channel, Twitter, or newsletter to keep up to date with our latest tutorials and deep dives!




Article information

Author: Terence Hammes MD

Last Updated: 07/11/2023
