Released earlier this year, Stable Diffusion brings powerful text-to-image capabilities to the world. Many different projects have been spun out of it since its release, making it easier than ever to create images like the one below with just a few simple words.

Stable Diffusion has been integrated into Keras, allowing users to generate novel images in just a few seconds with only three lines of code. Recently, the ability to modify images via inpainting has also been integrated into the Keras implementation of Stable Diffusion.
In this article, we'll look at how to both generate and inpaint images with Stable Diffusion in Keras. We provide a companion Colab notebook so you can jump straight into a GPU runtime environment. We'll also look at how XLA can be used to significantly increase the efficiency of Stable Diffusion in Keras. Let's dive in!
Requirements
If you don't want to install anything on your computer, click the button below to open the related Colab notebook and follow from there.
To set up Stable Diffusion in Keras locally on your computer, follow the steps below. Python 3.8 was used for this article.
Step 1 - Clone the project repository
Open a terminal and run the following command to clone the project repository with Git, then navigate to the project directory.
git clone https://github.com/AssemblyAI-Examples/stable-diffusion-keras.git
cd stable-diffusion-keras
Step 2 - Create a virtual environment
If you want to keep all dependencies for this project isolated on your system, create and activate a virtual environment:
python -m venv venv

# Activate (MacOS/Linux)
source venv/bin/activate

# Activate (Windows)
.\venv\Scripts\activate.bat
You may need to use python3 instead of python if you have both Python 2 and Python 3 installed on your computer.
Step 3 - Install dependencies
Finally, install all required dependencies by running the following command:
pip install -r requirements.txt
How to Use Stable Diffusion in Keras - Basic
We can use Stable Diffusion in just three lines of code:
from keras_cv.models import StableDiffusion

model = StableDiffusion()
img = model.text_to_image("Iron Man makes breakfast")
First, we import the StableDiffusion class from Keras and then create an instance of it, model. We then use the text_to_image() method of this model to generate an image and store it in the img variable.
If we additionally want to save the image, we can import and use Pillow:
from keras_cv.models import StableDiffusion
from PIL import Image

model = StableDiffusion()
img = model.text_to_image("Iron Man makes breakfast")
Image.fromarray(img[0]).save("simple.png")
We select the first (and only) image in the batch with img[0] and then convert it into a Pillow Image via fromarray(). Finally, we save the image to the filepath ./simple.png using the .save() method.
With a terminal open in the project directory, you can run the above script, saved as simple.py, by entering the following command:

python simple.py
Again, you may need to use python3 instead of python. The following image of Iron Man making breakfast is generated and saved to ./simple.png:

That's all you need to start using Stable Diffusion in Keras! In the next section, we'll look at more advanced usage like inpainting. Alternatively, jump down to the JIT compilation via XLA section to see how XLA can increase the speed of Stable Diffusion in Keras.
How to Use Stable Diffusion in Keras - Advanced
Let's now take a look at more advanced usage, covering both image generation and inpainting. The Colab notebook linked below makes it easy to, for example, change the inpainting area with sliders, so feel free to follow along there if you prefer:
All of the advanced image generation and inpainting code can be found in main.py.
Image generation
When we instantiate the Stable Diffusion model, we have the option to pass in some arguments. Below, we specify both the image height and width as 512 pixels. Each of these values must be a multiple of 128 and will be rounded to the nearest multiple if it is not. In addition, we also specify that we do not want to compile the model just-in-time with XLA (more details in the JIT compilation via XLA section).
model = StableDiffusion(img_height=512, img_width=512, jit_compile=False)
Next, we create a dictionary of arguments to pass to the text_to_image() method. The arguments are:

- prompt - a description of the scene you want an image of
- batch_size - the number of images to generate in one inference (limited by available memory)
- num_steps - the number of steps to use in the diffusion process
- unconditional_guidance_scale - the guidance weight for classifier-free guidance
- seed - a random seed to use
options = dict(
    prompt="An alien riding a skateboard in space, vaporwave aesthetic, trending on ArtStation",
    batch_size=1,
    num_steps=25,
    unconditional_guidance_scale=7,
    seed=119
)
From here the process is very similar to the above - we run the inference and then save the output as generated.png.
img = model.text_to_image(**options)
Image.fromarray(img[0]).save("generated.png")
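If you set batch_size greater than 1, text_to_image() returns one image per batch element. A minimal sketch of saving each of them (the loop and file names below are our own additions, not part of main.py):

for i, generated in enumerate(img):
    Image.fromarray(generated).save(f"generated_{i}.png")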
Note that inference can be run on both CPU and GPU. With an i5-11300H, it takes approximately 5 minutes to create an image with the above settings. With a GPU, it should only take approximately 30 seconds.
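If you're not sure whether your setup will use a GPU, you can check what TensorFlow can see. This quick diagnostic is our own addition and is not part of the project scripts:

import tensorflow as tf

# An empty list means inference will fall back to the CPU
print(tf.config.list_physical_devices("GPU"))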
Inpainting
Now let's see how to inpaint with Stable Diffusion in Keras. First, we download the image we want to modify, man-on-skateboard.jpg, using the requests package:
import requests

file_URL = "https://c0.wallpaperflare.com/preview/87/385/209/man-riding-on-the-skateboard-photography.jpg"
r = requests.get(file_URL)
with open("man-on-skateboard.jpg", 'wb') as f:
    f.write(r.content)
This is the resulting downloaded image

This image has a size of 910 x 607 pixels. Before we continue, let's crop it to 512 x 512 pixels. We define the bottom-left corner of the crop region as (x_start, y_start) and set the crop area to be 512 pixels wide and 512 pixels tall.
x_start = 80  # Starting x coordinate from the LEFT side of the image
width = 512
y_start = 0   # Starting y coordinate from the BOTTOM of the image
height = 512
If you're following along in Colab, you can adjust these values with the sliders:

Then we open the original image and convert it to a NumPy array so we can modify it:
import numpy as np

im = Image.open("man-on-skateboard.jpg")
img = np.array(im)
We perform the crop, where the unusual arithmetic for the y direction comes from the fact that we defined our crop with an origin at the bottom left of the image, while NumPy treats the top-left corner as the origin. For example, with im.height = 607, height = 512, and y_start = 0, the row slice becomes 95:607. We then save the cropped image as man-on-skateboard-cropped.png.
img = img[im.height-height-y_start:im.height-y_start, x_start:x_start+width]
new_filename = "man-on-skateboard-cropped.png"
Image.fromarray(img).save(new_filename)
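As a quick sanity check (our own addition), you can print the array's shape to confirm the crop has the expected 512 x 512 spatial dimensions:

print(img.shape)  # Expected: (512, 512, 3)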
Now it's time to create the inpainting mask. The inpainting mask defines the area of the image that you want Stable Diffusion to modify. We define the values here:
x_start = 134
x_end = 374
y_start = 0
y_end = 369
Again, if you follow the Colab notebook, you can use the sliders to adjust this area.

We open the cropped image as an array as before, and then create a mask with the same shape as the array, with every value in the mask being a 1. Then we replace the area defined by the inpainting mask with zeros, indicating to the model that this is the region we want to inpaint.
im = Image.open("man-on-skateboard-cropped.png")
img = np.array(im)

# Initialize mask
mask = np.ones((img.shape[:2]))

# Apply mask
mask[img.shape[0]-y_start-y_end:img.shape[0]-y_start, x_start:x_end] = 0
Next, we expand the dimensions of both the mask and image arrays, since the model expects a batch dimension.
mask = np.expand_dims(mask, axis=0)
img = np.expand_dims(img, axis=0)
Now it's time to define our inpainting options. We pass the image array to the image argument and the mask array to the mask argument. Other than that, all arguments are the same as before, except for the following:
- num_resamples - how many times the inpainting is resampled. Increasing this number will improve semantic fit at the expense of more computation
- diffusion_noise - optional custom noise to seed the diffusion process - either seed or diffusion_noise must be provided, but not both
- verbose - a boolean that defines whether to print a progress bar
inpaint_options = dict(
    prompt="A golden retriever on a skateboard",
    image=img,   # Tensor of RGB values in [0, 255]. Shape (batch_size, H, W, 3)
    mask=mask,   # Mask of binary values of 0 or 1
    num_resamples=5,
    batch_size=1,
    num_steps=25,
    unconditional_guidance_scale=8.5,
    diffusion_noise=None,
    seed=SEED,
    verbose=True,
)
Finally, we instantiate the model again, run the inference, and save the resulting array as above. The image is saved to ./inpainted.png.
inpainting_model = StableDiffusion(img_height=img.shape[1], img_width=img.shape[2], jit_compile=False)
inpainted = inpainting_model.inpaint(**inpaint_options)
Image.fromarray(inpainted[0]).save("inpainted.png")
Below we see a GIF of the original cropped image, the inpainted area, and the resulting image generated by Stable Diffusion.

Again, it is possible to run this inference on both CPU and GPU. With an i5-11300H, it takes about 22 minutes to inpaint with the above settings. With a GPU, it should only take a few minutes.
JIT compilation via XLA
Languages like C++ are traditionally compiled ahead of time (AOT), which means that the source code is compiled into machine code and that machine code is then executed by the processor. On the other hand, Python is generally interpreted. This means that the source code is not precompiled, but is interpreted by the processor at runtime. While no compilation step is required, the interpretation is slower than running an executable.
More details
Note that the above description is a simplification for the sake of brevity. In reality, the process is more complicated. For example, C++ is generally compiled into object code. Multiple object files may then be joined together by a linker to create the final executable, which is run directly by the processor.
Similarly, Python (or more specifically its most common implementation, CPython) is compiled into bytecode, which is then interpreted by Python's virtual machine.
These details are not essential to understanding JIT compilation, we include them here only for completeness.
Just-in-time (JIT) compilation is the process of compiling code at runtime. While there is some overhead in compiling a function, once it has been compiled it can be executed much faster than an interpreted equivalent. This means that functions which are called repeatedly benefit greatly from JIT compilation.
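As a minimal illustration of the idea (not from the project code), TensorFlow exposes JIT compilation through tf.function(jit_compile=True), which compiles the function with XLA (introduced below); the first call pays the compilation cost, and repeated calls reuse the compiled program:

import tensorflow as tf

@tf.function(jit_compile=True)
def scaled_dot(x, y):
    # A small linear-algebra computation that XLA can fuse into fewer kernels
    return tf.reduce_sum(x * y)

x = tf.random.normal((1024, 1024))
y = tf.random.normal((1024, 1024))

print(scaled_dot(x, y))  # First call: traces and compiles
print(scaled_dot(x, y))  # Later calls: run the already-compiled program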
XLA, or Accelerated Linear Algebra, is a domain-specific compiler designed specifically for linear algebra. Stable Diffusion in Keras supports JIT compilation via XLA. This means that we can compile Stable Diffusion into an XLA-compiled version that has the potential to run much faster than other implementations of Stable Diffusion.
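Enabling this is just the constructor flag we set to False earlier; a minimal sketch:

# jit_compile=True asks Keras to compile the generation graph with XLA
model = StableDiffusion(img_height=512, img_width=512, jit_compile=True)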
Benchmarks
In the graph below, we can see that Keras' implementation of Stable Diffusion runs significantly faster than the Hugging Face implementation in the diffusers library:

Note that these numbers reflect warm-start generation - Keras is actually slower on a cold start. This is to be expected, since the compilation step adds time to cold-start generation. As noted here, this is not a big problem, since in a production environment compilation would be a one-time cost amortized over the (hopefully) many, many inferences the model would perform.
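If you want to see the cold-start vs. warm-start difference yourself, here is a rough timing sketch (our own addition) using the same text_to_image() call as earlier:

import time

model = StableDiffusion(img_height=512, img_width=512, jit_compile=True)

for run in range(3):
    start = time.perf_counter()
    model.text_to_image("Iron Man makes breakfast", num_steps=25)
    print(f"Run {run}: {time.perf_counter() - start:.1f} s")
# Run 0 includes compilation time (cold start); later runs reflect warm-start speed.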
Combining XLA and mixed precision enables the fastest execution speeds. Below we see the runtimes for all combinations of with/without XLA and mixed precision:

You can run these experiments yourself in Colab here, or look at some additional metrics like cold-start times here.
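If you'd like to try the fastest configuration in your own script, here is a sketch of enabling both optimizations together (our own addition, not from main.py). Note that keras.mixed_precision.set_global_policy() must be called before the model is built, and mixed precision assumes a GPU with float16 support:

from tensorflow import keras
from keras_cv.models import StableDiffusion

# Enable mixed precision globally, then build the model with XLA compilation
keras.mixed_precision.set_global_policy("mixed_float16")
model = StableDiffusion(img_height=512, img_width=512, jit_compile=True)
img = model.text_to_image("Iron Man makes breakfast", num_steps=25)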
Conclusion
That's all you need to get started with Stable Diffusion with Keras! Keras' Stable Diffusion is a high-performance implementation that requires only a few lines of code and is a great choice for a variety of applications.
If you have additional questions about text-to-image models, check out some of the following resources for more information:
- How do I create a text-to-image model?
- What is classifier-free guidance?
- What is prompt engineering?
- How does DALL-E 2 work?
- How does Imagen work?
Alternatively, you can also follow our YouTube channel, Twitter, or newsletter to keep up to date with our latest tutorials and deep dives!