Python and Pillow: How to Code Something DALL-E Cannot Do

In this blog post, I will showcase some of the generative art I created using the Python programming language, specifically utilizing the Pillow and Torch libraries. The inspiration for my artwork came from the visual compositions of Roman Haubenstock-Ramati, an Austrian music composer and visual artist.

In early 2021, I was frequently browsing Catawiki because I wanted to buy some art to decorate my home office. When I came across Haubenstock-Ramati’s creations there, I was immediately captivated by the intricate and beautiful nature of his parametric art. I had been wanting to do something creative with my coding skills for a while, and his work inspired me to develop code that could produce similar output. The image below, created by Roman Haubenstock-Ramati, is an example of the pieces that inspired me.

Konstellationen, 1970/1971 by Roman Haubenstock-Ramati

After the release of Dall-E 2 in April 2022, I explored using the model to generate artwork resembling Haubenstock-Ramati’s. Asking a model to do this is controversial: there are valid concerns that AI models can produce output so similar to an artist’s work that it constitutes copyright infringement on the original. That discussion is beyond the scope of this blog post, but I want to make clear that the prompts I fed into Dall-E were not intended to produce exact copies of Haubenstock-Ramati’s work or to devalue it. The same goes for the code I have written: it is not intended to distribute copies of his work, but merely to demonstrate how you can use Python to create visual geometric compositions.

The output of DALL-E was interesting, but it did not quite capture the essence of his original pieces. The output lacked the precise constraints and intricacies present in Haubenstock-Ramati’s art. I tried many variations of prompts but could never get close to what I wanted.

Some of the outputs generated by Dall-E given my prompt: “Create a painting composition in the style of Roman Haubenstock-Ramati that incorporates elements of graphic notation and experimental musical composition. The painting should be predominantly black and white with bold lines and geometric shapes, and should include a central motif that represents the theme of the piece.”

In an attempt to simplify the process, I posed a simpler request to Dall-E: “Draw a vertical line connected to a rectangle, connect a square to the line, and connect the square with another vertical line to another rectangle, and finally connect the rectangle to a circle with another vertical line.” Despite the simplicity of the prompt, Dall-E struggled to understand the intended relationships between the shapes, yielding unexpected outcomes.

Images generated by Dall-E given the prompt: “Draw a vertical line connected to a rectangle, connect a square to the line, and connect the square with another vertical line to another rectangle, and finally connect the rectangle to a circle with another vertical line.”

Since it was becoming clear that Dall-E could not process geometrically constrained prompts, I tried an even simpler one: “Create a drawing that only shows two orthogonal lines”. This also proved too difficult.

Images generated by Dall-E given the prompt: “Create a drawing that only shows two orthogonal lines”

This inability surprised me at first, but considering how a model like Dall-E works, it makes sense: it is based on latent diffusion, which is an inherently noisy process and not optimised for exact, constraint-based prompts.

Next, I will show the images I generated and talk in more detail about how to code something like this.

Single example of an image generated by my code.

A gif showing different images created by my code, showing the diversity of images created by the same parameters.

I made these images using Python and Pillow, without any machine learning. The randomness in the images is introduced via Torch, a versatile package that I used for its familiarity and convenience. Torch is normally used for machine learning (ML), but to be clear: these images are not made with ML.

You might wonder where the diversity of the images comes from. I personally love how my code generates images that give a similar vibe but, on closer inspection, are all different; this diversity was an essential characteristic to achieve. It stems from an intricate use of random variables. A random variable, in probability theory and statistics, is a variable whose possible values are outcomes of a random phenomenon.

Now I will describe the generation process of the images and show, at a high level, what this process looks like in Python.

We can divide the generation process into 3 steps.

Step 1: The centerpiece is generated by sampling a rectangle, a line, a rectangle, a square, a line, and a circle. These are placed in fixed positions, and the sizes of the shapes are determined by random variables.

Step 2: Three clusters of lines and adjacent circles are sampled from three different distributions. In each cluster, a number of vertical lines with various starting and ending points are placed.

Step 3: Circles and rectangles are sampled and drawn within the clusters of lines.

Gif showing the step-by-step generation process of a single image.
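The three steps can be pictured as a single high-level driver. Below is a minimal, hypothetical skeleton of that pipeline; the helper names and their bodies are illustrative stand-ins for my actual code, which is considerably more involved.

```python
import torch
from PIL import Image, ImageDraw

W, H = 1500, 1000  # canvas size

def draw_centerpiece(draw):
    # Step 1: fixed position, size drawn from a normal distribution
    h = int(torch.distributions.Normal(H * 0.2, H * 0.05).sample().abs())
    draw.rectangle([(W // 2, H // 2), (W // 2 + h // 2 + 1, H // 2 + h + 1)], outline="black")

def draw_line_cluster(draw, cx):
    # Step 2: a cluster of vertical lines scattered around x = cx
    xs = torch.distributions.Normal(cx, W * 0.03).sample((8,))
    for x in xs:
        y0 = int(torch.distributions.Normal(H * 0.3, H * 0.05).sample())
        y1 = y0 + int(torch.distributions.Normal(H * 0.2, H * 0.03).sample().abs())
        draw.line([(int(x), y0), (int(x), y1)], fill="black")
    return xs

def decorate_cluster(draw, xs):
    # Step 3: small circles sampled near the cluster's lines
    for x in xs[:3]:
        r = int(torch.distributions.HalfNormal(10.0).sample()) + 2
        cy = int(torch.distributions.Normal(H * 0.6, H * 0.05).sample())
        draw.ellipse([int(x) - r, cy - r, int(x) + r, cy + r], fill="black")

img = Image.new("RGB", (W, H), "white")
d = ImageDraw.Draw(img)
draw_centerpiece(d)
for cx in (W * 0.2, W * 0.5, W * 0.8):
    decorate_cluster(d, draw_line_cluster(d, cx))
img.save("composition_sketch.png")
```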

**Step 1**

To understand the role of random variables in my code, consider the first step in our image creation process: forming a portrait-style rectangle, characterized by its greater height than its width. This rectangle, although seemingly simple, is an embodiment of random variables in action.

A rectangle can be dissected into four principal elements: a starting x and y coordinate, and an ending x and y coordinate. Now, these points, when chosen from a specific distribution, transform into random variables. But how do we decide the range of these points, or more specifically, the distribution they come from? The answer lies in one of the most common and crucial distributions in statistics: The Normal Distribution.

Defined by two parameters, the mean (μ) and the standard deviation (σ), the Normal Distribution plays a pivotal role in our image generation process. The mean, μ, marks the center of the distribution and acts as the point around which the values of our random variables gravitate. The standard deviation, σ, quantifies the degree of dispersion: it determines the range of values the random variables can realistically take. In essence, a larger standard deviation results in greater diversity in the images created.

```python
import torch

canvas_height = 1000
canvas_width = 1500

# loop to show different values
for i in range(5):
    # create normal distribution to sample the starting y-coordinate from
    start_y_dist = torch.distributions.Normal(canvas_height * 0.8, canvas_height * 0.05)
    # sample from distribution
    start_y = int(start_y_dist.sample())

    # create normal distribution to sample the height from
    height_dist = torch.distributions.Normal(canvas_height * 0.2, canvas_height * 0.05)
    height = int(height_dist.sample())
    end_y = start_y + height

    # start_x is fixed because the rectangle is centered
    start_x = canvas_width // 2

    width_dist = torch.distributions.Normal(height * 0.5, height * 0.1)
    width = int(width_dist.sample())
    end_x = start_x + width

    print(f"start_x: {start_x}, end_x: {end_x}, start_y: {start_y}, end_y: {end_y}, width: {width}, height: {height}")
```

```
start_x: 750, end_x: 942, start_y: 795, end_y: 1101, width: 192, height: 306
start_x: 750, end_x: 835, start_y: 838, end_y: 1023, width: 85, height: 185
start_x: 750, end_x: 871, start_y: 861, end_y: 1061, width: 121, height: 200
start_x: 750, end_x: 863, start_y: 728, end_y: 962, width: 113, height: 234
start_x: 750, end_x: 853, start_y: 812, end_y: 986, width: 103, height: 174
```

Sampling a square looks very similar: we only have to sample one value, as the height and width are equal. Sampling a circle is even easier, as we only have to sample the radius.
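As a quick sketch of that idea (the scale factors below are illustrative choices of mine, not the ones used for the final images):

```python
import torch

canvas_height = 1000

# A square needs only one sampled dimension, since height and width are equal
side_dist = torch.distributions.Normal(canvas_height * 0.15, canvas_height * 0.03)
side = int(side_dist.sample().abs())

# A circle needs only its radius
radius_dist = torch.distributions.Normal(canvas_height * 0.05, canvas_height * 0.01)
radius = int(radius_dist.sample().abs())

print(f"side: {side}, radius: {radius}")
```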

Drawing a rectangle in Python is a straightforward process, especially when utilizing the Pillow library. Here’s how you can do it:

```python
from PIL import Image, ImageDraw

# Loop to draw rectangles
for i in range(5):
    # Create a new image with white background
    img = Image.new('RGB', (canvas_width, canvas_height), 'white')
    draw = ImageDraw.Draw(img)

    # Creating normal distributions to sample from
    start_y_dist = torch.distributions.Normal(canvas_height * 0.8, canvas_height * 0.05)
    start_y = int(start_y_dist.sample())

    height_dist = torch.distributions.Normal(canvas_height * 0.2, canvas_height * 0.05)
    height = int(height_dist.sample())
    end_y = start_y + height

    start_x = canvas_width // 2

    width_dist = torch.distributions.Normal(height * 0.5, height * 0.1)
    width = int(width_dist.sample())
    end_x = start_x + width

    # Drawing the rectangle
    draw.rectangle([(start_x, start_y), (end_x, end_y)], outline='black')

    img.show()
```

**Step 2**

In the context of the vertical lines in these images, we consider three random variables, namely:

- The beginning y-coordinate of the line (y_start)
- The ending y-coordinate of the line (y_end)
- The x-coordinate of the line (x)

Since we are dealing with vertical lines, only one x-coordinate needs to be sampled for each line. The width of the line is constant, controlled by the size of the canvas.

Some additional logic was needed to ensure the lines didn’t intersect. Essentially, we treat the image as a grid and keep track of the occupied positions. Let’s disregard that here for the sake of simplicity.

Here’s an example of what this looks like in Python.

```python
import torch
from PIL import Image, ImageDraw

# Setting the size of the canvas
canvas_size = 1000

# Number of lines
num_lines = 10

# Create distributions for start and end y-coordinates and x-coordinate
y_start_distribution = torch.distributions.Normal(canvas_size / 2, canvas_size / 4)
y_end_distribution = torch.distributions.Normal(canvas_size / 2, canvas_size / 4)
x_distribution = torch.distributions.Normal(canvas_size / 2, canvas_size / 4)

# Sample from the distributions for each line
y_start_points = y_start_distribution.sample((num_lines,))
y_end_points = y_end_distribution.sample((num_lines,))
x_points = x_distribution.sample((num_lines,))

# Create a white canvas
image = Image.new('RGB', (canvas_size, canvas_size), 'white')
draw = ImageDraw.Draw(image)

# Draw the lines
for i in range(num_lines):
    draw.line([(x_points[i], y_start_points[i]), (x_points[i], y_end_points[i])], fill='black')

# Display the image
image.show()
```
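For completeness, the intersection-avoidance logic I set aside above can be sketched as follows: treat the x-axis as a coarse grid of columns, track which columns are occupied, and resample when a candidate line would collide. The column resolution and retry limit below are arbitrary values I chose for illustration.

```python
import torch

canvas_size = 1000
num_lines = 10
x_dist = torch.distributions.Normal(canvas_size / 2, canvas_size / 4)

occupied = set()   # x-columns already used, at a coarse resolution
resolution = 10    # treat x-positions within 10 px as the same column
lines = []

for _ in range(num_lines):
    for _attempt in range(100):       # retry until we find a free column
        x = int(x_dist.sample())
        column = x // resolution
        if 0 <= x < canvas_size and column not in occupied:
            occupied.add(column)
            lines.append(x)
            break

print(sorted(lines))
```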

This, however, only gives you lines. Another part of each cluster is the circles at the ends of the lines, which I call adjacent circles. Their placement is also determined by random variables: whether a line gets an adjacent circle at all is sampled from a Bernoulli distribution, and the position of the shape (left, middle, right) is sampled from a uniform distribution.
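A minimal sketch of that two-stage sampling; the 0.7 probability is an arbitrary value I picked for illustration:

```python
import torch

# Bernoulli: does this line get an adjacent circle at all?
has_circle_dist = torch.distributions.Bernoulli(probs=0.7)

# Uniform choice over the three candidate positions
positions = ["left", "middle", "right"]

for line_idx in range(5):
    if has_circle_dist.sample() == 1:
        pos = positions[torch.randint(len(positions), (1,)).item()]
        print(f"line {line_idx}: circle at {pos}")
    else:
        print(f"line {line_idx}: no circle")
```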

A circle can be defined entirely by a single parameter: its radius. We can consider the length of a line as a condition that influences the radius of the circle. This forms a conditional probability model where the radius (R) of the circle is dependent on the length of the line (L). We use a conditional Gaussian distribution. The mean (μ) of this distribution is a function of the square root of the line length, while the standard deviation (σ) is a constant.

We initially suggest that the radius R, given the line length L, follows a normal distribution. This is denoted as R | L ~ N(μ(L), σ²), where N is the normal (Gaussian) distribution and σ is the standard deviation.

However, this has a small problem: the normal distribution includes the possibility of sampling a negative value. This outcome is not physically possible in our scenario, as a radius cannot be negative.

To circumvent this issue, we can use the half-normal distribution. Much like the normal distribution, it is defined by a scale parameter σ, but crucially, it is constrained to non-negative values. The radius given the line length then follows a half-normal distribution: R | L ~ HN(σ), where HN denotes the half-normal distribution. Since a half-normal distribution has mean σ√(2/π), we obtain the desired mean by setting σ = √(2L) / √(2/π), ensuring that all sampled radii are non-negative and that the mean of the distribution is √(2L).

```python
from PIL import Image, ImageDraw
import numpy as np
import torch

# Define your line length
L = 3000

# Calculate the desired mean for the half-normal distribution
mu = np.sqrt(L * 2)

# Calculate the scale parameter that gives the desired mean
scale = mu / np.sqrt(2 / np.pi)

# Create a half-normal distribution with the calculated scale parameter
dist = torch.distributions.HalfNormal(scale / 3)

# Sample and draw multiple circles
for _ in range(10):
    # Create a new image with white background
    img_size = (2000, 2000)
    img = Image.new('RGB', img_size, (255, 255, 255))
    draw = ImageDraw.Draw(img)

    # Define the center of the circles
    start_x = img_size[0] // 2
    start_y = img_size[1] // 2

    # Sample a radius from the distribution
    r = int(dist.sample())
    print(f"Sampled radius: {r}")

    # Define the bounding box for the circle
    bbox = [start_x - r, start_y - r, start_x + r, start_y + r]

    # Draw the circle onto the image
    draw.ellipse(bbox, outline='black', fill=(0, 0, 0))

    # Display the image
    img.show()
```

**Step 3**

Step 3 combines elements from Steps 1 and 2. In Step 1, we tackled sampling and drawing rectangles in set positions. In Step 2, we used the normal distribution to draw lines on a portion of the canvas, and we learned how to sample and draw circles.

As we transition to Step 3, we repurpose the techniques from the previous steps. Our aim is to distribute squares and circles harmoniously around the lines we sampled earlier. The normal distribution will once again come in handy for this task.

We will re-use the parameters that created the clusters of lines. However, to enhance the visual appeal and avoid overlaps, we introduce some noise to the mean (μ) and standard deviation values.

In this step, instead of positioning lines, our task is to place sampled rectangles and circles. I encourage you to play around with these techniques and see if you can add circles and rectangles to your own cluster of lines.
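As a rough sketch of this idea, assuming cluster parameters similar to those from Step 2: jitter the mean and standard deviation with a little noise, then sample positions for circles and squares from the perturbed distribution. The noise scales here are my own illustrative values.

```python
import torch
from PIL import Image, ImageDraw

canvas_size = 1000
cluster_mu, cluster_sigma = canvas_size / 2, canvas_size / 10  # reused cluster parameters

img = Image.new("RGB", (canvas_size, canvas_size), "white")
draw = ImageDraw.Draw(img)

for _ in range(6):
    # Add noise to the cluster parameters so shapes spread around the lines
    mu = cluster_mu + float(torch.distributions.Normal(0.0, canvas_size * 0.02).sample())
    sigma = cluster_sigma * float(torch.distributions.Uniform(0.8, 1.2).sample())

    # Sample a position and size from the perturbed distribution
    x = int(torch.distributions.Normal(mu, sigma).sample())
    y = int(torch.distributions.Normal(mu, sigma).sample())
    size = int(torch.distributions.HalfNormal(canvas_size * 0.02).sample()) + 2

    # Alternate randomly between circles and squares
    if torch.distributions.Bernoulli(0.5).sample() == 1:
        draw.ellipse([x - size, y - size, x + size, y + size], outline="black")
    else:
        draw.rectangle([x - size, y - size, x + size, y + size], outline="black")

img.save("step3_sketch.png")
```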

In this blog post, I’ve dissected and simplified the processes underpinning my code to give a deeper understanding of how it operates, and I’ve shown how difficult it is for generative AI models like Dall-E to follow precise geometric constraints.

Writing the code that produced these images was a great experience. Watching the image evolve with each line of code I wrote was so cool. I hope this blog post has piqued your interest in the intersection of art and coding, and I encourage you to use your coding skills to bring your imagination alive. There’s no need to exhaust your Dall-E credits; the power to create is right at your fingertips.

How I Created Generative Art with Python That 10000 DALL-E Credits Could Not Buy was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.