AI image generation – Wombo

I was playing around with the WOMBO DREAM app today to generate original images from text prompts, and decided to share some here.

Wombo is a Canadian company whose app was, according to their website, launched on 28 February 2021. Their first app was a widely downloaded lip-sync app. Their technology is “based on GAN (generative adversarial networks) methodology.” Users would upload a selfie to the model, which was “facial recognition trained to detect facial features/motions in our ‘driving video’ (the video performance behind each song in our library) and apply them to a user’s still image to animate it.”

The app I used today is a newer one called Dream by Wombo.art.

Similar to GPT-3, you enter a prompt to generate an image. You can select an art style, or you can leave that choice blank. The images generated tend to be more abstract than those from OpenAI’s DALL-E. Compare how they both respond to the prompt “an avocado in the shape of an armchair”.

The chairs above were created by DALL-E, as displayed on the OpenAI website.

The chairs above were created by me with the Wombo.art DREAM app on 6 December 2021, using the same prompt. The one on the left uses no art style, the middle one uses the pastel art style, and the one on the far right uses the fantasy art style. The model is possibly smaller than DALL-E’s 12 billion parameters, and I can’t find any reference to the training data used. Nevertheless, I enjoy the abstract nature of the Wombo model.
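For anyone who thinks in code, the interaction really is just a prompt plus an optional style preset. The sketch below is purely hypothetical — Wombo has no public API that I’m aware of, and I don’t know how the app conditions its model internally — so every name in it (DreamRequest, build_prompt, the style strings) is my own invention, just to capture the shape of the interface.

```python
# Hypothetical sketch only: these names are not Wombo's; they just mirror the
# prompt-plus-optional-style interaction described above.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DreamRequest:
    prompt: str                  # free-text prompt, e.g. "an avocado in the shape of an armchair"
    style: Optional[str] = None  # optional preset such as "pastel" or "fantasy"; None = no style

def build_prompt(req: DreamRequest) -> str:
    """One plausible way an app like this might fold a style preset into the text prompt."""
    if req.style is None:
        return req.prompt
    return f"{req.prompt}, {req.style} art style"

print(build_prompt(DreamRequest("an avocado in the shape of an armchair", "pastel")))
# -> "an avocado in the shape of an armchair, pastel art style"
```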

As it seems suited to abstract images, I tried typing in some phrases and concepts from my PhD work.

For the piece I have called “Prometheus machine”, I used the prompt “the prometheus intersection of prompt design and training data” and the art style “vibrant”. This generated image reflects the precise area of AI that inspires me: the fire of creation between prompt design and training data. I love that it has given this moment a standpoint of human form. I find this especially fitting, as I have enjoyed giving generative models a standpoint when asking for task completion. There are so many deep layers in our minds that come out in the text and images we create, and often we can’t adequately express them all. Giving those deep layers form and standpoint is a much richer way of describing the precise point of humanity we wish to evoke than lengthy text. Though perhaps we should ask ourselves: is Prometheus the figure in the middle, the head and bust appearing before them as they paint from the vantage point of the shoulder of this imaginary digital giant? Or is Prometheus the form being created?

Somewhat narcissistically, I wanted to see how my working PhD thesis title would be rendered. I was in no way disappointed! The prompt was my thesis title, “Through a dark glass, clearly”, and I used the art style “HD”. I love this; I see fragmented stochastic values on the forest floor as a tree grows from those fractured shards. The spotlight on the mirror represents to me how I’m drawn into my thesis work.

The prompt I used for this piece was “mimetic substructures in the dark glass of AI” and I used the style “psychic”. The idea of mimetic substructures in AI came to me in a dream after a COVID-19 vaccination. It was one of those particularly lucid dreams that make you wake with a start, knowing you need to grasp the dreamscape firmly with your conscious mind to retain a stable connection. I feel this image portrays the idea pretty well. To me, mimetic substructures are embedded in training data but can be unlocked when the system is given dynamism in the form of intentionality embedded in a prompt. The obvious line through the center is like a meniscus curve hovering between the training data and the rest of the system.

Above are a couple more on the theme of mimetic substructures. The image on the left shows training data separated from the external (input/output) by context, yet connected to generated outputs by neurons of deep-layer mimetic patterns. I like the noise of dark lines in the middle, as if newspaper has been pressed against the image. The image on the right reminds me of the mise en abyme of great texts like Dostoevsky’s The Grand Inquisitor.

This last image is perhaps my favourite, and it relates to the paper I am writing on GPT-3 (with co-authors Pistilli, Panai, Menéndez González, Dias Duran, Kalpokienė, & Bertulfo). The paper has had a few names during its creation; at first it was called “The Ghost in the Machine has an American accent”. Perhaps this image is inspiration for a new title. For the prompt of this image I used a phrase I particularly like from our paper: “fragmented values and stochastic democracy”. The term fragmented values comes from Thomas Nagel’s work on values and conflicts in moral philosophy. The idea of stochastic democracy refers to how more heavily represented voices in the training data will influence stochastic decisions in these types of AI models.

In this image you can clearly see the dipped curve of a stochastic gradient descent map. The values are fragmented like ice crystals falling off the peaks, and the model seems to have chosen red, white, and blue to represent democracy. Democracy is far older than the USA, and even if you were to hope the tri-colours were inspired by the old French imagery of Liberté, égalité, fraternité, we see the unmistakable watermark of the stripes of the USA flag in the background.
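A toy sketch of what I mean by stochastic democracy, if it helps: when a model samples over its training data, the most represented voice dominates its stochastic “votes”. The corpus proportions below are invented purely for illustration; they are not measurements of any real model’s training data.

```python
# Toy illustration of "stochastic democracy": overrepresented voices in the
# training data dominate a model's stochastic choices. Proportions are made up.
import random
from collections import Counter

training_voices = ["US English"] * 80 + ["French"] * 15 + ["other"] * 5  # hypothetical corpus mix

random.seed(0)
samples = [random.choice(training_voices) for _ in range(1000)]  # uniform draws over the corpus
print(Counter(samples))  # the overrepresented voice wins most of the stochastic "votes"
```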

There is just as much bias and prejudice to be unpacked in these image-generation models as in text models like GPT-3. I think it is even more fascinating, though, as the added dimensions make the system explode with possibility.

This blog post was brought to you today by a PhD student who is avoiding marking their Master’s students’ essays. In the spirit of that, I bring you “Procrastinate in the dreamscape”.