AI-Generated Art: How Does It Work?

Four images generated using the same terms: Paul Klee, German Church Steeple.

Church Steeple represents the object I wanted to display.

German is a modifier intended to narrow the style of architecture.

Paul Klee is another modifier, intended to influence the style or genre of the result.

The actual details of Artificial Intelligence image production are complicated, varied, and well beyond the scope of this essay. Conceptually, AI generation draws upon a database of images that are merged via some mechanism to create a new image. It is not simply cut-and-paste; it is a merging of the objects, styles, and genres of the images the tool selects from its database.

The tool I used draws from a database populated with images pulled from the internet and indexed across many (often hundreds of) dimensions[i], such as colour, object, genre, and artist. Terms supplied by the artist are matched against these dimensions to select a list of input images, which are then processed and merged to produce the result. I am not clear on the degree to which the exactness of the original images is conveyed into the final product[1]. The execution is somewhat opaque and non-deterministic: supplying the same terms repeatedly will generate different images.
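Conceptually, the term-matching stage might resemble the following toy sketch. This is my own illustration, not MidJourney's actual mechanism; the index, its tags, and the overlap scoring are all assumptions made for the sake of the example:

```python
# Hypothetical image index: each image is tagged along a few "dimensions"
# (a real system would use hundreds, learned rather than hand-written).
INDEX = {
    "img_001": {"artist": "klee", "object": "steeple", "style": "german"},
    "img_002": {"artist": "klee", "object": "harbour", "style": "swiss"},
    "img_003": {"artist": "monet", "object": "steeple", "style": "french"},
}

def match(terms, index):
    """Score each image by how many prompt terms hit its indexed dimensions."""
    scores = {}
    for name, dims in index.items():
        scores[name] = sum(term in dims.values() for term in terms)
    # Return image names sorted from best to worst match.
    return sorted(scores, key=scores.get, reverse=True)

ranked = match(["klee", "steeple", "german"], INDEX)
print(ranked[0])  # img_001 matches all three terms and ranks first
```

The point of the sketch is only the shape of the process: prompt terms are compared against indexed attributes, and the best-matching inputs feed the generation step.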

There are many mechanisms for creating AI art, including procedural ‘rule-based’ generation of images using mathematical patterns, algorithms that simulate brush strokes and other painted effects, and artificial intelligence or deep learning algorithms such as generative adversarial networks and transformers.[ii] Generative adversarial networks (GANs) use a “generator” to create new images and a “discriminator” to decide which created images are considered successful.[iii] More recent models combine a Vector Quantized Generative Adversarial Network with Contrastive Language–Image Pre-training (VQGAN+CLIP).[iv]
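The generator/discriminator tug-of-war can be sketched numerically. The toy below is my own illustration, not any tool's actual code: both “networks” are single parameters on one-dimensional data, and a small weight-decay term is added to the discriminator purely to keep the toy stable.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d_w = 0.0      # discriminator: D(x) = sigmoid(d_w * x), its guess that x is real
g_shift = 0.0  # generator: fake sample = g_shift + noise

real = 2.0 + 0.1 * rng.standard_normal(64)   # "real" data clusters near 2.0
noise = 0.1 * rng.standard_normal(64)

for _ in range(2000):
    fake = g_shift + noise
    # Discriminator ascends log D(real) + log(1 - D(fake));
    # the -0.5 * d_w term is weight decay, added here to stabilize the toy.
    d_grad = (np.mean((1 - sigmoid(d_w * real)) * real)
              - np.mean(sigmoid(d_w * fake) * fake))
    d_w += 0.2 * (d_grad - 0.5 * d_w)
    # Generator ascends log D(fake): it shifts its output to fool D.
    g_shift += 0.1 * np.mean((1 - sigmoid(d_w * fake)) * d_w)

print(round(g_shift, 1))  # the generator has drifted toward the real cluster
```

The adversarial pressure is visible even at this scale: whenever the fakes sit below the real data, the discriminator rewards larger samples and the generator shifts upward; once the fakes overshoot, the pressure reverses.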

DeepDream, released by Google in 2015, uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia, thus creating a dream-like psychedelic appearance in the deliberately over-processed images.[v] [vi] [vii]
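DeepDream's core trick is gradient ascent on the input image rather than on the network's weights. The toy below is my own sketch of that idea, substituting a single hand-picked “feature detector” vector for a trained CNN layer:

```python
import numpy as np

rng = np.random.default_rng(1)

pattern = np.array([1.0, -1.0, 1.0, -1.0])  # feature the "layer" detects
image = rng.normal(0.0, 0.1, size=4)        # noisy 4-pixel starting "image"

def activation(img):
    """How strongly the feature detector fires on this image."""
    return float(pattern @ img)

before = activation(image)
for _ in range(100):
    # The gradient of the activation w.r.t. the image is just `pattern`,
    # so each ascent step pushes the image toward the detected feature.
    image += 0.05 * pattern
after = activation(image)

print(after > before)  # the faintly-present pattern has been amplified
```

Run against a deep network's layers instead of one vector, the same loop exaggerates whatever eyes, dogs, or spirals the network faintly “sees” in a photo, which is the source of DeepDream's psychedelic look.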

The tool I have been experimenting with, MidJourney[2], uses a GAN-based mechanism.

[1] I have used reverse image search tools (Google Lens, TinEye, and Yandex) but have found no matches, nor near matches, with generated images, supporting the impression that the original images do not carry through the process.

[2] See:

[i] Vox. 2022. The AI that creates any picture you want, explained. June 1, 2022. 

[ii] Wikipedia contributors, “Artificial intelligence art,” Wikipedia, The Free Encyclopedia (accessed October 10, 2022).

[iii] Goodfellow, Ian; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron; Bengio, Yoshua (2014). Generative Adversarial Nets (PDF). Proceedings of the International Conference on Neural Information Processing Systems (NIPS 2014). pp. 2672–2680.

[iv] Burgess, Phillip. “Generating AI ‘Art’ with VQGAN+CLIP”. Adafruit. Retrieved July 20, 2022.

[v] Mordvintsev, Alexander; Olah, Christopher; Tyka, Mike (2015). “DeepDream – a code example for visualizing Neural Networks”. Google Research. Archived from the original on 2015-07-08.

[vi] Mordvintsev, Alexander; Olah, Christopher; Tyka, Mike (2015). “Inceptionism: Going Deeper into Neural Networks”. Google Research. Archived from the original on 2015-07-03.

[vii] Szegedy, Christian; Liu, Wei; Jia, Yangqing; Sermanet, Pierre; Reed, Scott E.; Anguelov, Dragomir; Erhan, Dumitru; Vanhoucke, Vincent; Rabinovich, Andrew (2015). “Going deeper with convolutions”. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015. IEEE Computer Society. pp. 1–9. arXiv:1409.4842. doi:10.1109/CVPR.2015.7298594.

