Press "Enter" to skip to content

The benefits and biases of AI image generation

Recent innovations in Artificial Intelligence (AI) technology have made image generation much easier. Over the course of the summer, companies like Google and OpenAI, as well as smaller programming teams, have released access to their own attempts at algorithms capable of generating images from text.

One of the first to be released to the public, and therefore one of the most popular, DALL-E Mini (now called crAIyon) gives users the ability to generate somewhat realistic images as easily as searching for an image on Google: just enter a few words, and within a minute nine separate images are generated that attempt to fit your prompt. While most of the results are clearly artificial and somewhat rudimentary, the website is still an example of how quickly the technology has evolved, and of how many groups have been trying to replicate its success.

Even though each model has its own intricacies and algorithms, they all work in roughly the same way. The first step is to teach the AI to recognize objects within images, using a large dataset of images paired with text descriptions of what they contain. From this, the algorithm learns to recognize patterns within images and teaches itself the difference between, say, a dog and a stop sign. Once the AI understands which words and phrases correspond to which kinds of images, it can be used to generate an image from text alone.
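
To make that first matching step concrete, here is a minimal, hypothetical sketch in the spirit of contrastive image-text models such as CLIP. This is not the code any of these companies actually use; the encoder sizes, the random stand-in data, and the training settings are all assumptions chosen only for illustration.

```python
# A toy sketch of the "learn to match images and captions" step.
# Real systems train huge vision and language networks on hundreds of
# millions of scraped pairs; here random tensors stand in for real data.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 64
BATCH = 8

# Stand-in encoders: a real model would use a vision network and a text
# transformer. Simple linear layers map fake features to a shared space.
image_encoder = nn.Linear(512, EMBED_DIM)   # 512 = pretend image feature size
text_encoder = nn.Linear(300, EMBED_DIM)    # 300 = pretend caption feature size
optimizer = torch.optim.Adam(
    list(image_encoder.parameters()) + list(text_encoder.parameters()), lr=1e-3
)

for step in range(100):
    # Pretend each row is an (image, caption) pair scraped from the web.
    fake_images = torch.randn(BATCH, 512)
    fake_captions = torch.randn(BATCH, 300)

    img_emb = F.normalize(image_encoder(fake_images), dim=-1)
    txt_emb = F.normalize(text_encoder(fake_captions), dim=-1)

    # Similarity between every image and every caption in the batch.
    logits = img_emb @ txt_emb.t()

    # Contrastive objective: the i-th image should match the i-th caption
    # more strongly than any mismatched pair.
    targets = torch.arange(BATCH)
    loss = (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The only idea the sketch captures is that each image and its own caption should score higher than every mismatched pairing; scaled up to real data, that is what lets a model connect words to visual concepts before a separate generator turns text embeddings back into pictures.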

While it seems like a fun novelty to be able to imagine an image and have it appear before you, this technology raises several concerns. One of the first was pointed out by OpenAI while they were contemplating how to release access to their DALL-E 2 framework: should we allow the internet to create images of everything? It may be fun to generate images of artwork or animals, but what should the algorithm do when asked to generate explicit images? When and where should the line be drawn? Should users be able to create violent, hateful, or pornographic content? Is it ethical to train an AI to generate these images, or even to include these types of images in datasets?

The concerns around image generation go beyond user input. What if a user doesn’t ask for explicit or derogatory content, but it is generated regardless? This is a problem of dataset bias: if the dataset used to train an AI is skewed in a certain direction, the algorithm will often reproduce that skew. Because the datasets for these complex models need to be so large, the images and their metadata are often scraped from the internet, and they bring their biases with them. This has caused many algorithms to adopt the biases and stereotypes of human society, with prompts containing “flight attendant” often depicting women, or “lawyer” depicting old white men. As these biases have come to light, many prominent companies have begun cleaning their datasets to account for them.
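
One way to see how such skew creeps in is to audit the captions themselves. The sketch below is purely illustrative: the captions are made up, and the gender_counts helper is a hypothetical name, not part of any real tool. A real audit would scan millions of scraped image descriptions with far more careful methods.

```python
# Hypothetical mini-audit: count how often gendered words appear in
# captions that mention a given occupation.
from collections import Counter

captions = [
    "a flight attendant serving drinks, she smiles",
    "portrait of a lawyer, he stands in his office",
    "a flight attendant, she points to the exit",
    "an old lawyer, he reads a brief",
]  # stand-in data; a real dataset would hold millions of scraped captions

FEMALE = {"she", "her", "woman"}
MALE = {"he", "his", "him", "man"}

def gender_counts(occupation):
    """Tally female- and male-coded words in captions mentioning an occupation."""
    counts = Counter()
    for cap in captions:
        words = set(cap.lower().replace(",", "").split())
        if occupation in cap.lower():
            if words & FEMALE:
                counts["female-coded"] += 1
            if words & MALE:
                counts["male-coded"] += 1
    return counts

print("flight attendant:", gender_counts("flight attendant"))
print("lawyer:", gender_counts("lawyer"))
```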

Removing biased and explicit content is still not enough for some critics, however; many artists see the technology as a threat to their livelihood. Most can agree that it is acceptable for a machine to imitate the works of dead artists like Picasso, Monet, and Van Gogh. An issue arises when machines imitate the work of living, working artists, compounded by the fact that some of these companies are profiting from the technology. In order to imitate these artists, the AI must train on their previous pieces of art, learning each individual artist’s style. Is it ethical, or even legal, for these companies to train their AI on other people’s art in order to generate and sell new content? Who, then, owns the image: the company that made the algorithm, the artists whose work is in the dataset, the user who chose the text, or the algorithm that generated the image?

While there is a great deal of controversy surrounding text-to-image generation, one thing that both users and developers can agree on is that the technology is still in its infancy. It seems like every month a new iteration is released with slightly more realistic image generation. Some projects have even expanded to audio and video generation, and we may not be far from the first computer-generated movie. However, with every step this technology takes, it is important to remember the issues that come alongside it and to protect the artists who helped create it.
