"All Tomorrow’s Parties": AI Synthesis – The End of Copyright as we knew it

AI systems for image and sound synthesis are stochastic libraries capable of interpolation. They require a radical reorientation of copyright law.

(Image: Mr. Tempter / Shutterstock.com)

By René Walter

(This article is also available in German.)

In the age of machine learning, our notions of intellectual property and copyright are facing a radical upheaval. The first lawsuits against companies offering generative AI systems raise the question of what exactly art and creativity are, and why and how we should protect and promote them.

Guest article by René Walter

René Walter is a blogger, graphic designer, typographer and journalist from Berlin. Among other things, he worked for Napster as an art director for three years. He has been writing on the Internet for around 20 years. With his award-winning blog Nerdcore he ran one of the most successful private websites in Germany, and in 2009 he initiated "Und alle so Yeaahh!", the first meme to cause a nationwide sensation. For more than 10 years, he has explored meme theory, algorithmic art, the impact of the digital on human psychology, and the latest developments at the intersection of science, technology, and creativity. Today he runs the newsletter GOOD INTERNET, where he critically follows developments in the field of artificial intelligence.

In mid-January 2023, stock photo provider Getty Images took legal action against Stability AI in the UK, followed in early February by a lawsuit in the US. Previously, three artists had filed a lawsuit accusing the company of violating their copyrights with Stable Diffusion (based, for example, on the study "Extracting Training Data from Diffusion Models," which MIT Technology Review had reported on). In initial reactions to ChatGPT, publishers are calling for an extension of ancillary copyright to generative AI systems.

Collecting societies such as GEMA or VG Wort, which manage the copyrights of their members, face a daunting task. These novel systems turn their distribution mechanisms into a potential plaything for fraudsters, who can deceive them with easy-to-use software and boost payouts in their own favor. AI-generated content, the automated synthesis of plausible but unreal texts, images and audio data, is capable of blowing up existing systems.

Alison Gopnik, a professor of psychology and philosophy at Berkeley, calls the new generative AI models library-like cultural technologies that provide access to knowledge and multiply it. The comparison is obvious, if inexact. Building on it, I would describe the interpolable data spaces that the algorithms compute, the so-called latent spaces, as "stochastic libraries": a library where you describe to a robotic librarian what book you want, and it picks out an approximate match. Put another way, "AI is like a box of chocolates: you never know what you're going to get."

Stochastic libraries are interpolable databases of their training data: AI systems learn various characteristics of the input through pattern recognition and store them as so-called weights, which can be controlled via parameters. In the case of Stable Diffusion, there are 870 million parameters; in the case of ChatGPT, there are 175 billion. For example, if you create an AI model for paintings by Pablo Picasso, the neural network stores the stylistic patterns it recognized in the training data, such as brushstrokes, coloring or proportions.

These can in turn be controlled via the text prompt. If you want the Picasso AI to create a picture in the style of the master, you activate the parameters for "Vase", "Flowers", "Fruit" and "Picasso", and the model creates a still life based on the weights of these patterns in its database. The same thing happens in ChatGPT when I remix a Heise IT text in the style of a Ramones song. It is precisely this molecular, interpolative remix principle of generative AI that creates a huge explosive force for existing systems of copyright.

Because the prompt is decomposed into various tokens (syllables and groups of letters), many of these weights and parameters come together in every image generation. This is another reason why artists' advocates refer to these systems as "collage tools of the 21st century." This choice of words, however, obscures the interpolative character of the models: each image is generated from many different parameters, which were previously obtained in AI training from millions of image analyses.
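
How a prompt falls apart into tokens can be illustrated with a small Python sketch. It is a toy greedy longest-match splitter, not the tokenizer of Stable Diffusion or ChatGPT, and the vocabulary `TOY_VOCAB` is invented purely for this example.

```python
# Toy subword tokenizer: greedy longest match against a tiny vocabulary.
# TOY_VOCAB is hypothetical and not taken from any real model.
TOY_VOCAB = {"pic", "asso", "vase", "flow", "ers", "fruit", "robot", "s"}


def tokenize(prompt, vocab=TOY_VOCAB):
    """Split a prompt into subword tokens, longest vocabulary match first."""
    tokens = []
    for word in prompt.lower().split():
        start = 0
        while start < len(word):
            # Try the longest candidate piece first, then shorter ones.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in vocab:
                    tokens.append(piece)
                    start = end
                    break
            else:
                # No vocabulary entry matches: emit a single character.
                tokens.append(word[start])
                start += 1
    return tokens


print(tokenize("Picasso flowers"))
```

In a real system each token would then be mapped to a numeric vector, and it is those vectors, not the words themselves, that steer the image generation.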

Every synthetic image, piece of AI music or generated text is the result of a multidimensional interpolation in latent space. The parameters "robot", "dog", "meadow", "Picasso" and "flowers", for example, span a five-dimensional space full of possible image syntheses, from which results are selected at random (in diffusion models) or according to a reward algorithm. Thus, through the text prompt, I can combine any pattern contained in the database with other patterns to create novel remixes, and so our AI Picasso is suddenly painting robots and spaceships as he never did in real life.
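
To make the idea of interpolation concrete, here is a deliberately simplified Python sketch. The five axes and the two coordinate vectors are invented for illustration; real latent spaces are learned, have far more dimensions, and do not neatly assign one axis to one concept.

```python
def lerp(a, b, t):
    """Linearly interpolate between two equal-length vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


# Hypothetical coordinates; axis order is
# (robot, dog, meadow, Picasso, flowers).
still_life = [0.0, 0.0, 0.0, 1.0, 1.0]  # "Picasso" + "flowers"
robot_dog = [1.0, 1.0, 1.0, 0.0, 0.0]   # "robot" + "dog" + "meadow"

# A point halfway between the two prompts: a blend that appears in
# neither of the two starting points.
halfway = lerp(still_life, robot_dog, 0.5)
print(halfway)
```

Every value of `t` between 0 and 1 yields another point between the two starting vectors, which is the sense in which the library is "interpolable": the space between the stored patterns is itself usable.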

This ability to interpolate between data points poses unprecedented problems, and not just for copyright law. Synthetically generated AI voices are currently causing displeasure among voice actors, who have recently found clauses in their contracts demanding the right to use their voice data to train synthetic voices. Unions advise against signing such contracts, but it's only a matter of time before movie producers can create any voice imaginable in any tonality, purely by interpolating between the individual learned patterns in the data set. The new villain of the Marvel universe is supposed to sound like Josh Brolin, but with the voice coloration of Bruce Willis and the rhythm of Pee Wee Herman? AI makes it possible.

The training data of generative artificial intelligences, which often contain copyrighted works, are thus converted into parameter banks for "new" synthetic outputs. The well-known science fiction author Ted Chiang, whose short story "Story of Your Life" provided the template for Denis Villeneuve's film "Arrival," compared large language models in The New Yorker to the lossy data compression of JPEGs, a metaphor that seems entirely appropriate considering the dissolution of culture in the atomized grey goo of latent space.

The randomness of a stochastic library and the interpolative nature of AI synthesis fundamentally contradict the principles of U.S. and European copyright laws, which require individual, identifiable works by natural persons and a certain level of creation in order to apply. How such copyright laws should respond to an interpolable latent space in which I can freely combine patterns of existing works at a creative molecular level is entirely unclear, and it depends, as a lawyer would say, "on the individual case."

However, two studies have strongly suggested that diffusion models are capable of exactly replicating the image data used to train them (arXiv preprints: "Investigating Data Replication in Diffusion Models" and "Extracting Training Data from Diffusion Models"), which on the one hand allows copyright infringement and on the other hand can lead to privacy violations.

Complicating matters further is the commercial exploitation of these AI systems. It is true that they were created in a scientific framework and can therefore rely on exceptions in property rights in Europe and the USA, at least during their development. However, these exceptions are subject to stricter legal requirements for commercial applications, and Stability AI, OpenAI and Microsoft have all already brought their AI systems to market. This is one reason why the Federal Trade Commission is now investigating OpenAI for breaching due diligence during the launch of ChatGPT.

The copyright holders' collecting societies so far have no approach to counter these endless stochastic mash-ups that generative AI systems produce from atomized culture. Even if creators and rights managers find ways to regulate the stochastic nature of these novel cultural synthesizers in a copyright reform, black markets will exist for models that allow users to freely explore the new synthetic worlds. Already, there are hundreds of checkpoints (CKPTs) for Stable Diffusion: derivative AI models that have been trained on the style of specific artists or on entire aesthetics.
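
How such derivative checkpoints can be combined with one another can be sketched in a few lines of Python. One common way to blend checkpoints that share an architecture is a weighted average of their parameters; the sketch below assumes that approach and uses tiny invented dictionaries (`cats`, `star_trek`, `ghibli`) in place of real checkpoint files, which hold far larger tensors.

```python
def merge_checkpoints(checkpoints, mix):
    """Weighted average of parameter dicts that share the same keys."""
    total = sum(mix)
    if len(checkpoints) != len(mix) or total <= 0:
        raise ValueError("need one mixing weight per checkpoint, positive total")
    keys = checkpoints[0].keys()
    if any(ckpt.keys() != keys for ckpt in checkpoints):
        raise ValueError("checkpoints must have identical parameter names")
    return {
        key: sum(w * ckpt[key] for w, ckpt in zip(mix, checkpoints)) / total
        for key in keys
    }


# Hypothetical stand-ins for three fine-tuned checkpoints; the parameter
# names and values are made up for illustration.
cats = {"layer.w": 0.2, "layer.b": 1.0}
star_trek = {"layer.w": 0.6, "layer.b": 3.0}
ghibli = {"layer.w": 1.0, "layer.b": 2.0}

# Equal parts of each "ingredient".
merged = merge_checkpoints([cats, star_trek, ghibli], [1, 1, 1])
print(merged)
```

Changing the numbers in `mix` shifts the blend toward one source or another, which is why such merges are so hard to attribute to any single body of work.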

There is even a Stable Diffusion model for the movie "Cats". It is also already possible today to build your own image generator based on Stable Diffusion, in which you can mix new image worlds from different checkpoint files like ingredients in cooking: "One specialized CKPT with Cats, Star Trek and Ghibli, please", and out comes a gigantic latent space, specialized in anime cats from the planet Vulcan and guaranteeing infinite image worlds. Thinking even further into the future, brain-computer interfaces appear on the horizon, allowing real-time visualization of thoughts: digitally enabled lucid dreams while awake. The thought of Disney controlling thoughts, at least in their visualized output, is not far off: "I can't show that, Dave."