Brandon Butler, University of Virginia Library
Originally published in Freethink
Reposted with author’s permission

The past six months or so have seen the seemingly sudden appearance of several startlingly powerful AI tools that create complex new textual and visual works in response to relatively simple prompts. You probably know at least a couple by name: ChatGPT (for text) and Stable Diffusion (for images) are the ones that seem to have taken over my social feeds. These tools are creating a buzz in part because the works they generate are sometimes good enough to pass for or replace the work of humans, at least in some contexts. This raises a laundry list of policy questions, some as old as the story of John Henry (will machines put humans out of work?), others as 21st-century as data sovereignty (how can nations govern data pertaining to their citizens when it flows seamlessly around the globe?).

The inevitable raft of copyright lawsuits raises one key legal question that threatens to stop these AI models in their tracks: Do the creators of these tools need permission from the copyright holders of the works they use to “train” their AI models? After all, building these models requires having AI analyze huge bodies of existing works, and that analysis requires copying those works on a massive scale. The outputs of these models may be new works, but the AI can’t generate new and meaningful output unless it has access to existing works as input.

Lots of smart people have opined on the proper copyright analysis of AI already, so I don’t want to go too deeply down this rabbit hole myself. The technical legal answer I favor is straightforward, and the very short version is that there’s no meaningful difference between these tools and the other “non-consumptive” / computational uses that courts have already blessed as fair use many times over. These uses are fair (meaning outside the exclusive rights of the copyright holder, free for all) because precedent pretty clearly says they are. What copyright tells us about AI is, in my opinion, not necessarily that interesting. (At least not yet, though of course the courts and the Copyright Office may make things more interesting in the coming months and years.) Maybe I’m being too glib about the technical legal answer, but in any case, I want to answer a different question: What can AI tell us about copyright?

If we think carefully through why copyright principles tell us AI training is fair use, we get a kind of guided tour through the most important and foundational values in the US copyright system. Fair use is a flexible, open-ended limitation on copyright that is meant to protect uses that further the purpose of copyright itself. So by exploring copyright’s outer limits through fair use, we better understand copyright and its proper place in the regulation of information.

A foundational question in the debate about AI and copyright is: Whose interests should copyright ultimately protect? The answer is clear, at least in the United States, where Article I, Section 8, Clause 8 of the US Constitution specifies the purpose of copyright: “To promote the Progress of Science and useful Arts.” Granting copyrights “for limited Times” (a term of 14 years, renewable once, under the first federal copyright statute in 1790) is a means of promoting the public good.

Congressional action has not always been guided by this principle (witness the repeated extensions of the copyright term, which can now run for more than a century, despite little evidence of any public benefit), but courts, especially the Supreme Court, acknowledge copyright’s public interest purpose all the time. For example, here’s Justice Kagan in Kirtsaeng v. John Wiley & Sons, Inc.:

“[C]opyright law ultimately serves the purpose of enriching the general public through access to creative works.”

And Justice O’Connor in one of my personal favorites, Feist Publications, Inc. v. Rural Telephone Service Co., Inc.:

“The primary objective of copyright is not to reward the labor of authors, but ‘[t]o promote the Progress of Science and useful Arts.’”

And Twentieth Century Music Corp. v. Aiken:

“[P]rivate motivation must ultimately serve the cause of promoting broad public availability of literature, music, and the other arts.”

And Fox Film Corp. v. Doyal:

“The sole interest of the United States and the primary object in conferring the monopoly lie in the general benefits derived by the public from the labors of authors.”

So, where the private monopoly of copyright comes into direct conflict with the public good it is meant to serve, courts apply fair use to bring the system back into balance.

It is sometimes suggested that AI will compete unfairly with human artists, and that copyright should offer protection against this kind of competition (the John Henry story). Copyright is indeed a protection against competition: it puts the brakes on activity that would otherwise lower prices and reduce barriers to information. If anyone could make and sell (or share for free) copies of any published book, for example, the price of books would very quickly fall toward zero. So, too, would many authors’ and publishers’ interest in writing and publishing new books.

Accordingly, copyright protects authors and publishers, for a limited time, from a specific kind of competition: competition from copies of their own works or from certain infringing derivatives of them (sequels, translations, movie versions, etc.). It does not protect authors from other kinds of competition. The public generally benefits from access to lots of works in the same genre, or even in the same style, and copyright law generally does not interfere with this kind of competition.

My Charlottesville neighbor Edgar Allan Poe may have more or less invented the detective story, but once he showed the way, any author was free to follow in his footsteps. And thousands have, exploring every possible iteration from hard-boiled to Scandinavian, and the public gets the benefit. No one (other than perhaps Poe’s heirs) would argue that it was somehow unfair for other authors to try their hands at the detective genre, or that Poe should have received royalties or had a veto over these new stories. All else equal, copyright doesn’t stand in the way of new creations, even when they are in some ways built on the elements of other people’s work. (And of course everything, ultimately, is.)

Similarly, the works created by generative AI may compete with human creators, but not in a way that copyright gives anyone the power to prevent. Just as a human creator is free to read “The Murders in the Rue Morgue” and then write their own story with an eccentric detective solving a mystery alongside the reader, so, too, can an AI bot “read” a corpus of mystery novels in order to learn to write one on its own. Producing more new works that offer the kind of pleasure we have found in existing works is not a bug, from a copyright perspective, but a feature, as long as the new works are not themselves infringing.

But of course, unlike Sir Arthur Conan Doyle (whose Sherlock Holmes stories fit very nicely into Poe’s Rue Morgue template), an AI model has to literally copy existing works in order to metaphorically “read” them and develop a model for creating a new detective story. Should that make a difference? No, because fair use limits the literal application of copyright when it would undermine copyright’s more general purposes.

The Supreme Court emphasized this role in its most recent fair use opinion, Google v. Oracle. In that case, Justice Breyer describes the role of fair use in the context of software copyrights:

“…fair use can play an important role in determining the lawful scope of a computer program copyright… It can distinguish between expressive and functional features of computer code where those features are mixed. It can focus on the legitimate need to provide incentives to produce copyrighted material while examining the extent to which yet further protection creates unrelated or illegitimate harms in other markets or to the development of other products. In a word, it can carry out its basic purpose of providing a context-based check that can help to keep a copyright monopoly within its lawful bounds.”

As examples of how fair use has played this role in the past, Justice Breyer cited cases like Sony v. Connectix and Sega v. Accolade, cases where software engineers made copies of protected works in a process that resulted in the development of new, non-infringing software. Yes, these cases say, there is literal copying involved in this process, but the end result (and the only thing offered to the public in competition with the works that were copied “behind the curtain”) is something new and non-infringing — exactly the kind of creativity that copyright is meant to promote, not discourage. So, fair use acts as a context-based check on the otherwise overly broad literal scope of copyright’s exclusive rights, shielding these intermediate, back-room, pro-competitive copies from liability and enabling the creation of valuable new works.

Similarly, in the Oracle case, the Court, in Justice Breyer’s majority opinion, treated Google’s Android mobile operating system as a valuable new creation and held that Google’s reuse of elements of Oracle’s Java programming interface, which let programmers bring their existing Java skills to Android, was fair.

Fair use unlocks unprotected elements of protected works

In-copyright works contain all kinds of things that are not themselves protected by copyright. Facts, for example, are often recorded or revealed for the first time in copyright-protected works, but they are free to all. For example, in Miller v. Universal City Studios, an author who revealed important facts about a famous kidnapping case in his deeply researched book could not use copyright to prevent a movie studio from incorporating those facts in its film based on the true story, even if discovering the facts took lots of hard work by the author.

In that context, the unprotected facts were taken from the book by a human reader, who then incorporated them into the film’s screenplay. For an AI “author” to learn and incorporate facts from previous books into its new work, it would first have to literally copy the entire book where the facts are found—including not only the unprotected facts but the protected expression, too. Should that make a difference?

No. Here again, as in the case of genre and style, fair use would enable the literal copying of protected text in order to discover and reuse the unprotected facts. We know this because this was exactly what happened in the Google Books case. Google partnered with university libraries to digitize and analyze millions of books in their collections, using that data to reveal facts as simple as “Which books include references to Albert Einstein” or as complex as “When did books start referring to ‘the United States’ as a singular collective noun (‘the United States is’) rather than a plural (‘the United States are’)?” In that case, the court said that discovering these facts, and making them readily available to anyone, is a socially valuable activity that serves copyright’s core purpose, “promoting the progress of science,” and at the same time poses no unfair threat of competition to the works that are digitized.

Conclusion

So what have we learned? Copyright may protect authors in the first instance, but ultimately its role is to further the public good. Copyright regulates competition, but only in specific ways. Fair use is an essential bulwark against copyright literalism in the digital age. And finally, fair use can help technology to unlock free aspects of protected works. I can’t say, yet, whether I welcome our new robot overlords. I’m not even sure if they will be our overlords. But I have certainly appreciated the way that thinking about them has helped to sharpen my own thinking about copyright.