[Matthew Sag and Sean Flynn, IP Watch (CC-BY-NC-SA)] This week, the South African Parliament began accepting comments on its pending Bill to amend the South African Copyright Act and align it with the digital age. We and other experts and civil society organizations submitted comments praising many of the Bill's provisions and proposing that it adopt an "open" fair use right. Here we focus on one major reason to adopt an open fair use right: to authorize so-called non-expressive uses of works. We conclude with some reflections on how international law could help in this regard.

Analogue Law in a Digital World

In the era of the printing press that gave birth to modern copyright law, making a copy of a work was a distinct activity with a well-settled meaning. Every new physical copy of a work made that work available to a new consumer or a new group of consumers. The exclusive right to make and sell copies made sense in this context: it created an economic system in which copyright owners had a clear and distinct tolling point for remuneration.

In the digital age, a large and growing number of technologies rely on intermediate copies that have no independent economic significance and do not communicate the author's original expression to the public. These new and important Internet uses include machine learning, cloud computing, text mining, plagiarism detection, automated detection of copyright infringement and the construction of search engine indexes. The copying at the heart of these technologies is "non-expressive use" (sometimes also referred to as "non-consumptive use"). Specifically, the term "non-expressive use" refers to the making of intermediate copies of copyrighted works as part of an analytical process that does not communicate the work's original expression to any human end user.[1]

Exceptions that allow only for quotation of excerpts, that are confined to traditional purposes like research, criticism or study, or that apply only to particular users, like schools and libraries, do not authorize the Internet as we know it. Non-expressive uses are fair by any definition: they do not take markets away from copyright owners, and indeed they often create new markets for works (e.g. in the case of indexing that refers users to works). But under current South African law and the law of many countries around the world, these Internet uses are arguably unlawful because, although they do not communicate the copyright owner's original expression to the public in any way, they all rely on copying as an intermediate technical step.

Non-expressive uses rely on the ability of machines to read thousands (sometimes millions) of works to abstract metadata from those works. The metadata itself is fundamentally different from the original expression contained within the primary works, just as a researcher's notes on her reading would be. The metadata is fact, not expression. It is not similar to the primary works, and the creators of the primary works do not author it.[2] Every copyright system around the world would recognize that such metadata is composed of unprotectable facts. However, there are still many countries where copyright law effectively makes it illegal to generate this kind of metadata using a computer, for the simple reason that when computers "read" they also copy.

Non-expressive uses have enormous potential to advance human progress without prejudicing the interests of authors or copyright owners. To illustrate, researchers using text mining do not copy works to read them individually; they copy them by the thousands to generate abstract metadata about entire collections of works. This is true whether the works are blog posts, library books, or webpages. As the Second Circuit explained in the recent case of Authors Guild v. Google, Inc.,

Google’s “ngrams” research tool draws on the Google Library Project corpus to furnish statistical information to Internet users about the frequency of word and phrase usage over centuries. This tool permits users to discern fluctuations of interest in a particular subject over time and space by showing increases and decreases in the frequency of reference and usage in different periods and different linguistic regions. It also allows researchers to comb over the tens of millions of books Google has scanned in order to examine word frequencies, syntactic patterns, and thematic markers and to derive information on how nomenclature, linguistic usage, and literary style have changed over time.[3]
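To make the court's description concrete, here is a minimal, hypothetical Python sketch of the kind of computation an ngram tool performs. It is not Google's implementation; the function names and the toy corpus are illustrative assumptions. Each text is held in memory (an intermediate copy) only long enough to be tokenized and counted, and what the function returns is a table of word-sequence frequencies per year, that is, metadata about the collection rather than the authors' expression.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Hypothetical helper: yield successive n-word tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))


def ngram_frequencies(corpus, n=1):
    """Map each year to the relative frequency of every n-gram in that year's texts.

    corpus: iterable of (year, text) pairs (an assumed input format).
    The text is read only to be tokenized and counted; the result holds
    frequencies, not the text itself.
    """
    counts_by_year = {}
    for year, text in corpus:
        # Lowercase and split into word tokens (a deliberately simple tokenizer).
        tokens = re.findall(r"[a-z']+", text.lower())
        counts_by_year.setdefault(year, Counter()).update(ngrams(tokens, n))

    # Convert raw counts into relative frequencies within each year.
    freqs = {}
    for year, counts in counts_by_year.items():
        total = sum(counts.values())
        freqs[year] = {gram: count / total for gram, count in counts.items()}
    return freqs


# Toy corpus standing in for a scanned library (illustrative only).
corpus = [
    (1900, "The telegraph and the railway"),
    (2000, "The internet and the web"),
]
unigrams = ngram_frequencies(corpus, n=1)
print(unigrams[1900][("the",)])  # 0.4: "the" is 2 of the 5 tokens from 1900
```

Comparing a term's frequency across years is what lets users "discern fluctuations of interest in a particular subject over time," while nothing in the returned table lets anyone read the underlying books.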

Beyond identifying patterns in vast libraries of literature, text mining has enabled researchers to identify new treatments for diseases by observing correlations in scientific papers that were not apparent to any single researcher. Text mining is vital for machine learning, automatic translation, and developing the language models that power dictation software.

Allowing non-expressive use is consistent with the goals of copyright. Copyright law is not an end unto itself; it was established to promote human progress by motivating and rewarding the creation of new and original expression. Thus, the law distinguishes between facts and ideas (unprotectable) and expression (protectable). A work is only regarded as having been copied when a substantial part of its original expression has been reproduced. If the purpose of copyright is to protect original expression, it stands to reason that non-expressive use should not infringe copyright.

Opening Exceptions to the Internet

South Africa, like many other countries around the world, has a "closed list" of exceptions. For an unlicensed use of a work to be protected, its purpose must be listed among the Act's exceptions. In the United States and an increasing number of other countries around the world (e.g. Singapore, Israel, Korea, Malaysia, the Philippines and others), there is, in addition to specific exceptions, a general exception that is open to application to any purpose.

South Africa has a general exception that authorizes a "fair dealing" with a work, but that exception is not open. South Africa's fair dealing standard applies only to uses for the purposes of research or private study, personal or private use, criticism or review, and reporting current events.

The problem with South Africa's general exception is not that it is called "fair dealing" instead of "fair use." The terms "fair dealing" and "fair use" have the same legal meaning. They indicate that any kind of action with a work (any "dealing" or "use") is potentially within the scope of the exception. The problem with South Africa's general exception is that it is not open.

The non-expressive use of copyrighted works does not fit neatly into the categories of fair dealing authorized in South Africa or any other closed general exception we know of. The production and use of metadata is not exactly criticism and it is not always research or scholarship. Computational analysis may be vital to news reporting on current events, but as an intermediate step it does not clearly amount to news reporting as such.

Calling an exception “fair use” does not make it open to purposes such as non-expressive uses. Some laws that authorize “fair use” have a closed list of permitted purposes (e.g. Uganda); others that authorize “fair dealing” have an open list of permitted purposes (e.g. Singapore, Malaysia).

The magic words in the US fair use clause are "including" and "such as," not "fair use." The clause provides, in relevant part:

[T]he fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.

The openness of the US fair use clause to any kind of use ("including such use by reproduction") and to any kind of purpose ("such as criticism, comment," etc.) is often referred to as the secret sauce of the US innovation-enabling environment. Unlike closed list systems, open fair use gives technology developers the ability, if challenged, to justify their particular use of a copyrighted work as serving a purpose that promotes the goals of copyright, as reasonable in light of that purpose, and as unlikely to harm the interests of copyright owners. The list of innovations enabled by the ability to make such arguments is long, including the videocassette recorder, cloud storage and many of the non-expressive Internet uses we discuss above.

Practical reforms

South Africa could make suitable provision for non-expressive use by simply adding the words “such as” before the list of authorized purposes in its existing fair dealing clause. For example, South Africa’s proposed general exception in Section 12 of the Act could be amended to read:

“In addition to uses specifically authorised, a fair dealing or use with respect to a work or performance for purposes such as the following does not infringe copyright in that work: . . .”

As an alternative to an open fair dealing right, or as a clarification of it, the copyright laws of South Africa and other countries could be amended with a specific provision to protect modern Internet uses. For example:

Any use of a work that is merely an intermediate technological step in the production of metadata that does not itself embody and is not capable of communicating a copyright owner’s original expression, does not infringe the exclusive rights of the author or the copyright owner of that work under this Act.

Toward an International Digital Economy Right

There are lessons here for international law as well. Recent trade and international agreements have been developing language to protect and promote copyright limitations and exceptions that are needed in today’s world.

Some recent trade and international agreements contain provisions that seek to protect the rights of countries to use open general exceptions in their laws, clarifying that such rights are compliant with the so-called “three step” test. Perhaps the first of such provisions was included in the US-Korea Free Trade Agreement:

Article 18.4: Copyright and Related Rights

FN 11. Each Party shall confine limitations or exceptions to the rights described in paragraph 1 to certain special cases that do not conflict with a normal exploitation of the work, performance, or phonogram, and do not unreasonably prejudice the legitimate interests of the right holder. For greater certainty, each Party may adopt or maintain limitations or exceptions to the rights described in paragraph 1 for fair use, as long as any such limitation or exception is confined as stated in the previous sentence.

A similar provision was included in Article 10 of the Marrakesh Treaty to Facilitate Access to Published Works for Persons Who Are Blind, Visually Impaired or Otherwise Print Disabled. Article 10 of that agreement clarified that measures to provide exceptions for people with disabilities “may include judicial, administrative or regulatory determinations for the benefit of beneficiary persons as to fair practices, dealings or uses.” These provisions seek to make clear that open general exceptions are an adequate way to promote balance in copyright laws.

Other international laws seek to affirmatively require limitations and exceptions for the digital environment. The most prominent and detailed of these is Article 5 of the European Union Infosoc Directive. The Infosoc Directive contains one mandatory exception, requiring EU members to protect the right to make

temporary acts of reproduction . . . which are transient or incidental [and] an integral and essential part of a technological process and whose sole purpose is to enable:

(a) a transmission in a network between third parties by an intermediary, or

(b) a lawful use

The Korea and EU provisions are a good start. Together they point toward a model that combines mandatory and permissive provisions, both of which would be useful in protecting Internet uses. But each provision is incomplete.

Protecting "fair use" without defining it makes little sense, since there is no practical difference between using the term "fair use" or "fair dealing" in a copyright exception. What we need is protection for open exceptions, which some claim violate the three step test. Here, the Max Planck Declaration: A Balanced Interpretation of the "Three-Step Test" in Copyright Law does a far better job, providing:

  1. The Three-Step Test’s restriction of limitations and exceptions to exclusive rights to certain special cases does not prevent (a) legislatures from introducing open ended limitations and exceptions, so long as the scope of such limitations and exceptions is reasonably foreseeable;

Requiring protection only for "temporary" copies also fails to fully protect non-expressive uses. Many of the non-expressive uses on which the Internet is based, such as the copies made to create indexes for Internet search or for data mining, are not temporary. The point should not be how long the copy lasts but rather its intermediate and non-expressive nature.

We would therefore propose the following basis for a discussion of an international rights framework for the digital economy, with one mandatory and one permissive provision:

Countries shall provide that any use of a work that is merely an intermediate technological step in the production of metadata that does not itself embody and is not capable of communicating a copyright owner's original expression, or is an intermediate step to facilitate transmission in a network between third parties by an intermediary, does not infringe the exclusive rights of the author or the copyright owner of that work under this Act.

Countries may provide general exceptions that are open to application to any purpose, provided that such exceptions protect against unreasonable prejudice to the legitimate interests of the author, for example through substitution for the protected work in the market.

These two simple provisions would go a long way toward showing countries around the world how best to ensure that the non-expressive uses that undergird expression and commerce on the Internet are protected from anachronistic copyright laws.


[1] See Matthew Sag, Copyright and Copy-Reliant Technology, 103 Northwestern University Law Review 1607–1682 (2009); Matthew Sag, Orphan Works as Grist for the Data Mill, 27 Berkeley Technology Law Journal 1503–1550 (2012); Matthew Jockers, Matthew Sag & Jason Schultz, Digital Archives: Don't Let Copyright Block Data Mining, 490 Nature 29–30 (October 4, 2012).

[2] Id. See also Matthew Jockers, Matthew Sag & Jason Schultz, Brief of Digital Humanities and Law Scholars in Support of Defendants-Appellees and Affirmance in Authors Guild v. Google (No. 13-4829) (July 10, 2014).

[3] See Authors Guild v. Google, Inc., 804 F.3d 202, 209 (2d Cir. 2015) (internal citations and quotations omitted).