This bibliography includes academic literature on copyright and the right to research. Some of the papers focus specifically on copyright and artificial intelligence or text and data mining. It includes articles by legal academics, empiricists, and practitioners.

Please send suggested additions to Mike Palmedo – mpalmedo@american.edu.

Each entry below lists the author(s), title, year, abstract, citation, and a link to the full text.
Author(s): Senftleben, Martin; with Thomas Margoni, Daniel Antal, Balázs Bodó, Stef van Gompel, Christian Handke, Martin Kretschmer, Joost Poort, João Quintais, and Sebastian Felix Schwemer
Title: Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies
Year: 2021
Abstract: In the European Strategy for Data, the European Commission highlighted the EU’s ambition to acquire a leading role in the data economy. At the same time, the Commission conceded that the EU would have to increase its pools of quality data available for use and re-use. In the creative industries, this need for enhanced data quality and interoperability is particularly strong. Without data improvement, unprecedented opportunities for monetising the wide variety of EU creative content and making this content available for new technologies, such as artificial intelligence training systems, will most probably be lost. The problem has a worldwide dimension. While the US has already taken steps to provide an integrated data space for music as of 1 January 2021, the EU is facing major obstacles not only in the field of music but also in other creative industry sectors. Weighing costs and benefits, there can be little doubt that new data improvement initiatives and sufficient investment in a better copyright data infrastructure should play a central role in EU copyright policy. A trade-off between data harmonisation and interoperability on the one hand, and transparency and accountability of content recommender systems on the other, could pave the way for successful new initiatives.
Citation: Martin Senftleben, Thomas Margoni, Daniel Antal, Balázs Bodó, Stef van Gompel, Christian Handke, Martin Kretschmer, Joost Poort, João Quintais, and Sebastian Felix Schwemer, Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies (February 12, 2021).
Full Text: Link
Author(s): Craig, Carys
Title: AI and Copyright
Year: 2020
Abstract: This chapter examines the most pertinent issues facing copyright law as it encounters increasingly sophisticated artificial intelligence (AI). It begins with a few introductory examples to illuminate the potential interactions of AI and copyright law. Section 2 then tackles the question of whether AI-generated works are copyrightable in Canada and who, if anyone, might own that copyright. This involves a doctrinal discussion of “originality” (the threshold for copyrightability) as well as reflections on the meaning of “authorship,” and concludes with the suggestion that autonomously generated AI outputs presently (and rightly) belong in the public domain. Section 3 turns to consider issues of copyright infringement. First, it addresses the law in respect of AI inputs (the texts and data used to train AI systems, which may themselves be copyrightable works) and highlights the need for greater limits and exceptions to ensure that copyright law does not obstruct best practices in the development and implementation of AI technologies. It then examines the matter of potentially infringing AI outputs (which may, of course, resemble copyright-protected, human-created works), identifying current uncertainties around independent creation, agency, and the allocation of liability. Section 4 addresses the deployment of AI in automated copyright-enforcement, emphasizing its increasingly critical role in shaping our online environment and citizens’ everyday encounters with copyright enclosures. The chapter concludes with reflections on the risks and opportunities presented by AI in the copyright context, and identifies key gaps and questions that remain to be answered as copyright law and policy adjust to evolving AI technologies.
Citation: Craig, Carys J., AI and Copyright (November 2, 2020), in Florian Martin-Bariteau & Teresa Scassa, eds., Artificial Intelligence and the Law in Canada (Toronto: LexisNexis Canada, 2021).
Full Text: Link
Author(s): Elkin-Koren, Niva; with Neil Weinstock Netanel
Title: Transplanting Fair Use across the Globe: A Case Study Testing the Credibility of U.S. Opposition
Year: 2020
Abstract: The fair use privilege of United States copyright law long stood virtually alone among national copyright laws in providing a flexible, open-ended copyright exception. Most countries’ copyright statutes set out a list of narrowly defined exceptions to copyright owners’ exclusive rights. By contrast, U.S. fair use doctrine empowers courts to carve out an exception for an otherwise infringing use after weighing a set of equitable factors on a case-by-case basis.
In the face of rapid technological change in cultural production and distribution, however, the last couple decades have witnessed widespread interest in adopting fair use in other countries. Thus far, the fair use model has been adopted in a dozen countries and considered by copyright law revision commissions in several others. Yet, ironically, U.S. copyright industries – motion picture studios, record labels, music publishers, and print publishers – and, in some instances, U.S. government representatives have steadfastly opposed the transplanting of U.S. fair use to other countries. They argue, principally, that, while fair use works reasonably well in the U.S., foreign courts that lack the 150 years of U.S. fair use precedent would likely apply the fair use exception in a chaotic, libertine manner, thus seriously undermining copyright protection.
This Article tests the credibility of that blanket U.S. opposition. In so doing, we present the first comprehensive study of how courts have actually applied fair use in a country outside the United States. We report the results of our study of the first decade of fair use case law in Israel, which enacted a fair use exception as part of its copyright law revision in 2007. We also compare Israeli fair use doctrine with that of the United States, drawing on parallel empirical studies of U.S. fair use case law.
Our study plausibly supports two general conclusions of relevance to the global debate about fair use. First, our findings counter the sweeping claim, advanced by fair use opponents, that the adoption of fair use outside the United States will inevitably open the floodgates to massive uncompensated copying and dissemination of authors’ creative expression. We find that, in fact, Israeli courts have been far less receptive to fair use defenses than have U.S. courts. Far from seeing fair use as a “free ticket to copy,” Israeli courts actually ruled against fair use at a far greater rate than did their American counterparts during the ten-year period of our study.
Second, our case study suggests that in one respect U.S. copyright industries raise a valid point: local courts will, indeed, develop distinct versions of fair use doctrine in line with their local jurisprudence and national policies.
Citation: Elkin-Koren, Niva and Netanel, Neil Weinstock, Transplanting Fair Use across the Globe: A Case Study Testing the Credibility of U.S. Opposition (May 11, 2020). Hastings Law Journal, Forthcoming; UCLA School of Law, Public Law Research Paper No. 20-15.
Full Text: Link
Author(s): Flynn, Sean; with Christophe Geiger, João Pedro Quintais, Thomas Margoni, Matthew Sag, Lucie Guibault, and Michael W. Carroll
Title: Implementing User Rights for Research in the Field of Artificial Intelligence: A Call for International Action
Year: 2020
Abstract: Last year, before the onset of a global pandemic highlighted the critical and urgent need for technology-enabled scientific research, the World Intellectual Property Organization (WIPO) launched an inquiry into issues at the intersection of intellectual property (IP) and artificial intelligence (AI). We contributed comments to that inquiry, with a focus on the application of copyright to the use of text and data mining (TDM) technology. This article describes some of the most salient points of our submission and concludes by stressing the need for international leadership on this important topic. WIPO could help fill the current gap on international leadership, including by providing guidance on the diverse mechanisms that countries may use to authorize TDM research and serving as a forum for the adoption of rules permitting cross-border TDM projects.
Citation: Sean Flynn, Christophe Geiger, João Pedro Quintais, Thomas Margoni, Matthew Sag, Lucie Guibault, and Michael W. Carroll, Implementing User Rights for Research in the Field of Artificial Intelligence: A Call for International Action, 7 Eur. Intell. Prop. Rev. (2020).
Full Text: Link
Author(s): Giblin, Rebecca
Title: Are Contracts Enough? An Empirical Study of Author Rights in Australian Publishing Agreements
Year: 2020
Abstract: A majority of the world’s nations grant authors statutory reversion rights: entitlements to reclaim their copyrights in certain circumstances, such as their works becoming unavailable for purchase. In Australia (as in the United Kingdom) we have no such universal protections, leaving creator rights to be governed entirely by their contracts with investors. But is this enough? We investigate that question in the book industry context via an exploratory study of publishing contracts sourced from the archive of the Australian Society of Authors. We identify serious deficiencies in the agreements generally as well as the specific provisions for returning rights to authors. Many contracts were inconsistent or otherwise poorly drafted, key terms were commonly missing altogether, and we demonstrate that critical terms evolved very slowly in response to changed industry realities. In response to this new evidence we propose that consideration be given to introducing baseline minimum protections with the aim of improving author incomes, investment opportunities for publishers and access for the public.
Citation: Rebecca Giblin, Are Contracts Enough? An Empirical Study of Author Rights in Australian Publishing Agreements, 44 Melbourne U. L. Rev. 1 (2020).
Full Text: Link
Author(s): Lemley, Mark; with Bryan Casey
Title: Fair Learning
Year: 2020
Abstract: Neural network and machine learning artificial intelligences (AIs) need comprehensive data sets to train on. Those data sets will often be composed of images, videos, audio, or text. All those things are copyrighted. Copyright law thus stands as an enormous potential obstacle to training AIs. Not only might the aggregate data sets themselves be copyrighted, but each individual image, video, and text in the data set is likely to be copyrighted too.

It’s not clear that the use of these databases of copyrighted works to build self-driving cars, or to learn natural languages by analyzing the content in them, will be treated as a fair use under current law. Fair use doctrine in the last quarter century has focused on the transformation of the copyrighted work. AIs aren’t transforming the databases they train on; they are using the entire database, and for a commercial purpose at that. Courts may view that as a kind of free riding they should prohibit.

In this Article, we argue that AIs should generally be able to use databases for training whether or not the contents of that database are copyrighted. There are good policy reasons to do so. And because training data sets are likely to contain millions of different works with thousands of different owners, there is no plausible option simply to license all the underlying photographs or texts for the new use. So allowing a copyright claim is tantamount to saying, not that copyright owners will get paid, but that no one will get the benefit of this new use.

There is another, deeper reason to permit such uses, one that has implications far beyond training AIs. Understanding why the use of copyrighted works by AIs should be fair actually reveals a significant issue at the heart of copyright law. Sometimes people (or machines) copy expression but they are only interested in learning the ideas conveyed by that expression. That’s what is going on with training data in most cases. The AI wants photos of stop signs so it can learn to recognize stop signs, not because of whatever artistic choices you made in lighting or composing your photo. Similarly, it wants to see what you wrote to learn how words are sequenced in ordinary conversation, not because your prose is particularly expressive.

AIs are not alone in wanting just the facts. The issue arises in lots of other contexts. In American Geophysical Union v. Texaco, for example, the defendants were interested only in the ideas in scientific journal articles; photocopying the article was simply the most convenient way of gaining access to those ideas. Other examples include copyright disputes over software interoperability cases like Google v. Oracle, current disputes over copyright in state statutes and rules adopted into law, and perhaps even Bikram yoga poses and the tangled morass of cases around copyright protection for the artistic aspects of utilitarian works like clothing and bike racks. In all of these cases, copyright law is being used to target defendants who actually want something the law is not supposed to protect – the underlying ideas, facts, or functions of the work.

Copyright law should permit copying of works for non-expressive purposes. When the defendant copies a work for reasons other than to have access to the protectable expression in that work, fair use should consider under both factors one and two whether the purpose of the defendant’s copying was to appropriate the plaintiff’s expression or just the ideas. We don’t want to allow the copyright on the creative pieces to end up controlling the unprotectable elements.
Citation: Lemley, Mark A. and Casey, Bryan, Fair Learning (January 30, 2020).
Full Text: Link
Author(s): Margoni, Thomas
Title: Text and Data Mining in Intellectual Property Law: Towards an Autonomous Classification of Computational Legal Methods
Year: 2020
Abstract: Text and Data Mining (TDM) can generally be defined as the “process of deriving high-quality information from text and data” and commonly refers to a set of automated analytical tools able to extract new, often hidden, knowledge from existing information. The impact that TDM may have on science, arts and humanities is invaluable. This is because by identifying the correlations and patterns that are often concealed to the eye of a human observer due to the amount, complexity, or variety of data surveyed, TDM allows for the discovery of concepts or the formulation of correlations that would have otherwise remained concealed or undiscovered. Considering this point of view, it can be effectively argued that TDM creates new knowledge from old data.
The first part of this paper illustrates the state of the art in the still nascent field of TDM and related technologies applied to IP and legal research more generally. Furthermore, it formulates some proposals of systematic classification in a field that suffers from a degree of terminological vagueness. In particular, this paper argues that TDM, together with other types of data-driven analytical tools, deserves its own autonomous methodological classification as ‘computational legal methods.’ The second part of the chapter offers concrete examples of the application of computational approaches in IP legal research. This is achieved by discussing a recent project on TDM, which required the development of different methods in order to address certain problems that emerged during the implementation phase. The discussion allows for a detailed view of the technology involved, of the relevant skills that legal researchers need, and of the obstacles that the application of TDM to IP research helps to overcome. In particular, it demonstrates some of the advantages in terms of automation and predictive analysis that TDM enables. At the same time, the limited success of the experiment also shows that there are a number of training and skill-related issues that legal researchers and practitioners interested in the application of TDM and computational legal methods in the field of IP law should consider. Accordingly, the second argument advanced in this chapter is that law school programmes should include training on computational legal methods in their educational offerings.
Citation: Thomas Margoni, Text and Data Mining in Intellectual Property Law: Towards an Autonomous Classification of Computational Legal Methods, CREATe Working Paper [TBC]/2020, forthcoming in Calboli I. & Montagnani L., Handbook on Intellectual Property Research, OUP, 2020.
Full Text: Link
Author(s): Quintais, João
Title: The New Copyright in the Digital Single Market Directive: A Critical Look
Year: 2020
Abstract: On 17 May 2019 the official version of the new Directive on copyright and related rights in the Digital Single Market was published. This marks the end of a controversial legislative process at EU level. It also marks the beginning of what will surely be a contentious process of national implementation. This article provides an overview and critical examination of the new Directive. It argues that what started as a legislative instrument to promote the digital single market turned into an industry policy tool, shaped more by effective lobbying than evidence and expertise. The result is a flawed piece of legislation. Despite some positive aspects, the Directive includes multiple problematic provisions, including the controversial new right for press publishers and the new liability regime for content-sharing platforms. On balance, the Directive denotes a normative preference for private ordering over public choice in EU copyright law, and lacks adequate safeguards for users. It is also a complex text with multiple ambiguities, which will likely fail to promote the desired harmonization and legal certainty in this area.
Citation: João Pedro Quintais, The New Copyright in the Digital Single Market Directive: A Critical Look, Eur. Intell. Prop. Rev., 1 (2020).
Full Text: Link
Author(s): Butler, Brandon; with Prudence Adler and Krista Cox
Title: The Law and Accessible Texts: Reconciling Civil Rights and Copyrights
Year: 2019
Abstract: This report is written to inform the participants in a new collaborative project to improve how accessible texts (i.e., texts in formats that meet the needs of users with disabilities) are created, managed, and stored. It provides a concise, up-to-date summary of the two key legal pressures that bear on the creation and sharing of accessible texts: the civil rights laws that require creation and distribution of accessible texts by institutions of higher education (IHEs) to ensure equitable access to information, and the copyright laws that are sometimes (as we will show) misperceived as barriers to that effort. Concern that these legal regimes may be in tension contributes to inefficiency in making and sharing accessible texts. Reconciling the mandates of copyright and civil rights clears the way for dramatic improvements in service that both vindicate civil rights and serve the First Amendment values that animate copyright.
Citation: Brandon Butler, Prue Adler, and Krista Cox, The Law and Accessible Texts: Reconciling Civil Rights and Copyrights, Ass’n of Research Libraries (2019).
Full Text: Link
Author(s): Butler, Brandon; with Patricia Aufderheide, Peter Jaszi, and Krista Cox
Title: Cracking the Copyright Dilemma in Software Preservation: Protecting Digital Culture Through Fair Use Consensus
Year: 2019
Abstract: Copyright problems may inhibit the crucially important work of preserving legacy software. Such software is worthy of study in its own right because it is critical to accessing digital culture and expression. Preservation work is essential for communicating across boundaries of the past and present in a digital era. Software preservationists in the United States have addressed their copyright problems by developing a code of best practices in employing fair use. Their work is an example of how collective action by users of law changes the norms and beliefs about law, which can in turn change the law itself insofar as the law takes account of community norms and practices. The work of creating the code involved facilitators who are communication, information sciences, and legal scholars and practitioners. Thus, the creation of the code is also an example of crossing the boundaries between technology and policy research.
Citation: Butler, Brandon and Aufderheide, Patricia and Jaszi, Peter A. and Cox, Krista L., “Cracking the Copyright Dilemma in Software Preservation: Protecting Digital Culture Through Fair Use Consensus,” Journal of Copyright in Education and Librarianship, Volume 3, Issue 3 (2019).
Full Text: Link
Author(s): Carroll, Michael
Title: Copyright and the Progress of Science: Why Text and Data Mining is Lawful
Year: 2019
Abstract: This Article argues that U.S. copyright law provides a competitive advantage in the global race for innovation policy because it permits researchers to conduct computational analysis - text and data mining - on any materials to which they have access. Amendments to copyright law in Japan, and the European Union’s recent addition of limitations on copyright to legalize some TDM research, implicitly acknowledge the competitive benefits provided by the fair use provision of U.S. copyright law.

Focusing only on U.S. law, this Article makes two general contributions to the literature on fair use: (1) in cases involving archiving, the user’s security precautions are relevant under the first fair use factor and should not be treated as an unenumerated factor or as part of the market harm analysis; and (2) good faith should not be a factor in fair use analysis, but even if courts do consider good faith, TDM research conducted on infringing sources, such as Sci-Hub, is still lawful because the research provides transformative benefits without causing harm to the markets that matter. This Article also revisits the issue of temporary copies to argue that certain steps in TDM research do not make copies that “count” under U.S. law and that it is possible to design cloud-based TDM research that does not implicate U.S. copyright law at all. This Article addresses the needs of many audiences including policymakers, courts, university counsel, research libraries, and legal scholars who seek a thorough legal analysis to support this argument.
Citation: Michael W. Carroll, Copyright and the Progress of Science: Why Text and Data Mining is Lawful, 53 U.C. Davis L. Rev. 893 (2019).
Full Text: Link
Author(s): Craig, Carys; with Ian Kerr
Title: The Death of the AI Author
Year: 2019
Abstract: Much of the second-generation literature on AI and authorship asks whether an increasing sophistication and independence of generative code should cause us to rethink embedded assumptions about the meaning of authorship, arguing that recognizing the authored nature of AI-generated works may require a less profound doctrinal leap than has historically been suggested. In this essay, we argue that the threshold for authorship does not depend on the evolution or state of the art in AI or robotics. Instead, we contend that the very notion of AI-authorship rests on a category mistake: it is not an error about the current or potential capacities, capabilities, intelligence or sophistication of machines; rather it is an error about the ontology of authorship.
Building on the established critique of the romantic author figure, we argue that the death of the romantic author also and equally entails the death of the AI author. We provide a theoretical account of authorship that demonstrates why claims of AI authorship do not make sense in terms of 'the realities of the world in which the problem exists.' (Samuelson, 1985) Those realities, we argue, must push us past bare doctrinal or utilitarian considerations of originality, assessed in terms of what an author must do. Instead, what they demand is an ontological consideration of what an author must be. The ontological question, we suggest, requires an account of authorship that is relational; it necessitates a vision of authorship as a dialogic and communicative act that is inherently social, with the cultivation of selfhood and social relations as the entire point of the practice. Of course, this ontological inquiry into the plausibility of AI-authorship transcends copyright law and its particular doctrinal conundrums, going to the normative core of how law should — and should not — think about robots and AI, and their role in human relations.
Citation: Craig, Carys J. and Kerr, Ian R., The Death of the AI Author (March 25, 2019). Osgoode Legal Studies Research Paper.
Full Text: Link
Author(s): Elkin-Koren, Niva
Title: The Chilling of Governance-by-Data on Data Markets
Year: 2019
Abstract: Big data has become an important resource not only for commerce but also for governance. Governance-by-data seeks to take advantage of the bulk of data collected by private firms to make law enforcement more efficient. It can take many forms, including setting enforcement priorities, affecting methods of proof, and even changing the content of legal norms. For instance, car manufacturers can use real-time data on the driving habits of drivers to learn how their cars respond to different driving patterns. If shared with the government, the same data can be used to enforce speed limits or even to craft personalized speed limits for each driver.
The sharing of data for the purpose of law enforcement raises obvious concerns for civil liberties. Indeed, over the past two decades, scholars have focused on the risks arising from such data sharing for privacy and freedom. So far, however, the literature has generally overlooked the implications of such dual use of data for data markets and data-driven innovation.
In this Essay, we argue that governance-by-data may create chilling effects that could distort data collection and data-driven innovation. We challenge the assumptions that incentives to collect data are a given and that firms will continue to collect data notwithstanding governmental access to such data. We show that, in some instances, an inverse relationship exists between incentives for collecting data and sharing it for the purpose of governance. Moreover, the incentives of data subjects to allow the collection of data by private entities might also change, thereby potentially affecting the efficiency of data-driven markets and, subsequently, data-driven innovation. As a result, data markets might not provide sufficient and adequate data to support digital governance. This, in turn, might significantly affect welfare.
Citation: Niva Elkin-Koren, The Chilling of Governance-by-Data on Data Markets, 86 U. Chi. L. Rev. 403 (2019).
Full Text: Link
Author(s): Geiger, Christophe; with Giancarlo Frosio and Oleksandr Bulayenko
Title: Text and Data Mining: Articles 3 and 4 of the Directive 2019/790/EU
Year: 2019
Abstract: Our society is in the midst of an explosion of data: ‘there was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing’. In 2014, there were 2.4 billion internet users. That number grew to 3.8 billion in 2017 and new data is created by the quintillions of bytes every day. Together with mobile devices, the Internet of Things (IoT) contributes to this huge data production. In the big data era, orientating within this magma of online data has become an extremely complex but crucial task, also leading to complex issues in terms of regulation of this new environment. The European Union seems at first to have acknowledged the potential of monitoring data, putting in place measures to unlock TDM potentialities. On 14 September 2016, the European Commission published a Proposal for a Directive on copyright in the Digital Single Market, which was approved into Directive 2019/790/EU on 17 April 2019 (“DSM Directive”). Inter alia, this copyright reform would like to improve access to protected works across borders within the Digital Single Market (DSM) to boost research and innovation. To this end, the DSM Directive includes a set of new mandatory exceptions and limitations. In particular, the reform introduces two specific limitations for TDM. In this chapter, the introduction of mandatory TDM limitations in European law will be assessed against the international and European framework of copyright exceptions and limitations by considering the rationales for such an exception and the positive and negative impacts of the reform. Given the importance of TDM activities for economic development in the EU and its innovative environment, the question arises whether the reform lives up to expectations. Although, following our previous suggestions, the scope of the limitation has been broadened, the final text of the reform still limits the full exploitation of the potential of data for research and innovation, for start-ups as well as more generally for broader access to works and the information they contain.
Citation: Geiger, Christophe and Frosio, Giancarlo and Bulayenko, Oleksandr, Text and Data Mining: Articles 3 and 4 of the Directive 2019/790/EU (October 17, 2019), in Concepción Saiz García and Raquel Evangelio Llorca (eds.), “Propiedad intelectual y mercado único digital europeo”, Valencia, Tirant lo blanch, 2019, pp. 27-71; Centre for International Intellectual Property Studies (CEIPI) Research Paper No. 2019-08.
Full Text: Link
Author(s): Palmedo, Michael
Title: The Impact of Copyright Exceptions for Researchers on Scholarly Output
Year: 2019
Abstract: High prices restrict access to academic journals and books that scholars rely upon to author new research. One possible solution is the expansion of copyright exceptions allowing unauthorized access to copyrighted works for researchers. I test the link between copyright exceptions for health and science researchers and their publishing output at the country-subject level. I find that scientists residing in countries that implement more robust research exceptions publish more papers and books in subsequent years. This relationship between copyright exceptions and publishing is stronger in lower-income countries, and stronger where there is stricter copyright protection of existing works.
Citation: Mike Palmedo, The Impact of Copyright Exceptions for Researchers on Scholarly Output, Efil Journal of Economic Research, 2(6), 114-39 (2019).
Full Text: Link
Author(s): Rai, Arti
Title: Machine Learning at the Patent Office: Lessons for Patents and Administrative Law
Year: 2019
Abstract: The empirical data indicate that a relatively small increment of additional USPTO investment in prior art search at the initial examination stage could be a cost-effective mechanism for improving accuracy in the patent system. This contribution argues that machine learning provides a promising arena for such investment. Notably, the use of machine learning in patent examination does not raise the same potent concerns about individual rights and discrimination that it raises in other areas of administrative and judicial process. That said, even an apparently easy case like prior art search at the USPTO poses challenges. The most important generalizable challenge relates to explainability. The USPTO has stressed transparency to the general public as necessary for achieving adequate explainability. However, at least in contexts like prior art search, adequate explainability does not require full transparency. Moreover, full transparency would chill provision of private sector expertise and would be susceptible to gaming.
Citation: Arti Kaur Rai, Machine Learning at the Patent Office: Lessons for Patents and Administrative Law, 104 Iowa L. Rev. 2617 (2019).
Full Text: Link
Author(s): Sag, Matthew
Title: The New Legal Landscape for Text Mining and Machine Learning
Year: 2019
Abstract: Individually and collectively, copyrighted works have the potential to generate information that goes far beyond what their individual authors expressed or intended. Various methods of computational and statistical analysis of text — usually referred to as text data mining (“TDM”) or just text mining — can unlock that information. However, because almost every use of TDM involves making copies of the text to be mined, the legality of that copying has become a fraught issue in copyright law in the United States and around the world. One of the most fundamental questions for copyright law in the Internet age is whether the protection of the author’s original expression should stand as an obstacle to the generation of insights about that expression. How this question is answered will have a profound influence on the future of research across the sciences and the humanities, and for the development of the next generation of information technology: machine learning and artificial intelligence.
This Article consolidates a theory of copyright law that I have advanced in a series of articles and amicus briefs over the past decade. It explains why applying copyright’s fundamental principles in the context of new technologies necessarily implies that copying expressive works for non-expressive purposes should not be counted as infringement and must be recognized as fair use. The Article shows how that theory was adopted and applied in the recent high-profile test cases, Authors Guild v. HathiTrust and Authors Guild v. Google, and takes stock of the legal context for TDM research in the United States in the aftermath of those decisions.
The Article makes important contributions to copyright theory, but it also integrates that theory with a practical assessment of various interrelated legal issues that text mining researchers and their supporting institutions must confront if they are to realize the full potential of these technologies. These issues range from the enforceability of website terms of service to the effect of laws prohibiting computer hacking and the circumvention of technological protection measures (i.e., encryption and other digital locks) to cross-border copyright issues.
Citation: Matthew Sag, The New Legal Landscape for Text Mining and Machine Learning, 66 J. Copyright Soc’y USA, Feb. 2019, at 1–64.
Full Text: Link
Author(s): Samberg, Rachael; with Cody Hennesy
Title: Law and Literacy in Non-Consumptive Text Mining: Guiding Researchers Through the Landscape of Computational Text Analysis
Year: 2019
Abstract: Imagine you are working with two digital humanities scholars studying post-WWII poetry, both of whom are utilizing a single group of copyright-protected works. The first scholar has collected dozens of these poems to closely analyze artistic approach within a literary framework. The second has built a personal database of the poems to apply automated techniques and statistical methods to identify patterns in the poems’ syntax. This latter methodology—in which previously unknown patterns, trends, or relationships are extracted from a collection of textual documents—is an example of “computational text analysis” (CTA), also commonly referred to as “text mining” or “text data mining.”
In accessing, building, and then working with these collections of texts (or “corpora” to use the jargon of the digital humanities), both scholars are exercising rights and making elections that carry legal impact. Indeed, they may not even be aware of the choices they can or must make:
From a copyright fair use perspective, does it matter whether a scholar compiles poems to read (or “consume”) or, like the CTA scholar above, uses algorithms to mine information within them (often referred to as “non-consumptive” analysis)?
How does an added layer of university database licensing, a publisher-provided API (application programming interface), a university archives agreement, or a website’s “terms of use” fit into a CTA researcher’s protocol for content access, collection, and analysis? When might conditions of those agreements or tools bear upon the researchers’ fair use rights?
And what should researchers know about whether they can subsequently share the corpus they use or create or republish excerpts from it in their scholarship?
Guiding scholars in addressing these issues before they build their research corpora can help them avoid unexpected pitfalls, particularly when a CTA scholar must grapple with unique copyright scenarios. …
Already, some guidance on the legal issues arising within CTA has been created for European Union researchers. Resources offering similar assistance under a US legal framework are just beginning to emerge. This chapter attempts to build upon such input in an effort to address CTA support from a researcher’s perspective. Here, we survey copyright and other legal terrain affecting CTA, exploring where these legal issues intersect with CTA methodologies to illuminate pain points for researchers. We then sketch a scholarly workflow that unites law and CTA practice—a roadmap meant to be both adoptable and adaptable by scholars in the field.
Citation: Rachael G. Samberg & Cody Hennesy, Law and Literacy in Non-Consumptive Text Mining: Guiding Researchers Through the Landscape of Computational Text Analysis, in Copyright Conversations: Rights Literacy in a Digital World (UC Berkeley 2019).
Full Text: Link
Author(s): Association of Research Libraries
Title: Code of Best Practices in Fair Use for Software Preservation
Year: 2018
Abstract: This is a code of best practices in fair use, describing the ways that fair use can be useful to software preservation in common, recurring contexts.
Fair use is the right given in US copyright law to use copyrighted material without payment or permission, under some circumstances. A long pattern of judicial decisions applying Supreme Court precedent shows that an assessment of fair use typically depends on the answers to two questions:
Is the use transformative—is the purpose for which preexisting copyrighted material is reused different from that for which it was originally created?
Is the amount of material used appropriate to the purpose of the new use?
If so, it is likely that fair use applies. A fuller explanation of fair use law is in Appendix One.
This Code was made by and for the software preservation community, with the help of legal and technical experts. It provides librarians, archivists, curators, and others who work to preserve software with a tool to guide their reasoning about when and how to employ fair use, in the most common situations they currently face. It does not provide shortcuts in the form of prescriptive “guidelines” or rules of thumb. Nor does it seek to address all the possible situations in which software preservation professionals might employ fair use, now or in the future.
Citation: Association of Research Libraries, Code of Best Practices in Fair Use for Software Preservation (2018).
Full Text: Link
Author(s): Butler, Brandon; with Patricia Aufderheide, Peter Jaszi, and Krista Cox
Title: The Copyright Permissions Culture in Software Preservation and Its Implications for the Cultural Record
Year: 2018
Abstract: A report released on February 9, 2018, The Copyright Permissions Culture in Software Preservation and Its Implications for the Cultural Record, finds that individuals and institutions need clear guidance on the legality of archiving legacy software to ensure continued access to digital files of all kinds and to illuminate the history of technology.
The first product of an Association of Research Libraries (ARL) project funded by the Alfred P. Sloan Foundation, the report is based on extensive research and interviews with software preservation experts and other stakeholders. This research will inform a Code of Best Practices in Fair Use for Software Preservation to be published in fall 2018, and to be supported by webinars, workshops, online discussions, and educational materials. The Code will advance the mission of memory institutions to safeguard the digital record and promote research that engages it.
Citation: Brandon Butler, Patricia Aufderheide, Peter Jaszi, and Krista Cox, The Copyright Permissions Culture in Software Preservation and Its Implications for the Cultural Record, Association of Research Libraries (2018).
Full Text: Link
Author(s): Dusollier, Séverine
Title: Realigning Economic Rights with Exploitation of Works: The Control of Authors Over the Circulation of Works in the Public Sphere
Year: 2018
Abstract: Economic rights in copyright have lost their meaning and their efficiency. Reduced to technical notions, the right of reproduction and the right of communication to the public are today applied to uses that seem harmless or ancillary, and fail to ensure the legitimate control by authors of the exploitation of their works. This paper proposes to reconstruct economic rights in copyright around the notion of exploitation, defined as the circulation of works in the public sphere. Firstly, it argues that the notion of exploitation used to be the guiding principle of the rights of reproduction and communication, which were only means to support such exploitation. A second part explores the increasing disconnection between economic rights and actual exploitation. In order to counter such disconnection, I propose replacing the current system with a broad and unique right of exploitation, related to the function of copyright, which is to grant to authors control over the public circulation of their works. Three types of exploitation of works, each of which aims at transmitting the work, as a communicative act, to the public sphere, could constitute new anchor points for acts of use to be considered as entering the exclusive reservation of authors: (1) the provision of copies to the public for a permanent use, (2) the provision of access to or experiences of the work, and (3) the making of derivative works. Any use of a work, currently existing or to be developed, that would fall under one of these forms of exploitation could be controlled by the copyright owners or be compensated for. Personal uses, technical copies and mere uses of the informational content of the work should remain free.
Copyright should be about giving authors enough protection and autonomy to enable them to make that circulation possible in the first place and to give them some control over the dissemination of their works, while recognizing and encouraging public discussion and enjoyment of creation by the public. The realm of exclusivity copyright confers should be conceived as a set of entitlements to enjoy the value of the work, some reserved to authors, others offered to the public, seen equally as recipients, readers and follow-on creators, as a system distributing the enjoyment of creation and circulating it to enhance its protection.
Citation: Séverine Dusollier, Realigning Economic Rights with Exploitation of Works: The Control of Authors Over the Circulation of Works in the Public Sphere, in Copyright Reconstructed 163–203 (P. Bernt Hugenholtz ed., 2018).
Full Text: Link
Author(s): Flynn, Sean; with Michael Palmedo
Title: The User Rights Database: Measuring the Impact of Copyright Balance
Year: 2018
Abstract: International and domestic copyright law reform around the world is increasingly focused on how copyright exceptions and other forms of “user rights” should be expanded to promote maximum innovation, creativity, and access to knowledge in the digital age. These efforts are guided by a relatively rich theoretical literature. However, few empirical studies explore the social and economic impact of expanding user rights in the digital era. One reason for this gap has been the absence of a tool measuring the key independent variable – changes in copyright user rights over time and between countries. We are developing such a tool, which we call the “User Rights Database.” This paper describes the methodology used to create the Database and the results of initial empirical tests using it. We find that all of the countries in our study are trending toward more “open” copyright user rights over time – their copyright laws allow more unauthorized uses of copyrighted works. However, we find a development gap in this openness: the wealthy countries in our sample are about thirty years ahead of developing countries on this measure. Our empirical tests find positive relationships between more open user rights and innovative activities in information and communication technology industries, returns to firms in these industries, and the production of scholarly publications. We do not find evidence that opening user rights causes harm to the revenue of copyright-intensive industries like publishing and entertainment.
Citation: Sean Flynn & Michael Palmedo, The User Rights Database: Measuring the Impact of Copyright Balance, Joint PIJIP/TLS Research Paper Series no. 2018-01.
Full Text: Link
Author(s): Geiger, Christophe; with Giancarlo Frosio and Oleksandr Bulayenko
Title: The Exception for Text and Data Mining (TDM) in the Proposed Directive on Copyright in the Digital Single Market - Legal Aspects
Year: 2018
Abstract: This research paper reproduces the study commissioned from CEIPI by the European Parliament’s Policy Department for Citizens’ Rights and Constitutional Affairs at the request of the Committee on Legal Affairs (JURI-Committee). It provides an analysis of the European Commission’s Proposal to introduce in Article 3 a mandatory exception to copyright allowing the carrying out of text and data mining of protected works, assesses its positive and negative impacts, and provides some suggestions for possible improvements. The advantages of introducing an “open clause” in EU copyright law on top of an enumerated list of limitations and exceptions to address some of the related problems are also reviewed.
Citation: Christophe Geiger et al., The Exception for Text and Data Mining (TDM) in the Proposed Directive on Copyright in the Digital Single Market - Legal Aspects (Centre for International Intellectual Property Studies (CEIPI) Research Paper No. 2018-02).
Full Text: Link
Author(s): Geiger, Christophe; with Giancarlo Frosio and Oleksandr Bulayenko
Title: Crafting a Text and Data Mining Exception for Machine Learning and Big Data in the Digital Single Market
Year: 2018
Abstract: New data are created by the quintillions of bytes every day. This explosion of data makes possible fast-developing machine learning and artificial intelligence technology. These technologies thrive on repurposing and processing big data streams. In the big data era, orienting within this magma of online data has become an extremely complex but crucial task, leading to complex issues in terms of regulation of this new environment. According to the European Commission, the European data economy—also frequently referred to as the “fourth industrial revolution”—is a great opportunity for growth as “Big Data considerably improves decision-making capabilities and, ultimately organizational performances.” Text and data mining (TDM) thus serves as an essential tool to navigate the endless sea of online information in search of this invaluable treasure that big data might hold for the European economy. Some studies have estimated that it could create value in excess of hundreds of billions of euros for Europe if data can be used more effectively.
The European Union (EU) would like to promote measures to unlock TDM potentialities. The Proposal for a Directive on Copyright in the Digital Single Market (DSM Draft Directive) aims to improve access to protected works across borders within the digital single market (DSM) to boost research and innovation. In particular, the proposal would like to introduce a new mandatory limitation for TDM.
In this chapter we assess this proposal against the international and European copyright framework and evaluate room for possible improvement. We conclude by inviting EU policymakers to significantly broaden the scope of the limitation in order not to prevent European DSM players from engaging safely in ground-breaking technological innovation, such as machine learning, neural networks, and artificial intelligence, through the exploitation of big data’s riches.
Citation: Christophe Geiger, Giancarlo Frosio, and Oleksandr Bulayenko, Crafting a Text and Data Mining Exception for Machine Learning and Big Data in the Digital Single Market, in X. Seuba, C. Geiger and J. Pénin (eds.), Intellectual Property and Digital Trade in the Age of Artificial Intelligence and Big Data, CEIPI/ICTSD Series, Issue No. 5, 2018.
Full Text: Link
Author(s): Geiger, Christophe; with Giancarlo Frosio and Oleksandr Bulayenko
Title: Text and Data Mining in the Proposed Copyright Reform: Making the EU Ready for an Age of Big Data?
Year: 2018
Abstract: This opinion aims at examining the Text and Data Mining (TDM) process and its legal aspects in the context of the Commission’s proposal for a Directive on Copyright in the Digital Single Market, which introduces in its Art. 3 a mandatory exception to copyright allowing for the carrying out of text and data mining of protected works. The discussion starts with the examination of several critical questions. At which stage of the TDM process are intellectual property rights affected? Do already existing exceptions and limitations apply to some TDM activities and techniques? What are the problems faced by researchers in applying them? This paper then considers the potential of a new mandatory TDM exception to drive innovation in the EU. The advantages of introducing an “open clause” on top of an enumerated list of exceptions to address some of the related problems are also reviewed. The study provides an in-depth analysis of the Commission’s Proposal, assesses its positive and negative impacts, and provides some suggestions for possible improvements. It concludes by recommending a more ambitious reform with regard to TDM in order to get the EU into shape for the age of Big Data.
Citation: Christophe Geiger, Giancarlo Frosio, and Oleksandr Bulayenko, Text and Data Mining in the Proposed Copyright Reform: Making the EU Ready for an Age of Big Data?, 49(7) IIC Int’l Rev. Intellectual Prop. & Competition L. 814, 817 (2018).
Full Text: Link
Author(s): Geiger, Christophe; with Giancarlo Frosio and Oleksandr Bulayenko
Title: The Exception for Text and Data Mining (TDM) in the Proposed Directive on Copyright in the Digital Single Market
Year: 2018
Abstract: This in-depth analysis, commissioned by the European Parliament’s Policy Department for Citizens’ Rights and Constitutional Affairs at the request of the Committee on Legal Affairs (JURI-Committee), is a contribution to the workshop on “Text and data mining” held on 22 February 2018 in Brussels. It provides an analysis of the Commission’s Proposal (which introduces in Article 3 a mandatory exception to copyright allowing the carrying out of text and data mining of protected works), assesses its positive and negative impacts, and provides some suggestions for possible improvements. The advantages of introducing an “open clause” on top of an enumerated list of exceptions to address some of the related problems are also reviewed.
Citation: Christophe Geiger, Giancarlo Frosio, and Oleksandr Bulayenko, The Exception for Text and Data Mining (TDM) in the Proposed Directive on Copyright in the Digital Single Market - Legal Aspects, In-Depth Analysis for the Directorate-General for Internal Policies of the Union, Policy Department for Citizens’ Rights and Constitutional Affairs, European Parliament, February 2018.
Full Text: Link
Author(s): Geist, Michael
Title: Want to Keep Canadian AI Thriving?: Create a Copyright Exception for Informational Analysis
Year: 2018
Abstract: Canada’s significant investment in AI needs a legal framework that ensures Canadian businesses and researchers are not placed at a global disadvantage. Whether by way of a fair use provision or a more targeted informational analysis fair dealing exception, the government’s hopes for Canadian AI leadership are linked to AI-focused copyright reforms.
Citation: Michael Geist, Want to Keep Canadian AI Thriving?: Create a Copyright Exception for Informational Analysis, Michael Geist (Oct. 18, 2018).
Full Text: Link
Author(s): Levendowski, Amanda
Title: How Copyright Law Can Fix Artificial Intelligence’s Implicit Bias Problem
Year: 2018
Abstract: As the use of artificial intelligence (AI) continues to spread, we have seen an increase in examples of AI systems reflecting or exacerbating societal bias, from racist facial recognition to sexist natural language processing. These biases threaten to overshadow AI’s technological gains and potential benefits. While legal and computer science scholars have analyzed many sources of bias, including the unexamined assumptions of its often-homogenous creators, flawed algorithms, and incomplete datasets, the role of the law itself has been largely ignored. Yet just as code and culture play significant roles in how AI agents learn about and act in the world, so too do the laws that govern them. This Article is the first to examine perhaps the most powerful law impacting AI bias: copyright.
Artificial intelligence often learns to “think” by reading, viewing, and listening to copies of human works. This Article first explores the problem of bias through the lens of copyright doctrine, looking at how the law’s exclusion of access to certain copyrighted source materials may create or promote biased AI systems. Copyright law limits bias mitigation techniques, such as testing AI through reverse engineering, algorithmic accountability processes, and competing to convert customers. The rules of copyright law also privilege access to certain works over others, encouraging AI creators to use easily available, legally low-risk sources of data for teaching AI, even when those data are demonstrably biased. Second, it examines how a different part of copyright law—the fair use doctrine—has traditionally been used to address similar concerns in other technological fields, and asks whether it is equally capable of addressing them in the field of AI bias. The Article ultimately concludes that it is, in large part because the normative values embedded within traditional fair use ultimately align with the goals of mitigating AI bias and, quite literally, creating fairer AI systems.
Citation: Amanda Levendowski, How Copyright Law Can Fix Artificial Intelligence’s Implicit Bias Problem, 93 Wash. L. Rev. 579 (2018).
Full Text: Link
Author(s): Margoni, Thomas; with Martin Kretschmer
Title: The Text and Data Mining Exception in the Proposal for a Directive on Copyright in the Digital Single Market: Why it is Not What EU Copyright Law Needs
Year: 2018
Abstract: The Proposal for a Directive on Copyright in the Digital Single Market (the Proposal) contains a number of provisions intended to modernise EU copyright law and to make it “fit for the digital age”. Some of these provisions have been the object of a lively scholarly debate in light of their controversial nature (the proposed adjustment of intermediary liability for copyright purposes contained in Art. 13, see here at p. 7) or because they propose to introduce a new right within the already variegated EU neighbouring rights landscape (i.e. the protection for press publishers contained in Art. 11). The provision contained in Art. 3 of the Proposal, dedicated to “Text and data mining”, has attracted far less attention... The goal of Art. 3 is to introduce a mandatory exception in EU copyright law which will exempt acts of reproduction made by research organisations in order to carry out text and data mining for the purposes of scientific research. In this blog post, Thomas Margoni and Martin Kretschmer discuss Art. 3 and explain why its formulation – although underpinned by the right innovation policy goal – is wrong.
Citation: Thomas Margoni & Martin Kretschmer, The Text and Data Mining Exception in the Proposal for a Directive on Copyright in the Digital Single Market: Why it is Not What EU Copyright Law Needs, CREATe (Apr. 25, 2018).
Full Text: Link
Author(s): Margoni, Thomas
Title: Artificial Intelligence, Machine Learning and EU Copyright Law: Who Owns AI?
Year: 2018
Abstract: Within the broad field of Artificial Intelligence (AI), Machine Learning (ML) looks at improving the performances of computers in executing tasks for which they were not specifically pre-programmed. Applied to the field of Natural Language Processing (NLP), ML helps computers to autonomously learn tasks such as the recognition, understanding and generation of natural language (i.e. the language spoken by humans). In other words, ML applied to NLP refers to the ability of humans to interact with computers in the same way in which humans interact among themselves. On the part of the computers this implies being able to understand human language, to understand its meaning, and to interact with it through the generation of new language.
Citation: Thomas Margoni, Artificial Intelligence, Machine Learning and EU Copyright Law: Who Owns AI?, CREATe Working Paper 2018/12.
Full Text: Link
Author(s): Montagnani, Maria Lillà; with Giorgio Aime
Title: Il text and data mining e il diritto d’autore
Year: 2018
Abstract: The paper analyzes the congruence of the proposed text and data mining exception with the creation of a data-driven economy. (The paper is in Italian.)
Citation: Maria Lillà Montagnani and Giorgio Aime, ‘Il text and data mining e il diritto d’autore’, Annali Italiani di Diritto d’Autore, Vol. 26 (2018).
Full Text: Link
Author(s): Yu, Peter
Title: Fair Use and Its Global Paradigm Evolution
Year: 2018
Abstract: Legal paradigms shift in response to political, economic, social, cultural and technological conditions. While these paradigms have moved from developed to developing countries, they rarely move in the opposite direction. Nevertheless, some transplants from developed countries do involve legal paradigms that align well with the needs, interests, conditions and priorities of developing countries. A case in point is the transplant of the fair use model in U.S. copyright law, which has attracted considerable debate, research and policy attention in the past few decades.
Because legal literature has thus far under-analyzed the transplant of the U.S. fair use model, this article focuses its analysis on fair use transplants. It begins by reviewing the literature concerning paradigm shift, in particular Thomas Kuhn’s seminal work. The article then documents a growing trend toward the worldwide adoption of the U.S. fair use model and a countertrend toward the retention of the status quo. The juxtaposition of these two trends explains why jurisdictions that set out to transplant U.S.-style fair use ended up adopting a hybrid model.
The second half of this article interrogates the different primary causes behind such a paradigm evolution. While many possible factors exist within and outside the legal system, the discussion focuses on those relating to intellectual property law, international and comparative law, and the legislative process. The article concludes with recommendations concerning future efforts to broaden copyright limitations and exceptions in the United States and across the world. Specifically, it outlines six courses of action that seek to improve these reform efforts. It further identifies three modalities of evolution that can help tailor the transplanted fair use paradigm to local needs, interests, conditions and priorities.
Yu, Peter K., Fair Use and Its Global Paradigm Evolution, U. Ill. L. Rev. 111–169 (2018)Link
Elkin-Koren, NivaFair Use By Design2017Copyright law seeks to promote the creation of works for the benefit of the public. Fair use doctrine was intended to serve as a check on copyright, to ensure that the law achieves its objectives. Fair use thus limits the rights of authors, and legitimizes unlicensed use of copyrighted works, whenever a rigid application of rights would prevent socially beneficial uses and possibly conflict with the public interest. Nowadays, the vast majority of copyrighted materials are distributed digitally, and copyright is enforced algorithmically. Fair use, as a legal defense against infringement allegations, might be largely irrelevant where algorithmic enforcement keeps disputes out of court. In this paper I argue that algorithmic adjudication requires a new approach to fair use. In a nutshell, I argue that in order to serve its role in this era, fair use must be embedded in the system design. Fair use by design seeks to apply artificial intelligence and machine learning to identify potential non-infringing uses. Algorithms could learn to identify patterns of fair use instances by studying previously decided fair use cases. The system will flag instances where fair use is highly probable, and may transfer inconclusive cases to human review, or subsequently to courts. Machine learning capabilities will ensure that the system incorporates new fair use rulings. Fair use by design is not simply a technical patch. Aside from the technological challenges, it also involves legal innovation. It requires a new framework for addressing algorithmic governance which involves redefining the role of courts and the nature of judicial oversight. The paper demonstrates these new legal challenges and discusses their implications for fair use doctrine. Part I begins by briefly describing the rise of algorithmic enforcement in copyright, where access to copyrighted materials, and copyright disputes, are increasingly governed by algorithms. Part II explains why this may put fair use and freedom of speech in danger. In Part III, I introduce the notion of fair use by design and explain how this approach can help address some of these threats. Finally, Part IV outlines some of the legal challenges that I anticipate in making this transition to algorithmic adjudication.
Elkin-Koren, Niva, Fair Use by Design, 64 UCLA Law Review 22 (2017)Link
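Elkin-Koren’s “fair use by design” pipeline – learn from decided cases, auto-flag uses where fair use is highly probable, and route inconclusive cases to human review and ultimately to courts – reduces, at its core, to a triage rule over a probability estimate. The sketch below is a hypothetical illustration of that routing logic only; the thresholds, the factor encoding and the hand-weighted score_fair_use stub are invented stand-ins for the learned classifier the article envisages.

```python
# Hypothetical triage logic for "fair use by design": route each use
# according to an estimated probability that it is fair use.
# The score_fair_use() stub and both thresholds are invented for
# illustration; a real system would learn the scorer from decided cases.

ALLOW_THRESHOLD = 0.90   # auto-permit: fair use highly probable
REVIEW_THRESHOLD = 0.40  # inconclusive band: send to human review

def score_fair_use(use: dict) -> float:
    """Stand-in for a learned model; returns an estimate of P(fair use).

    Here: a crude hand-weighted blend of the four statutory factors,
    each encoded as a number in [0, 1]. A trained classifier would
    replace this function.
    """
    weights = {
        "purpose_transformative": 0.4,
        "nature_factual": 0.1,
        "amount_small": 0.2,
        "market_harm_low": 0.3,
    }
    return sum(weights[k] * use.get(k, 0.0) for k in weights)

def route(use: dict) -> str:
    p = score_fair_use(use)
    if p >= ALLOW_THRESHOLD:
        return "permit"           # keep the use up, no takedown
    if p >= REVIEW_THRESHOLD:
        return "human review"     # inconclusive: escalate
    return "takedown / court"     # fair use unlikely

example = {
    "purpose_transformative": 1.0,
    "nature_factual": 1.0,
    "amount_small": 0.8,
    "market_harm_low": 0.9,
}
print(route(example))  # -> "permit" (score 0.93 on these toy numbers)
```

The deliberately wide inconclusive band is the design point: the algorithm disposes only of clear cases, while borderline ones keep reaching human reviewers and courts, whose new rulings would then feed back into retraining.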
Hilty, RetoPosition Statement of the Max Planck Institute for Innovation and Competition on the Proposed Modernisation of European Copyright Rules Part B Exceptions and Limitations2017In Article 3 of the “Proposal for a Directive on copyright in the Digital Single Market COM(2016) 593 final” the European Commission suggests an exception for text and data mining (TDM). While, in principle, a clear legal framework for TDM is to be welcomed, the proposed provisions are to be criticized regarding their scope and the applied regulatory method. This Position Statement develops an alternative proposal: Since TDM is to be seen as a normal use of works and other protected subject-matter, a field exemption is suggested allowing everyone to carry out TDM related to lawfully accessible works or other subject-matter. This includes the permission to extract contents of databases and to make reproductions for the sole purpose of TDM. Moreover, research organizations also need to carry out TDM regarding content to which they do not have lawful access. The proposal includes a specific provision obliging rightholders who market works or other subject-matter primarily for research purposes to provide datasets suitable for TDM only, for which they may request a reasonable payment.Reto Hilty, Position Statement of the Max Planck Institute for Innovation and Competition on the Proposed Modernisation of European Copyright Rules Part B Exceptions and Limitations (Art. 3 – Text and Data Mining), (Max Planck Institute for Innovation & Competition Research Paper No. 17-02, 2017),Link
Hugenholtz, BerntFlexible Copyright. Can EU Author’s Right Accommodate Fair Use?2017Almost everyone agrees that modern copyright law needs to be flexible in order to accommodate rapid technological change and evolving media uses. In the United States, fair use is the flexible instrument of choice. Author’s rights systems in Europe are generally deemed to be less flexible and less tolerant to open-ended limitations and exceptions. But are they really? This chapter makes the case that (1) author’s rights systems can be made as flexible as copyright systems, and (2) that the existing EU legal framework does not preclude the development of flexible norms at the national level.P. Bernt Hugenholtz, Flexible Copyright. Can EU Author’s Right Accommodate Fair Use?, in Copyright Law in an Age of Limitations and Exceptions 275–291 (Ruth Okediji ed., 2017). Link
Quintais, JoãoRethinking Normal Exploitation: Enabling Online Limitations in EU Copyright Law2017The adoption of limitations to copyright is regulated at international and EU level by the three-step test. The major obstacle to new limitations for online use is a strict interpretation of the test, namely its second step, according to which a limitation shall not conflict with the normal exploitation of works. This article examines the test with a focus on the second step and its application to the digital and cross-border environment. It argues for a flexible and policy-oriented reading of the concept of normal exploitation. Following this approach could enable the introduction of new online limitations in EU law. In particular, within the context of current EU copyright reform, a flexible interpretation could support the introduction of a mandatory and unwaivable limitation for user-generated content.João Pedro Quintais, Rethinking Normal Exploitation: Enabling Online Limitations in EU Copyright Law, 6 AMI 197, 197–205 (2017), Link
Samuelson, PamelaJustifications for Copyright Limitations and Exceptions2017Modern copyright laws grant authors a broad set of rights to control exploitations of their works. Typically tempering the reach of these broad rights are a series of limitations and exceptions (L&Es) adopted by legislatures or developed by courts through common law adjudication. L&Es uniformly result in free uses of protected works under U.S. copyright law, although in other countries, some L&Es may be subject to equitable remuneration obligations. L&E provisions in national copyright laws often seem a hodgepodge of special purpose provisions whose policy justifications are sometimes difficult to discern.
The essay traces the historical development of L&Es in U.S. copyright law. For the first hundred years of the nation’s existence, there were no L&Es in its copyright law, in part because rights were fewer in number and narrower in scope than they became over time. In the late 19th and early 20th centuries, courts invented the exhaustion of rights and fair use doctrines as limits on copyright’s scope. The exhaustion doctrine was first codified in the Copyright Act of 1909 and fair use in the Copyright Act of 1976 (“1976 Act”), although these doctrines have continued to evolve in the nearly four decades after their enactment. Less visible, although quite important to those whom they affect, are dozens of other L&Es codified in the 1976 Act.
The essay then considers nine justifications for the existence of these L&Es. One set promotes ongoing authorship. A second recognizes both authorial and broader public interests in dissemination of news, freedom of expression, and access to information. A third protects privacy, personal autonomy, and ownership interests of consumers. A fourth aims to fulfill certain cultural and social policy goals. A fifth enables public institutions, such as courts and legislatures, to function more effectively. A sixth fosters competition and ongoing innovation. A seventh exempts incidental uses lacking in economic significance. An eighth addresses market failure problems. A ninth encompasses L&Es adopted for politically expedient reasons.
It also discusses a tenth type of L&E, those designed to enable copyright law to be flexible and adaptable over time. The fair use doctrine accomplishes this goal in the U.S., although there are other ways to build flexibility into copyright laws. Especially in an era of rapid social, economic, and technological change, flexible exceptions such as fair use have some advantages over specific L&Es.
The essay concludes that the optimal policy for L&Es may well be to have specific exceptions for categories of justified uses that are relatively stable over time and for which predictability is more important than flexibility and to have an open-ended exception such as fair use to allow the law to adapt to new uses not contemplated by the legislature.
Pamela Samuelson, Justifications for Copyright Limitations and Exceptions, in Copyright Law in an Age of Limitations and Exceptions 12–59 (Ruth Okediji ed., 2017). Link
Schultz, JeffHow Much Data Is Created on the Internet Each Day?201790% of the data on the internet has been created since 2016, according to an IBM Marketing Cloud study. People, businesses, and devices have all become data factories that are pumping out incredible amounts of information to the web each day. This post has tracked the growth of data created on the internet for several years, and has been updated for 2019 to show how much data is being created on the internet every day.Jeff Schultz, How Much Data Is Created on the Internet Each Day?, DZone, Aug. 06, 2019,Link
Sobel, BenjaminArtificial Intelligence's Fair Use Crisis2017As automation supplants more forms of labor, creative expression still seems like a distinctly human enterprise. This may someday change: by ingesting works of authorship as “training data,” computer programs can teach themselves to write natural prose, compose music, and generate movies. Machine learning is an artificial intelligence (“AI”) technology with immense potential and a commensurate appetite for copyrighted works. In the United States, the copyright law mechanism most likely to facilitate machine learning’s uses of protected data is the fair use doctrine. However, current fair use doctrine threatens either to derail the progress of machine learning or to disenfranchise the human creators whose work makes it possible.
This Article addresses the problem in three Parts: using popular machine learning datasets and research as case studies, Part I describes how programs “learn” from corpora of copyrighted works and catalogs the legal risks of this practice. It concludes that fair use may not protect expressive machine learning applications, including the burgeoning field of natural language generation. Part II explains that applying today’s fair use doctrine to expressive machine learning will yield one of two undesirable outcomes: if U.S. courts reject the fair use defense for machine learning, valuable innovation may move to another jurisdiction or halt entirely; alternatively, if courts find the technology to be fair use, sophisticated software may divert rightful earnings from the authors of input data. This dilemma shows that fair use may no longer serve its historical purpose. Traditionally, fair use is understood to benefit the public by fostering expressive activity. Today, the doctrine increasingly serves the economic interests of powerful firms at the expense of disempowered individual rights holders. Finally, in Part III, this Article contemplates changes in doctrine and policy that could address these problems. It concludes that the United States’ interest in avoiding both prongs of AI’s fair use dilemma offers a novel justification for redistributive measures that could promote social equity alongside technological progress.
Benjamin Sobel, Artificial Intelligence's Fair Use Crisis, 41 Colum. J.L. & Arts 45 (2017), Link
Yu, PeterCustomizing Fair Use Transplants2017In the past decade, policymakers and commentators across the world have called for the introduction of copyright reform based on the fair use model in the United States. Thus far, Israel, Liberia, Malaysia, the Philippines, Singapore, South Korea, Sri Lanka and Taiwan have adopted the fair use regime or its close variants. Other jurisdictions such as Australia, Hong Kong and Ireland have also advanced proposals to facilitate such adoption.
Written for a special issue on "Intellectual Property Law in the New Technological Age: Rising to the Challenge of Change?", this article examines the increasing efforts to transplant fair use into the copyright system based on the U.S. model. It begins by briefly recapturing the strengths and weaknesses of legal transplants. The article then scrutinizes the ongoing effort to transplant fair use from the United States. Specifically, it identifies eight modalities of transplantation, drawing on experiences in China, Australia, Hong Kong, Ireland, Israel, Liberia, Malaysia, the Philippines, Singapore, South Korea, Sri Lanka and Taiwan. This article concludes with five lessons that can be drawn from studying the ongoing transplant efforts.
Yu, Peter K., Customizing Fair Use Transplants (October 13, 2017), Laws, Vol. 7, Issue 1, Article 9, 2018, Texas A&M University School of Law Legal Studies Research Paper No. 17-78Link
Butler, BrandonSome Conversation Starters Concerning the Problem of Online-Only Music for Libraries2016In this white paper, I discuss some of the copyright-related problems libraries face as they attempt to collect digital music, as well as possible solutions. The paper was prepared as part of an IMLS-funded project led by John Vallier and Judy Tsou at the University of Washington. Their final report for the grant is: Tsou, Judy and John Vallier. "Ether Today, Gone Tomorrow: 21st Century Sound Recording Collection in Crisis." Notes, vol. 72 no. 3, 2016, p. 461-483. Project MUSE, doi:10.1353/not.2016.0041.Brandon Butler, Some Conversation Starters Concerning the Problem of Online-Only Music for Libraries, IMLS-funded White Paper (2016)Link
Butler, BrandonCanaries in the Text Mine: Fair Use Rights and Text+Data Mining with Licensed Content2016Slides from a webcast on the scope and availability of fair use for text and data mining on licensed content.Brandon Butler, Canaries in the Text Mine: Fair Use Rights and Text+Data Mining with Licensed Content (slides), Presented to SPARC for Fair Use Week (2016), Link
Caspers, Marco; with Lucie Guibault, Kiera McNeice, Stelios Piperidis, Kanella Pouli, Maria Eskevich, and Maria GavriilidouDeliverable D3.3+ Baseline Report of Policies and Barriers of TDM in Europe2016AIM - The overall aim of the FutureTDM project is to improve the uptake of text and data mining (TDM) in the European Union. It is essential to map the barriers that limit the uptake of TDM, in order to determine what actions stakeholders in Europe need to undertake to overcome these barriers and contribute to an environment that promotes TDM. Therefore, this deliverable identifies the barriers to TDM in Europe, covering many aspects and dimensions in which TDM is hindered. Two basic categories of barriers are distinguished in this regard: legal and policy barriers – barriers related to (legal) regulation and stakeholder policies; and practical barriers – barriers relating to (lack of) skills, education, technical issues, funding or business environment.
The findings regarding the first category are the result of extensive research into legal regulation, in particular intellectual property regimes and data protection law, as well as stakeholder policies dealing with the legal rights and obligations ensuing from these legal areas. These results flow mainly from the research tasks of WP3. The findings on practical barriers reflect activities across all work packages: stakeholder engagement through workshops and interviews, other informal meetings with stakeholders, research into the technical and application landscape of TDM, and economic research.
This deliverable is an extended version of Deliverable D3.3, which covered only legal and policy barriers. Having identified barriers of all kinds across the FutureTDM project's tasks, we have updated it to provide a full overview of the barriers that hinder the uptake of TDM in the EU.
Marco Caspers, Lucie Guibault, Kiera McNeice, Stelios Piperidis, Kanella Pouli, Maria Eskevich, and Maria Gavriilidou. Future TDM, Deliverable D3.3+ Baseline Report of Policies and Barriers of TDM in Europe 75–76 (2016)Link
Elkin-Koren, NivaThe New Frontiers of User Rights2016An extensive study of enforcement practices pertaining to online copyright infringements in Israel offers empirical evidence of the impact of fair use in the digital era. Israel introduced fair use about a decade ago in the 2007 Copyright Act. The study compared two major enforcement strategies following the enactment of the law: traditional court proceedings and Notice and Takedown procedures implemented by online intermediaries. The findings suggest that introducing a fair use provision in the statute might be an important step, yet, this alone cannot safeguard access to knowledge.
Based on these findings, this Article argues that in order to secure a sufficient level of free and unlicensed access to knowledge, it is necessary to develop a more comprehensive approach to permissible uses that would incorporate fair use in new and innovative ways. The approach proposed in this Article is twofold: First, at the conceptual level, fair use should be interpreted as a user’s right and not merely an affirmative defense. Second, fair use should be incorporated into online enforcement systems, by embedding fair use considerations in the design.
Niva Elkin-Koren, The New Frontiers of User Rights, 32 Am. U. Int'l L. Rev. 1 (2016)Link
Geiger, Christophe; with Giancarlo Frosio and Oleksandr Bulayenko Opinion of the CEIPI on the European Commission’s Copyright Reform Proposal, with a Focus on the Introduction of Neighbouring Rights for Press Publishers in EU Law2016The Centre for International Intellectual Property Studies (CEIPI) is an institute devoted to education and research in intellectual property and is a constituent part of the University of Strasbourg. CEIPI analyses and comments on the main developments in the area of intellectual property at national, European and international levels. From this perspective, the European Commission’s Proposal for a Directive of the European Parliament and of the Council on copyright in the Digital Single Market of 14 September 2016 — and more generally any step towards copyright reform in the European Union — is of particular interest to CEIPI, which hereby intends to react to the proposal to introduce in EU copyright law neighbouring rights for press publishers for the digital uses of their publications.Christophe Geiger et al., Opinion of the CEIPI on the European Commission’s Copyright Reform Proposal, with a Focus on the Introduction of Neighbouring Rights for Press Publishers in EU Law, (CEIPI Research Paper No. 2016-01, 2016)Link
Katz, ArielCopyright, Exhaustion, and the Role of Libraries in the Ecosystem of Knowledge2016In this Article, written for a symposium on the future of libraries in the digital age, I present and challenge two common views about the scope of the first-sale doctrine, or exhaustion: namely, that the doctrine applies only to the transfer of tangible copies of works but not to the transfer of digital files, and that copyright owners can circumvent exhaustion by characterizing transactions as “licenses” rather than “sales”, or by contracting out of it. The law on digital exhaustion is anything but settled. As codified, the “first sale” doctrine may limit only the distribution right, but its statutory presence might merely affirm a broader principle of exhaustion—one of the several principles in copyright law that limit the copyright owner’s powers. The principle of exhaustion can apply, and at times has been applied, beyond the distribution right. Likewise, the notion that copyright owners can circumvent exhaustion by characterizing transactions as “licenses” rather than “sales”, or by using contracts to exercise downstream control, is hardly a foregone conclusion. Established precedent and sound legal principle indicate that while the law recognizes some scope for contracting around exhaustion, courts will not necessarily uphold any private reordering of the respective legal entitlements of copyright owners and users. While these observations and conclusions apply to exhaustion generally, they apply most demonstrably in the case of libraries. Libraries occupy a privileged space in the copyright system. Historically, libraries predate copyright, and the institutional role of libraries and institutions of higher learning in the “promotion of science” and the “encouragement of learning” was acknowledged before legislators decided to grant authors exclusive rights in their writings. The historical precedence of libraries and the legal recognition of their public function cannot determine every contemporary copyright question, but this historical fact is not devoid of legal consequence. History is part of the legislative history of statutes, and it constitutes part of the context that informs the interpretation of current statutes. Therefore, if not false, then the view that the current legislation does not allow digital exhaustion is at least questionable.Ariel Katz, Copyright, Exhaustion, and the Role of Libraries in the Ecosystem of Knowledge, 13(1) I/S: A Journal of Law and Policy for the Information Society, 81 (2016)Link
Margoni, Thomas; with Giulia DoreWhy We Need a Text and Data Mining Exception (But it is Not Enough)2016Text and Data Mining (TDM) has become a key instrument in the development of scientific research. Its ability to derive new informational value from existing text and data makes this analytical tool a necessary element in the current scientific environment. TDM's crucial importance is particularly evident at a historical moment when the extremely high amounts of information produced (scholarly publications, databases and datasets, social networks, etc.) make it unlikely, if not impossible, for humans to read them all. Nevertheless, TDM, at least in the EU, often constitutes copyright infringement. This situation illustrates how certain legal provisions stifle scientific development instead of fostering it, with significant damage to EU-based researchers and research institutions, and to European socio-economic competitiveness more generally. Other countries leading scientific and technological development have already implemented legislative or judicial solutions permitting TDM, including for commercial purposes.
This extended abstract suggests, as has already been advocated in the literature and in policy documents, that a mandatory TDM exception, not limited to non-commercial research, is needed to put the EU on a level playing field with other jurisdictions, such as the US and Japan.
However, this extended abstract further argues that, while in the short term a mandatory TDM exception can and should be implemented by the EU legislator by way of a harmonising Directive, the long-term sustainability of the EU copyright framework calls for a broader, general and technology-neutral exception. The latter should take the form of a fair-use-like standard and be part of a more structured intervention in the field of copyright, by means of a Regulation that would bring uniformity to the whole EU copyright framework.
Thomas Margoni & Giulia Dore, Why We Need a Text and Data Mining Exception (But it is Not Enough), (2016)Link
Margoni, Thomas; with Roberto Caso, Rossana Ducato, Paolo Guarda, and Valentina Moscon Open Access, Open Science, Open Society2016Open Access’ main goal is not the subversion of publishers’ role as driving actors in an oligopolistic market characterised by reduced competition and higher prices. OA’s main function is to be found somewhere else, namely in the ability to subvert the power to control science’s governance and its future directions (Open Science), a power more often found within academic institutions than outside them. By decentralising and opening up not just the way in which scholarship is published but also the way in which it is assessed, OA removes the barriers that helped turn science into an intellectual oligopoly even before an economic one. The goal of this paper is to demonstrate that Open Access is a key enabler of Open Science, which in turn will lead to a more Open Society. Furthermore, the paper argues that while legislative interventions play an important role in the top-down regulation of Open Access, legislators currently lack an informed and systematic vision on the role of Open Access in science and society. In this historical phase, other complementary forms of intervention (bottom-up) appear much more “informed” and effective. This paper, which intends to set the stage for future research, identifies a few pieces of the puzzle: the relationship between formal and informal norms in the field of Open Science and how these impact on intellectual property rights, the protection of personal data, the assessment of science and the technology employed for the communication of science.Margoni, Thomas and Caso, Roberto and Ducato, Rossana and Guarda, Paolo and Moscon, Valentina, Open Access, Open Science, Open Society (March 20, 2016). Trento Law and Technology Research Group, Research Paper No. 27Link
Scaria, Arul George; with Rishika Rangarajan'Fine-tuning the Intellectual Property Approaches to Fostering Open Science: Some Insights from India'2016Open science is a global movement attempting to reclaim certain core values of science. One such value is openness. Given the important role of science in political, economic, social and technological development, it is important to identify the legal and policy reforms required to promote open science. Besides analysing the benefits of open science, it is also important to analyse the challenges in practising open science, particularly in the global south. In the context of one of the countries in the global south, i.e. India, this paper analyses how approaches towards intellectual property rights (IPRs) can be fine-tuned for fostering open science. This paper begins with an introduction that contextualises the discussion. Section II of this paper examines in detail the current crisis in science. Section III introduces how open science emerged as a movement to counter this crisis. It also discusses the diverse benefits and challenges of practising open science. Section IV analyses the implications of open science for the global south. In this section, we also map the evolution of the open movements in India. In Section V, we discuss how the approaches towards IPRs could be modified to foster the open science movement in India. The article concludes by highlighting some areas for future research.Arul George Scaria and Rishika Rangarajan, 'Fine-tuning the Intellectual Property Approaches to Fostering Open Science: Some Insights from India', (2016) 8 WIPO Journal 109.Link
Stamatoudi, IriniText and Data Mining2016Irini Stamatoudi, Text and Data Mining, in New Developments in EU and International Copyright Law 264–265 (Irini Stamatoudi ed., 2016), Link
Handke, Christian; with Lucie Guibault and Joan-Josep VallbéIs Europe Falling Behind in Data Mining? Copyright's Impact on Data Mining in Academic Research2015This empirical paper discusses how copyright affects data mining (DM) by academic researchers. Based on bibliometric data, we show that where DM for academic research requires the express consent of rights holders: (1) DM makes up a significantly lower share of total research output; and (2) stronger rule-of-law is associated with less DM research. To our knowledge, this is the first time that an empirical study bears out a significant negative association between copyright protection and innovation.Christian Handke, Lucie Guibault & Joan-Josep Vallbé, Is Europe Falling Behind in Data Mining? Copyright's Impact on Data Mining in Academic Research (June 7, 2015).Link
U.S. Copyright OfficeOrphan Works and Mass Digitization: A Report of the Register of Copyrights2015As the Supreme Court reaffirmed in 2012, facilitating the dissemination of creative expression is an important means of fulfilling the constitutional mandate to “promote the Progress of Science” through the copyright system. This Report addresses two circumstances in which the accomplishment of that goal may be hindered under the current law due to practical obstacles preventing good faith actors from securing permission to make productive uses of copyrighted works. First, with respect to orphan works, referred to as “perhaps the single greatest impediment to creating new works,” a user’s ability to seek permission or to negotiate licensing terms is compromised by the fact that, despite his or her diligent efforts, the user cannot identify or locate the copyright owner. Second, in the case of mass digitization – which involves making reproductions of many works, as well as possible efforts to make the works publicly accessible – obtaining permission is essentially impossible, not necessarily because of a lack of identifying information or the inability to contact the copyright owner, but because of the sheer number of individual permissions required.U.S. Copyright Office, Orphan Works and Mass Digitization: A Report of the Register of Copyrights (2015)Link
Brook, Michelle; with Peter Murray-Rust and Charles OppenheimThe Social, Political and Legal Aspects of Text and Data Mining2014The ideas of textual or data mining (TDM) and subsequent analysis go back hundreds if not thousands of years. Originally carried out manually, textual and data analysis has long been a tool which has enabled new insights to be drawn from text corpora. However, for the potential benefits of TDM to be unlocked, a number of non-technological barriers need to be overcome. These include legal uncertainty resulting from complicated copyright, database rights and licensing, the fact that some publishers are not currently embracing the opportunities TDM offers the academic community, and a lack of awareness of TDM among many academics, alongside a skills gap.Michelle Brook, Peter Murray-Rust and Charles Oppenheim. The Social, Political and Legal Aspects of Text and Data Mining (TDM), 20 D-Lib Magazine 1–8 (2014),Link
Carre, Stéphanie; with Christophe Geiger, Jean Lapousterle, Franck Macrez, Adrien Bouvel, Théo Hassler, Xavier Seuba, Oleksandr Bulayenko, Franciska Schönherr, and Marie Hemmerle-ZempResponse of the CEIPI to the Public Consultation of the European Commission on the Review of the European Union Copyright Rules2014The Centre for International Intellectual Property Studies (CEIPI) is an institute devoted to education and research in intellectual property and is a constituent part of the University of Strasbourg. CEIPI has a research team that studies and analyses the main developments in the area of intellectual property at national, European, European Union and international levels. From this perspective, the public consultation on the review of the European Union copyright rules is of particular interest to CEIPI, which hereby submits its opinion to the Commission. Indeed, while determining the future contours of copyright in the European Union, the Commission should carefully consider academic studies and analyses. In this regard, one of the objectives of this response is to emphasise that substantial academic work is needed to determine the possibility and implementation of a unitary copyright title in the European Union. Stéphanie Carre, Christophe Geiger, Jean Lapousterle, Franck Macrez, Adrien Bouvel, Théo Hassler, Xavier Seuba, Oleksandr Bulayenko, Franciska Schönherr, and Marie Hemmerle-Zemp. Response of the CEIPI to the Public Consultation of the European Commission on the Review of the European Union Copyright Rules (CEIPI Research Paper No. 2014-01, 2014),Link
European CommissionStandardisation in the Area of Innovation and Technological Development, Notably in the Field of Text and Data Mining2014Text and data mining (TDM) is an important technique for analysing and extracting new insights and knowledge from the exponentially increasing store of digital data (‘Big Data’). It is important to understand the extent to which the EU’s current legal framework encourages or obstructs this new form of research and to assess the scale of the economic issues at stake.
TDM represents a significant economic opportunity for Europe. At present, the use of TDM tools by researchers in Europe appears to be lower than among its main competitors. The legal issues section describes the application of the different intellectual property laws and the extent to which TDM in Europe is facilitated by any existing exceptions to EU copyright or database law. The application of a copyright and database exception relating to teaching or scientific research is optional and has not been implemented at all in some Member States. This has contributed to uncertainty in the European scientific research community.
There is a serious risk that Europe’s relative competitive position as a research location for the exploitation of digital data will deteriorate further, if steps are not taken to address the issues discussed in this report prepared for the EC Directorate-General for Research and Innovation by a Group of Experts.
European Commission, Standardisation in the Area of Innovation and Technological Development, Notably in the Field of Text and Data Mining: Report from the Expert Group (2014)Link
Geiger, Christophe; with Jonathan Griffiths, Martin Senftleben, Lionel A. F. Bently, and Raquel XalabarderLimitations and Exceptions as Key Elements of the Legal Framework for Copyright in the European Union2014In this opinion, the European Copyright Society (ECS) puts on record its views on the issues raised by the Judgment of the Court of Justice of the European Union (CJEU) in Case C-201/13, Deckmyn, which departs from the doctrine of strict interpretation of exceptions and limitations in cases in which fundamental rights such as freedom of expression are involved. The opinion welcomes this development for the following reasons: firstly, due to the importance of exceptions and limitations in facilitating creativity and securing a fair balance between the protection of and access to copyright works; secondly, because of the Court’s determination to secure a harmonized interpretation of the meaning of exceptions and limitations; thirdly, because of the Court’s adoption of an approach to the interpretation of exceptions and limitations which promotes their effectiveness and purpose; and, finally, due to the Court’s recognition of the role of fundamental rights in the copyright system: in particular, its recognition that the parodic use of works is justified by the right to freedom of expression. At the same time, the ECS recommends caution in constraining the scope of exceptions and limitations in a manner that may go beyond what might be considered necessary in a democratic society.Christophe Geiger, Jonathan Griffiths, Martin Senftleben, Lionel A. F. Bently, and Raquel Xalabarder, Limitations and Exceptions as Key Elements of the Legal Framework for Copyright in the European Union,  (European Copyright Society, 2014) Link
Murray-Rust, Peter; with Jennifer Molloy and Diane CabellOpen Content Mining2014Peter Murray-Rust, Jennifer Molloy and Diane Cabell. Open Content Mining, in Issues in Open Research Data (Samuel A. Moore ed., 2014)Link
Rajaretnam, ThillaData Mining and Data Matching: Regulatory and Ethical Considerations Relating to Privacy and Confidentiality in Medical Data2014The application of data mining techniques to health-related data is beneficial to medical research. However, the use of data mining or knowledge discovery in databases, and data matching and profiling techniques, raises ethical concerns relating to consent and undermines the confidentiality of medical data. Data mining and data matching require active collaboration between the medical practitioner and the data miner. This article examines the ethical management of medical data, including personal information and sensitive information, in the healthcare sector. It offers some ethical and legal perspectives on privacy and the confidentiality of medical data. It examines the international landscape of health information privacy protection, relevant Australian legislation, and recommendations to improve the ethical handling of medical data proposed by the Australian Law Reform Commission.Thilla Rajaretnam, Data Mining and Data Matching: Regulatory and Ethical Considerations Relating to Privacy and Confidentiality in Medical DataLink
Senftleben, MartinComparative Approaches to Fair Use: An Important Impulse for Reforms in EU Copyright Law2014Fair use provisions in the field of copyright limitations, such as the U.S. fair use doctrine, offer several starting points for a comparative analysis of laws. Fair use may be compared with fair dealing. With the evolution of fair use systems outside the U.S., fair use can also be compared across different countries. The analysis may also concern fair use concepts in different domains of intellectual property. Instead of making any of these direct comparisons, the present analysis deals with another aspect of comparative analyses: the study of foreign fair use provisions as a basis for the improvement of domestic legislation. More specifically, the analysis will show that important impulses for necessary reforms in the EU system of copyright exceptions can be derived from a comparison with the flexible approach taken in the U.S. For this purpose, the legal traditions underlying the legislation on copyright limitations in the EU (civil law) and the U.S. (common law) will be outlined (section 1) before explaining the need for reforms in the current EU system (section 2). On this basis, strategies for translating lessons to be learned from the U.S. fair use approach (section 3) into the EU system will be discussed. This translation is unlikely to fail because of an inability or reluctance of civil law judges to apply open-ended norms (section 4). Under existing EU norms, however, a degree of flexibility comparable to the flexibility offered in the U.S. cannot be achieved (section 5). To establish a sufficiently flexible system, EU legislation would have to be amended (section 6 and concluding section 7).Martin Senftleben, Comparative Approaches to Fair Use: An Important Impulse for Reforms in EU Copyright Law, in Methods and Perspectives in Intellectual Property (2014), Link
Triaille, Jean-Paul; with Jérôme de Meeûs d’Argenteuil, and Amélie de FrancquenStudy of the Legal Framework of Text and Data Mining (TDM)2014The background of this study is described as follows in its Terms of Reference (“ToRs”): “Data mining is currently subject to discussions in the UK in the context of a review of the copyright legislation in that Member State. Other Member States (e.g. Ireland) are also assessing the issue. The matter is nevertheless rather new and there is not a “coined definition” of what data mining activities are. Text and data mining has, in the impact assessment performed by the UK IPO, been defined in the following way: “Text and data analytics methods extract data from existing electronic information, to establish new facts and relationships, building new scientific findings from prior research. These new methods involve copying of prior works as part of the process to extract data”."Jean-Paul Triaille, Jérôme de Meeûs d’Argenteuil, and Amélie de Francquen. Study of the Legal Framework of Text and Data Mining (TDM), 28 (2014)Link
Wu, Xindong; with Xingquan Zhu, Gong-Qing Wu, and Wei DingData Mining With Big Data2014Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and data collection capacity, Big Data is now rapidly expanding in all science and engineering domains, including the physical, biological and biomedical sciences. This article presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.Xindong Wu, Xingquan Zhu, Gong-Qing Wu, and Wei Ding. Data Mining With Big Data, 26 IEEE Transactions on Knowledge and Data Engineering 97 (2014)Link
Borghi, Maurizio; with Stavroula KarapapaCopyright and Mass Digitization: A Cross-Jurisdictional Perspective2013In an age where works are increasingly being used, not only as works in the traditional sense, but also as carriers of data from which information may be automatically extracted for various purposes, Borghi and Karapapa consider whether mass digitisation is consistent with existing copyright principles, and ultimately whether copyright protection needs to be redefined, and if so how?
The work considers the activities involved in the process of mass digitization, identifying impediments to the increasing number of such projects, such as the inapplicability of copyright exceptions, difficulties in rights clearance, and the issue of 'orphan' and out-of-print works.
It goes on to examine the concept of 'use' of works in light of mass digital technologies and how it impinges on copyright law and principles; for example, considering whether scanning and using optical character recognition in mass digital projects qualify as transformative use, or whether text mining on digital repositories should be a permitted activity. These issues are considered in the context of both European and US law. Consideration is also given to mass digitization in the wider context of 'law and technology', comparing mass digitization issues with those of genetic databases, online privacy and data protection.
Illustrating how mass digitization unveils a number of unsettled theoretical issues within copyright, the book proposes a new regulatory framework for the use of works in the context of emerging technologies, providing a new rights-based approach to dealing with copyright.
Maurizio Borghi & Stavroula Karapapa, Copyright and Mass Digitization: A Cross-Jurisdictional Perspective (Oxford University Press, 2013)Link
Contreras, JorgeConfronting the Crisis in Scientific Publishing: Latency, Licensing and Access2013The serials crisis in scientific publishing can be traced to the long duration of copyright protection and the assignment of copyright by researchers to publishers. Over-protection of scientific literature has enabled commercial publishers to increase subscription rates to a point at which access to scientific information has been curtailed with negative social welfare consequences. The so-called uniformity costs imposed by such over-protection can be addressed by tailoring intellectual property rights, either through legal change or private ordering.
Current open access channels of distribution offer alternative approaches to scientific publishing, but neither the Green OA self-archiving model nor the Gold OA author-pays model has yet achieved widespread acceptance. Moreover, recent proposals to abolish copyright protection for academic works, while theoretically attractive, may be difficult to implement in view of current legislative and judicial inclinations. Likewise, funder open access mandates such as the NIH OA Policy, which are already responsible for the public release of millions of scientific articles, suffer from various risks and political uncertainty.
In this paper, I propose an alternative private ordering solution based on latency equilibrium values observed in open access stakeholder negotiation settings. Under this proposal, research institutions would collectively develop and adopt publication agreements that do not transfer copyright ownership to publishers, but instead grant publishers a one-year exclusive period in which to publish a work. This limited period of exclusivity should enable the publisher to recoup its publishing costs and a reasonable profit through subscription revenues, while restoring control of the article copyright to the author at the end of the exclusivity period. This balanced approach addresses the needs of both publishers and the scientific community, and would, I believe, avoid many of the challenges faced by existing open access models.
Jorge L. Contreras, Confronting the Crisis in Scientific Publishing: Latency, Licensing and Access, 53 Santa Clara L. Rev. 491-575 (2013)Link
Guadamuz, Andres; with Diane CabellAnalysis of UK/EU Law on Data Mining in Higher Education Institutions2013Data or text mining (hereafter called “content mining”) is a process that uses software to look for interesting or important patterns in data that might otherwise go unobserved. An example might be combining a database of journal articles about ground water pollution with one of hospital admissions to detect a pollution-related pattern of disease outbreak (a toy sketch of this example follows this entry).
It is also a useful tool in commerce. A credit card company might detect a correlation between purchases of tickets from a particular airline and purchases of certain types of automobiles, and develop a marketing program uniting appropriate vendors. One McKinsey report states that the utilization of ‘big data’ in the sphere of public data alone could create €250 billion in annual value for Europe’s economy.
Content mining is increasingly accomplished by machine. Databases, particularly those produced by scientific research, are far too large to be scanned by the human eye. However, the right to mine data is not assured by the law in most jurisdictions, and even where it is, the terms of access to the majority of research publication databases deny permission to do so. One recent study indicated that obtaining permission to mine the thousands of articles appearing on a single subject from the myriad of different publishers would require 62% of a researcher’s time. Many content owners, including research institutions, have yet to develop any policy on content mining.
This report will identify the main legal barriers to data mining and data reuse and make policy suggestions to guide governments, funding agencies, and research institutions. As the title suggests, the emphasis of the study is on legal issues specific to higher education institutions (HEIs).
The first challenge for this report is to delimit the subject matter, as various types of content are subject to automated analysis. HEIs can hold and share content in various formats; here are just a few examples:
Text: published articles, book chapters, preparatory notes, working papers, reports, teaching materials, conference papers, presentations, theses.
Datasets: statistical data, geolocation data, survey results, maps, figures, time series, genetic information, health records, computer logs.
Multimedia: pictures, sound recordings, interviews, presentations, video.
Each of the above may have separate legal regimes applying to them. In the interest of convenience and simplicity, whenever the report talks about database contents, there will be no distinction as to whether we are dealing with text, data or multimedia, unless clearly specified in the text.
Andres Guadamuz & Diane Cabell, Analysis of UK/EU Law on Data Mining in Higher Education Institutions, 4 Queen Mary J. Intell. Prop. 3, 6 (2013)Link
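The report’s opening example – combining a corpus of journal articles on ground water pollution with hospital admission records to surface a pollution-related disease pattern – reduces to aligning two datasets on a shared key and testing for association. Here is a self-contained, hypothetical sketch; the data, the (region, month) keys and the plain-Python Pearson correlation are all invented for illustration and are not from the report.

```python
# Hypothetical sketch of the report's content-mining example: align
# mentions of ground water pollution with hospital admissions by
# (region, month) and measure their correlation. All data invented.
from math import sqrt

# Counts of pollution mentions mined from journal articles, and
# hospital admissions, both keyed by (region, month).
pollution_mentions = {("north", "2013-01"): 12, ("north", "2013-02"): 30,
                      ("south", "2013-01"): 3,  ("south", "2013-02"): 5}
admissions =         {("north", "2013-01"): 140, ("north", "2013-02"): 260,
                      ("south", "2013-01"): 90,  ("south", "2013-02"): 95}

# Keep only keys present in both datasets, in a fixed order.
keys = sorted(set(pollution_mentions) & set(admissions))
xs = [pollution_mentions[k] for k in keys]
ys = [admissions[k] for k in keys]

def pearson(x, y):
    """Plain-Python Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(xs, ys)
print(f"pollution/admissions correlation r = {r:.2f}")
# On these toy numbers r is close to 1.0 (they were invented to
# correlate); on real data, a high r would be the kind of
# otherwise-unseen pattern the report describes.
```

On real corpora, the mining step that produces the mention counts is where the legal barriers the report identifies bite, since extracting those counts requires copying the underlying articles wholesale.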
Guibault, Lucie; with Andreas WiebeSafe to Be Open: Study on the Protection of Research Data and Recommendations for Access and Usage2013Openness has become a common concept in a growing number of scientific and academic fields. Expressions such as Open Access (OA) or Open Content (OC) are often employed for the publication of papers and research results, or are contained as conditions in tenders issued by a number of funding agencies. More recently, the concept of Open Data (OD) has attracted growing interest in some fields, particularly those that produce large amounts of data – which are not usually protected by standard legal tools such as copyright. However, a thorough understanding of the meaning of Openness – especially its legal implications – is usually lacking.
Open Access, Public Access, Open Content, Open Data, Public Domain. All these terms are often employed to indicate that a given paper, repository or database does not fall under the traditional “closed” scheme of default copyright rules. However, the differences between all these terms are often largely ignored or misrepresented, especially when the scientist in question is not familiar with the law generally and copyright in particular – a very common situation in all scientific fields.
On 17 July 2012 the European Commission published its Communication to the European Parliament and the Council entitled “Towards better access to scientific information: Boosting the benefits of public investments in research”. As the Commission observes, “discussions of the scientific dissemination system have traditionally focused on access to scientific publications – journals and monographs. However, it is becoming increasingly important to improve access to research data (experimental results, observations and computer-generated information), which forms the basis for the quantitative analysis underpinning many scientific publications”. The Commission believes that through more complete and wider access to scientific publications and data, the pace of innovation will accelerate and researchers will collaborate so that duplication of efforts will be avoided. Moreover, open research data will allow other researchers to build on previous research results, as it will allow involvement of citizens and society in the scientific process.
In the Communication the Commission makes explicit reference to open access models of publications and dissemination of research results, and the reference is not only to access and use but most significantly to reuse of publications as well as research data.
The Communication marks an official new step on the road to open access to publicly funded research results in science and the humanities in Europe. Scientific publications are no longer the only elements of its open access policy: research data upon which publications are based should now also be made available to the public.
As noble as the open access goal is, however, the expansion of the open access policy to publicly funded research data raises a number of legal and policy issues that are often distinct from those concerning the publication of scientific articles and monographs. Since open access to research data – rather than publications – is a relatively new policy objective, less attention has been paid to the specific features of research data. An analysis of the legal status of such data, and of how to make it available under the correct licence terms, is therefore the subject of the following sections.
Lucie Guibault and Andreas Wiebe (eds.) Safe to Be Open: Study on the Protection of Research Data and Recommendations for Access and Usage (2013),Link
Hansen, David; with Peter Jaszi, Pamela Samuelson, Jason Schultz, and Rebecca TushnetEducational Fair Use Brief in Support of Georgia State University on Behalf of Amici Curiae Academic Authors and Legal Scholars2013For centuries, scholars and educators have excerpted the works of their colleagues, transforming them from individual, static monographs into dynamic pedagogical and intellectual tools for classroom learning. Such transformations reside at the heart of fair use, a core copyright law doctrine established to protect socially beneficial uses of works that increase public access and promote the progress of human understanding.
In this case, Plaintiff Publishers accuse GSU and its faculty of violating their copyrights through this practice. But, as the district court correctly found, such uses are fair, especially because they primarily use factual information to promote the purposes of education and teaching, because the amount taken was reasonable in light of its purpose, and because Plaintiffs’ evidence of a cognizable copyright market harm was speculative at best. However, the district court erred when it incorrectly concluded that these uses are not transformative. Using an unduly narrow definition of the concept, it failed to consider how educators repurpose scholarly works in productive ways that bring new meaning to and understanding of the works used.
As scholars and educators who produce and repurpose such works, amici urge this Court to affirm that these uses constitute a transformative use under the first fair use factor, and to reaffirm the findings under the other factors that these uses are fair. A finding of fair use in this case not only furthers the underlying goals of scholarship and education – access to knowledge – but also the very purposes of the Copyright Act itself.
Hansen, David R. and Jaszi, Peter A. and Samuelson, Pamela and Schultz, Jason and Tushnet, Rebecca, Educational Fair Use Brief in Support of Georgia State University on Behalf of Amici Curiae Academic Authors and Legal Scholars (April 25, 2013). UC Berkeley Public Law Research Paper No. 2259697, Georgetown Public Law Research Paper No. 13-034,Link
International Federation of Library AssociationsStatement on Text and Data Mining2013As the leading international professional association concerned with information and library services, IFLA represents associations and institutions worldwide that endeavour to provide equitable access to a diversity of information.
IFLA maintains that legal certainty for text and data mining (TDM) can only be achieved by (statutory) exceptions. As an organization committed to the principle of freedom of access to information, and the belief that information should be utilised without restriction in ways vital to the educational and cultural well-being of communities, IFLA believes TDM to be an essential tool to the advancement of learning, and new forms of creation.
Copyright and database laws can affect the ability of libraries to fulfil their mandates and deliver information services for the benefit of their patrons, and can impede the use of materials by library users in ways that would benefit communities – for scholarship, research, improvements in health and science, creativity and social inclusion.
Int’l Fed’n of Libr. Ass’ns, Statement on Text and Data Mining (2013)Link
Adler, Prudence; with Pat Aufderheide, Brandon Butler, and Peter JasziCode of Best Practices in Fair Use for Academic and Research Libraries2012This is a code of best practices in fair use devised specifically by and for the academic and research library community. It enhances the ability of librarians to rely on fair use by documenting the considered views of the library community about best practices in fair use, drawn from the actual practices and experience of the library community itself.
It identifies eight situations that represent the library community’s current consensus about acceptable practices for the fair use of copyrighted materials and describes a carefully derived consensus within the library community about how those rights should apply in certain recurrent situations. These are the issues around which a clear consensus emerged over more than a year of discussions. The groups also talked about other issues; on some, there seemed not to be a consensus, and group members found others to be less urgent. The community may wish to revisit this process in the future to deliberate on emerging and evolving issues and uses.
Prudence S. Adler, Pat Aufderheide, Brandon Butler, and Peter Jaszi. Code of Best Practices in Fair Use for Academic and Research Libraries. Coordinated by the Association of Research Libraries, the Program on Information Justice and Intellectual Property, and the Center for Media & Social Impact. January 2012. Link
Contreras, JorgeOpen Access Scientific Publishing and the Developing World2012Responding to rapid and steep increases in the cost of scientific journals, a growing number of scholars and librarians have advocated “open access” (OA) to the scientific literature. OA publishing models are having a significant impact on the dissemination of scientific information. Despite the success of these initiatives, their impact on researchers in the developing world is uncertain. This article analyses major OA approaches adopted in the industrialized world (so-called Green OA, Gold OA, and OA mandates, as well as non-OA information philanthropy) as they relate to the consumption and production of research in the developing world. The article concludes that while the consumption of scientific literature by developing world researchers is likely to be significantly enhanced through such programs, promoting the production of research in the developing world requires additional measures. These could include the introduction of better South-focused journal indexing systems that identify high-quality journals published in the developing world, coupled with the adjustment of academic norms to reward publication in such journals. Financial models must also be developed to decrease the reliance by institutions in the developing world on information philanthropy and to level the playing field between OA journals in industrialized and developing countries.Jorge L. Contreras, Open Access Scientific Publishing and the Developing World, 8 St Antony’s Intl. Rev. 43 (2012)Link
Geiger, Christophe; with Franciska SchönherrDefining the Scope of Protection of Copyright in the EU: The Need to Reconsider the Acquis regarding Limitations and Exceptions2012Christophe Geiger & Franciska Schönherr, Defining the Scope of Protection of Copyright in the EU: The Need to Reconsider the Acquis regarding Limitations and Exceptions, in Codification of European Copyright Law, Challenges and Perspectives 133–167 (Tatiana-Eleni Synodinou ed., 2012)Link
Hugenholtz, Bernt; with Martin SenftlebenFair Use in Europe: in Search of Flexibilities2012There appear to be good reasons and ample opportunity to (re)introduce a measure of flexibility in the national copyright systems of Europe. The need for more openness in copyright law is almost self-evident in this information society of highly dynamic and unpredictable change. A historic perspective also suggests that copyright law, particularly in the civil law jurisdictions of Europe, has lost much of its flexibility in the course of the past century. By contrast, with the accelerating pace of technological change in the 21st century, and in view of the complex process of law making in the EU, the need for flexible copyright norms both at the EU and the national level is now greater than ever. Against this background, the authors argue that the EU copyright acquis leaves considerably more room for flexibilities than its closed list of permitted limitations and exceptions suggests. In the first place, the enumerated provisions are in many cases categorically worded prototypes rather than precisely circumscribed exceptions, thus leaving the Member States broad margins of implementation. In the second place, the EU acquis leaves ample unregulated space with regard to the right of adaptation that has so far remained largely unharmonized. A Member State desiring to take full advantage of all policy space available under the Information Society Directive might achieve this by literally transposing the Directive’s entire catalogue of exception prototypes into national law. In combination with the three-step test, this would effectively lead to a semi-open norm almost as flexible as the fair use rule of the United States. Less ambitious Member States seeking to enhance flexibility while keeping their existing structures of limitations and exceptions largely intact can explore the policy space left by distinct exception prototypes. In addition, the unharmonized status of the adaptation right would leave Member States free to provide for limitations and exceptions permitting, for example, fair transformative uses in the context of producing and disseminating user-generated content.P. Bernt Hugenholtz & Martin Senftleben, Fair Use in Europe: in Search of Flexibilities, (Institute for Information Law Research Paper No. 2012-33, 2012), Link
Hugenholtz, Bernt; with Ruth OkedijiConceiving an International Instrument on Limitations and Exceptions to Copyright2012The task of developing a global approach to limitations and exceptions is one of the major challenges facing the international copyright system today. This paper examines policy options and modalities for framing an international instrument on limitations and exceptions to copyright within the treaty obligations of the current international copyright system. We consider this international copyright acquis as our general starting point, and evaluate options for the design of such an instrument, including questions of political sustainability and institutional home.Bernt Hugenholtz & Ruth Okediji, Conceiving an International Instrument on Limitations and Exceptions to Copyright 6 (Amsterdam Law School Legal Studies Research Paper No. 2012-43, 2012)Link
Jockers, Matthew; with Matthew Sag and Jason SchultzDigital Archives: Don’t Let Copyright Block Data Mining2012Advances in computer technology combined with the availability of digital archives are allowing humanities scholars to do what biologists, physicists and economists have been doing for decades — analyse massive amounts of data. A far richer understanding of literature promises to emerge. For instance, large-scale quantitative projects are forcing scholars to reconsider how literary canons are formed and are showing the extent to which authors’ works are shaped by factors outside their own creative control, such as the period in which they lived, their gender and their nationality.
Yet in the United States, legal action pursued by the Authors Guild, an advocacy group for writers, could bar scholars from studying as much as two-thirds of the literary record. A small group of humanities scholars (ourselves included) is fighting back.
Matthew Jockers, Matthew Sag and Jason Schultz. Digital Archives: Don’t Let Copyright Block Data Mining, 490 Nature 29–30 (2012)Link
Katz, ArielThe Orphans, the Market, and the Copyright Dogma: A Modest Solution for a Grand Problem2012This article proposes a modest common law solution to the orphan works problem: works that are still under copyright but whose owners cannot be easily located. Most discussions of the orphan works problem focus on the demand side: on users’ inability to locate owners. However, looking also at the supply side reveals that the problem arises not only because users find it prohibitively costly to locate owners, but also because under a strict permission-first rule copyright owners, who do not internalize the full social cost of forgone uses, face suboptimal incentives to keep themselves locatable. In many cases copyright owners are the least-cost avoiders of the orphan works problem and, as in many other areas of law, should be encouraged to take steps to reduce the extent of the problem. Building on this insight, the article shows how considering the locatability of the owner of an infringed work at the remedy stage, and tweaking the appropriate remedy accordingly, will encourage owners to remain locatable, and why this solution is preferable to other proposed solutions. The article also discusses the tendency to treat the requirement to seek permission before using a work as a dogma, and why this dogmatic view of copyright impedes simple and efficient solutions and leads to the adoption of grand solutions that are ineffective at best and harmful at worst.Ariel Katz, The Orphans, the Market, and the Copyright Dogma: A Modest Solution for a Grand Problem, 27(3) Berkeley Technology Law Journal 1285-1346 (2012) Link
McDonald, Diane; with Ursula KellyThe Value and Benefit of Text Mining to UK Further and Higher Education2012Key findings
We found some significant use of text mining in fields such as biomedical sciences and chemistry and some early adoption within the social sciences and humanities. Current UK copyright restrictions, however, mean that most text mining in UKFHE for non-commercial research is based on Open Access documents or bespoke arrangements. This means that the availability of material for text mining is limited.
The costs of text mining relate to access rights to text-minable materials, transaction costs (participation in text mining), entry (setting up text mining), staff and underlying infrastructure. Currently, the most significant costs are transaction costs and entry costs. Given the sophisticated technical nature of text mining, entry costs will by and large remain high. Current high transaction costs are attributable to the need to negotiate a maze of licensing agreements covering the collections researchers wish to study.
We undertook a number of case studies to explore the economic value and benefits of text mining to UKFHE. Due to the limited uptake of text mining and legal and commercial restrictions, we adopted a stylised approach, focusing on specific small-scale illustrations of the value and benefits of text mining, and the wider potential value and benefits that could be delivered if technical and legal limitations were resolved. Benefits include: increased researcher efficiency; unlocking hidden information and developing new knowledge; exploring new horizons; improved research and evidence base; and improving the research process and quality. Broader economic and societal benefits include cost savings and productivity gains, innovative new service development, new business models and new medical treatments.
The Hargreaves review suggested that non-commercial text mining could bring savings and wider innovation potential to UKFHE. The existing legal restrictions on text mining meant that it proved very difficult within the course of this study to source sufficiently robust data to systematically quantify these potential benefits. However, the evidence gathered illustrates that there is clear potential for significant productivity gains, with benefit both to the sector and to the wider economy.
Legal uncertainty, inaccessible information silos, lack of information and lack of a critical mass are barriers to text mining within UKFHE. While the latter two can be addressed through campaigns to inform and raise awareness, the former two are unlikely to be resolved without changes to the current licensing system and global adoption of interoperability standards.
Text mining presents an opportunity for the UK, encouraging innovation and growth through leveraging additional value from the public research base. The UK has a number of strengths that put it in a good position to be a key player in text mining development, including good framework conditions for innovation and the natural advantage of its native language. The scholarly publishing market is global, predominantly in English, with global potential for demand for text mining tools and services. This offers opportunities for new service companies as well as current content providers. However, these opportunities are being hindered by a range of economic barriers, including legal restrictions, high transaction costs and an information deficit, which is strongly indicative of market failure.
The technological developments underpinning text mining are relatively recent and hence were not envisaged in previous consideration of the impact of copyright. However, because the process of text mining involves the production and storage of copies of material that may be subject to copyright, there is a new conundrum: the market intervention of copyright – originally intended to protect creative producers – may be inhibiting new knowledge discovery and innovation.
Diane McDonald & Ursula Kelly, The Value and Benefit of Text Mining to UK Further and Higher Education, JISC (2012)Link
Noll, Rob van der; with Stef van Gompel, Lucie Guibault and Jarst WedaFlexible Copyright: The Law and Economics of Introducing an Open Norm in the Netherlands2012This study analyses the law and economics of introducing flexibility in the system of exceptions and limitations in Dutch copyright law. Flexibility would exist in an open norm, on the basis of which the courts can decide whether certain uses of copyrighted material are permissible or not, instead of explicitly defining this in the law. The report assesses problem areas where the lack of flexibility creates legal disputes and potential barriers to innovation and production. The core of the study concerns the analysis of the economic rationale and effects of introducing flexibility in the Dutch legal order in the form of an open norm. The study was commissioned by the Dutch Ministry of Economic Affairs, Agriculture & Innovation and carried out by a consortium of SEO Economic Research and the Institute for Information Law (IViR) at the University of Amsterdam. The authors thank Prof. Bernt Hugenholtz for his useful comments on a draft version of the report.Rob van der Noll, Stef van Gompel, Lucie Guibault, and Jarst Weda. Flexible Copyright: The Law and Economics of Introducing an Open Norm in the Netherlands. SEO Economic Research (2012). Link
Reichman, Jerome; with Ruth OkedijiWhen Copyright Law and Science Collide: Empowering Digitally Integrated Research Methods on a Global Scale2012Automated knowledge discovery tools have become central to the scientific enterprise in a growing number of fields and are widely employed in the humanities as well. New scientific methods, and the evolution of entirely new fields of scientific inquiry, have emerged from the integration of digital technologies into scientific research processes that ingest vast amounts of published data and literature. The Article demonstrates that intellectual property laws have not kept pace with these phenomena.
Copyright law and science co-existed for much of their respective histories, with a benign tradition of the former giving way to the needs of the latter. Today, however, the formidable array of legislative maneuvers to tighten the grip of copyright laws in defense of cultural industries whose business models were upended in the online environment has, deliberately or not, undermined the ability of the scientific community to access, use, and reuse vast amounts of basic knowledge inputs. Database protection laws, reinforced by electronic fences and contracts of adhesion, further subject copy-reliant technologies to the whims of publishers and hinder the pooling of publicly funded resources that empower collaborative research networks and the formation of science commons in general.
The authors analyze the different components of a complicated transnational legislative fabric that have changed world copyright law into a science-hostile environment. Given the global nature of digital scientific research, they focus attention on comparative laws that fragment research inputs into diversely accessible territorial compartments. This analysis shows that users of automated knowledge discovery tools will likely become collective infringers of both domestic and international intellectual property laws.
In response to this challenge, the authors discuss possible solutions to the problems that intellectual property laws have created for digitally integrated scientific research from two very different angles. First, the authors skeptically consider the kinds of legal reforms that would be needed if commercial publishers continued to act as intermediaries between producers and users of scientific information and data, as they do today, without regard to the likelihood that such reforms would ever be enacted.
The authors then reconsider the role of publishers and ask whether, from a cost-benefit perspective, it should be significantly modified or abandoned altogether. Finally, the authors examine alternative strategies that the scientific community itself could embrace in a concerted effort to manage its own upstream knowledge assets in ways that might avoid, or at least attenuate, the obstacles to digitally empowered scientific research currently flowing from a flawed intellectual property regime. The Article concludes by stressing the need to bridge the current disconnect between private rights and public science, in the overall interest of both innovation and the advancement of knowledge.
Jerome H. Reichman & Ruth L. Okediji, When Copyright Law and Science Collide: Empowering Digitally Integrated Research Methods on a Global Scale, 96 Minn. L. Rev. 1362 (2012). Link
Sag, MatthewOrphan Works as Grist for the Data Mill2012The phenomenon of library digitization in general, and the digitization of so-called ‘orphan works’ in particular, raises many important copyright law questions. However, as this article explains, correctly understood, there is no orphan works problem for certain kinds of library digitization.
The distinction between expressive and nonexpressive works is already well recognized in copyright law as the gatekeeper to copyright protection - novels are protected by copyright, telephone books and other uncreative compilations of data are not. The same distinction should generally be made in relation to potential acts of infringement. Preserving the functional force of the idea-expression distinction in the digital context requires that copying for purely nonexpressive purposes (also referred to as non-consumptive use), such as the automated extraction of data, should not be regarded as infringing.
The nonexpressive use of copyrighted works has tremendous potential social value: it makes search engines possible, it provides an important data source for research in computational linguistics, automated translation and natural language processing. And increasingly, the macro-analysis of text is being used in fields such as the study of literature itself. So long as digitization is confined to data processing applications that do not result in infringing expressive or consumptive uses of individual works, there is no orphan works problem because the exclusive rights of the copyright owner are limited to the expressive elements of their works and the expressive uses of their works.
Matthew Sag, Orphan Works as Grist for the Data Mill, 27 Berkeley Tech. L.J. 1503 (2012). Link
Axhamn, Johan; with Lucie GuibaultSolving Europeana’s Mass-Digitization Issues Through Extended Collective Licensing?2011To the extent that books, photographs and other items in the collections of libraries and other cultural institutions are protected by copyright, they cannot be disseminated online without prior permission from the copyright owners. This applies to most items created during the 20th century. However, the transaction costs of finding and negotiating a license with every copyright owner would be prohibitive. In cases where it is impossible to identify or locate the right holder ("orphan work"), the items cannot be made available at all. These challenges hamper the development of initiatives aiming at digitizing and making available online the collections of cultural institutions and services like Europeana. A way forward is the Nordic extended collective licensing (ECL) model, which extends an agreement between a Collective Management Organization (CMO) and a user to non-members of the organization as well. In this way many of the transaction costs can be drastically reduced. The paper analyses the pros and cons of a cross-border ECL model in relation to the digitization and online dissemination of the collections held by national cultural institutions.Axhamn, Johan and Guibault, Lucie, Solving Europeana’s Mass-Digitization Issues Through Extended Collective Licensing? (December 20, 2011). Nordic Intellectual Property Law Review, 2011(6), p. 509 et seq. Link
Borghi, Maurizio; with Stavroula KarapapaNon-Display Uses of Copyright Works: Google Books and Beyond2011With the advent of mass digitization projects, such as the Google Book Search, a peculiar shift has occurred in the way that copyright works are dealt with. Contrary to what has so far been the case, works are turned into machine-readable data to be automatically processed for various purposes without the expression of works being displayed to the public. In the Google Book Settlement Agreement, this new kind of usage is referred to as ‘non-display uses’ of digital works. The legitimacy of these uses has not yet been tested by Courts and does not comfortably fit in the current copyright doctrine, plainly because the works are not used as works but as something else, namely as data. Since non-display uses may prove to be a very lucrative market in the near future, with the potential to affect the way people use copyright works, we examine non-display uses under the prism of copyright principles to determine the boundaries of their legitimacy. Through this examination, we provide a categorization of the activities carried out under the heading of ‘non-display uses,’ examine their lawfulness under current copyright doctrine, and approach the phenomenon from the spectrum of data protection law that could apply, by analogy, to the use of copyright works as processable data.Maurizio Borghi & Stavroula Karapapa, Non-Display Uses of Copyright Works: Google Books and Beyond, 1 Queen Mary J. Intell. Prop. 21 (2011). Link
Guibault, LucieOwning the Right to Open Up Access to Scientific Publications2011Whether the researchers themselves, rather than the institution they work for, are at all in a position to implement OA principles actually depends on the initial allocation of rights on their works. Whereas most European Union Member States have legislation that provides that the copyright owner is the natural person who created the work, the copyright laws of a number of European countries, including those of the Netherlands and the United Kingdom, establish a presumption, according to which the copyright of works made in the course of employment belongs initially to the employer, which in this case would be the university. In France, a similar presumption applies to works created by employees of the State. Even if researchers are in a position to exercise the rights on their works, they may, nevertheless, be required to transfer these to a publisher in order to get their article or book published. This paper, therefore, analyses the legal position of researchers, research institutions and publishers respectively, and considers what the consequences are for the promotion of OA publishing in light of the principles laid down in the Berlin Declaration and the use of Creative Commons licenses.Guibault, L., Owning the Right to Open Up Access to Scientific Publications (January 3, 2011), in Open Content Licensing: From Theory to Practice (L. Guibault and C. Angelopoulos eds., Amsterdam University Press, 2011). Link
Han, Jiawei; with Micheline Kamber and Jian PeiData Mining: Concepts and Techniques2011The computerization of our society has substantially enhanced our capabilities for both generating and collecting data from diverse sources. A tremendous amount of data has flooded almost every aspect of our lives. This explosive growth in stored or transient data has generated an urgent need for new techniques and automated tools that can intelligently assist us in transforming the vast amounts of data into useful information and knowledge. This has led to the generation of a promising and flourishing frontier in computer science called data mining, and its various applications. Data mining, also popularly referred to as knowledge discovery from data (KDD), is the automated or convenient extraction of patterns representing knowledge implicitly stored or captured in large databases, data warehouses, the Web, other massive information repositories, or data streams.
This book explores the concepts and techniques of knowledge discovery and data mining. As a multidisciplinary field, data mining draws on work from areas including statistics, machine learning, pattern recognition, database technology, information retrieval, network science, knowledge-based systems, artificial intelligence, high-performance computing, and data visualization. We focus on issues relating to the feasibility, usefulness, effectiveness, and scalability of techniques for the discovery of patterns hidden in large data sets. As a result, this book is not intended as an introduction to statistics, machine learning, database systems, or other such areas, although we do provide some background knowledge to facilitate the reader’s comprehension of their respective roles in data mining. Rather, the book is a comprehensive introduction to data mining. It is useful for computing science students, application developers, and business professionals, as well as researchers involved in any of the disciplines previously listed.
Data mining emerged during the late 1980s, made great strides during the 1990s, and continues to flourish into the new millennium. This book presents an overall picture of the field, introducing interesting data mining techniques and systems and discussing applications and research directions. An important motivation for writing this book was the need to build an organized framework for the study of data mining—a challenging task, owing to the extensive multidisciplinary nature of this fast-developing field. We hope that this book will encourage people with different backgrounds and experiences to exchange their views regarding data mining so as to contribute toward the further promotion and shaping of this exciting and dynamic field.
Han, Jiawei, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques (3d ed., Morgan Kaufmann, 2011). Link
Manyika, James; with Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung ByersBig Data: The Next Frontier for Innovation, Competition and Productivity2011In itself, the sheer volume of data is a global phenomenon, but what does it mean? Many citizens around the world regard this collection of information with deep suspicion, seeing the data flood as nothing more than an intrusion of their privacy. But there is strong evidence that big data can play a significant economic role to the benefit not only of private commerce but also of national economies and their citizens. Our research finds that data can create significant value for the world economy, enhancing the productivity and competitiveness of companies and the public sector and creating substantial economic surplus for consumers. …
This report seeks to understand the state of digital data, how different domains can use large datasets to create value, the potential value across stakeholders, and the implications for the leaders of private sector companies and public sector organizations, as well as for policy makers. We have supplemented our analysis of big data as a whole with a detailed examination of five domains (health care in the United States, the public sector in Europe, retail in the United States, and manufacturing and personal location data globally). This research by no means represents the final word on big data; instead we see it as a beginning. We fully anticipate that this is a story that will continue to evolve as technologies and techniques using big data develop and data, their uses, and their economic benefits grow (alongside associated challenges and risks).
James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, Angela Hung Byers. Big Data: The Next Frontier for Innovation, Competition, and Productivity, McKinsey Global Institute (2011)Link
Ncube, Caroline; with Tobias SchonwetterNew Hope for Africa? Copyright and Access to Knowledge in the Digital Age2011This paper discusses the three main approaches countries around the world have taken towards copyright exceptions and limitations. After examining the advantages and disadvantages of the different approaches, it suggests a preferred model for developing countries. It then addresses the problematic interplay between copyright exceptions and limitations on the one hand and technological protection measures (TPMs) on the other. The paper concludes by offering a solution for mitigating the potentially detrimental impact of TPMs on otherwise permitted uses of copyright protected knowledge materials.Ncube, Caroline B. and Schonwetter, Tobias, New Hope for Africa? Copyright and Access to Knowledge in the Digital Age, 13(3) Info 64 (2011). Link
Sag, MatthewThe Prehistory of Fair Use2011This article reconsiders the history of copyright’s pivotal fair use doctrine. The history of fair use does not in fact begin with early American cases such as Folsom v. Marsh in 1841, as most accounts assume - the complete history of the fair use doctrine begins with over a century of copyright litigation in the English courts. Reviewing this ‘pre-history’ of the American fair use doctrine leads to three significant conclusions. The first is that copyright and fair use evolved together. Virtually from its inception, statutory copyright went well beyond merely mechanical acts of reproduction and was defined by the concept of fair abridgment. The second insight gained by extending our historical view is that there is in fact substantial continuity between fair abridgment in the pre-modern era and fair use in the United States today. These findings have substantial implications for copyright law today, the principal one being that fair use is central to the formulation of copyright, and not a mere exception.
The third conclusion relates to the contribution of Folsom v. Marsh itself. The pre-modern cases illustrate a half-formed notion of the derivative right: unauthorized derivatives could be enjoined to defend the market of the original work, but they did not constitute a separate market unto themselves. Folsom departs from the earlier English cases in that it recognizes derivatives as inherently valuable, not just a thing to be enjoined to defend the original work against substitution. This subtle shift is important because while the boundaries of a defensive derivative right can be ascertained with respect to the effect of the defendant’s work on the plaintiff’s original market, the boundaries of an offensive derivative right can only be determined with reference to some other limiting principle. This extension of the derivative right may well have been inevitable. It seems likely that as more and more derivatives were enjoined defensively, courts and copyright owners began to see these derivatives as part of the author’s inherent rights in relation to his creation. In other words, once copyright owners were allowed to preclude derivatives to prevent competition with their original works, they quickly grew bold enough to assert an exclusive right in derivative works for their own sake, a development which, for good or ill, bridges the gap between pre-modern and modern copyright.
Matthew Sag, The Prehistory of Fair Use, 76 Brook. L. Rev. 1371 (2011)Link
Elkin-Koren, NivaFair Use Best Practices for Higher Education: The Israeli Experience2010The fair use doctrine may no longer facilitate the ultimate goal of copyright law, which is to promote production and dissemination of arts and sciences. The high degree of uncertainty stemming from the doctrine is creating a chilling effect and causing users to avoid exploiting the work in ways which the law seeks to encourage under fair use. To address this uncertainty and its chilling effect on educational use, we drafted a Code of Fair Use Best Practices for the use of copyright materials in Higher Education Institutions (hereinafter – HEI) in Israel. We formed a coalition of all the higher education institutions in Israel and negotiated a shared understanding of fair use among the partnering institutions.
This paper provides a snapshot of the process of building the coalition and drafting the Code of Fair Use Best Practices. The initiative was inspired by the visionary initiatives of Patricia Aufderheide and Peter Jaszi, who worked with various communities in the U.S. to devise particular codes of Fair Use Best Practices. We also carefully examined the lessons from the failure of past American projects, such as the CONFU. We thus had ample source material for a comparative analysis of copyright law, fair use, and the different strategies of legal activism for social change. We begin by describing our vision for the educational use of copyrighted materials; our view of the purpose of copyright and fair use doctrine; and our view of the interaction between law and social norms. In Part II, we analyze the legal regime that pertains to educational use of copyrighted materials in Israel. Part III describes the process of consensus building among the different stakeholders. In Part IV, we present the major principles of the Code and reflect on their implications for the development of Fair Use doctrine. The ongoing debate regarding the appropriate mechanism for defining permissible uses is often phrased as a choice between rules and standards. While specific exemptions would provide a high level of certainty, they may prove to be too narrow and rigid and would not facilitate adaptation to changes in the economic, social and technological environments. Standards would provide flexibility but too little certainty, as courts would have sole discretion in deciding, retroactively, whether a use was fair. The emerging communities that deliberate on fair use in a contextual manner offer a third way. Fair use, like ethical dilemmas, involves deliberation. If we develop social institutions to facilitate such deliberation, we may bridge a gap between legal standards and social norms, and may enrich the fair use analysis with the contextual meaning it deserves. A process of consensus building fits well with this insight, since consensus building reflects an attempt to create a community with shared language that will be able to develop an ethical praxis, step by step.
Niva Elkin-Koren, Fair Use Best Practices for Higher Education: The Israeli Experience, J. Copyright Soc’y U.S.A. 1-26 (2010). Link
Geiger, ChristopheThe Future of Copyright in Europe: Striking a Fair Balance between Protection and Access to Information2010The starting point for this report was the Council of Europe Parliamentary Assembly’s motion for a resolution of 24 April 2007 (Doc. 11272). This motion focuses on two rights presented as competing with one another - copyright and the right of access to information - and emphasises the need to take them into account in the new digital environment. This clearly shows the Council’s desire to analyse the issues in a broader framework and reject the prioritisation of different rights, which is very much in line with an approach based on striking a balance between fundamental rights.
This report emphasises the need not to think in terms of opposing rights but of the complementary nature of copyright and the right of access to information so as to reconcile the two, which is both necessary and desirable. Copyright law in fact has no alternative but to include access to information in order to meet the challenges posed by the knowledge society. It is its ability to bring together opposing but complementary views that will testify to its durability in the future and to whether it can adapt to a new economic, technological and social environment. Copyright law has shown a remarkable ability to adapt to new developments in the past and has the necessary tools to ensure that this continues to be the case in the future.
Accordingly, it will be necessary, first of all, to reiterate a number of basic principles of copyright law and carry out a brief historical survey. A study will then need to be carried out of how the advent of the information society has changed the existing balances. This will be followed by a brief discussion of recent developments in the legal provisions currently in force. This in turn would lead us to consider both the changes necessary to those provisions to ensure better access to information as well as certain initiatives that are either under way or planned, with the aim of striking a balance between the interests involved.
Christophe Geiger, The Future of Copyright in Europe: Striking a Fair Balance between Protection and Access to Information, Report for the Committee on Culture, Science and Education – Parliamentary Assembly, Council of Europe (2009) (extended version published in English: 14 Intell. Prop. Quarterly 1–14 (2010))Link
Lee, EdwardTechnological Fair Use2010The Article proposes a framework tailoring fair use specifically for technology cases. At the inception of the twenty-first century, information technologies have become increasingly central to the U.S. economy. Not surprisingly, complex copyright cases involving speech technologies, such as DVRs, mp3 devices, Google Book Search, and YouTube, have increased as well. Yet existing copyright law, developed long before digital technologies, is ill-prepared to handle the complexities these technology cases pose. The key question often turns, not on prima facie infringement, but on the defense of fair use, which courts have too often relegated to extremely fact-specific decisions. The downside to this ad hoc adjudication of fair use is that it leads to great uncertainty over what is permissible in a way that may retard innovation in speech technologies. This Article addresses this ongoing problem by offering a proposal for courts to recognize a specific type of fair use - technological fair use - and to tailor the four fair use factors accordingly. Technological fair use is supported not only by a synthesis of existing case law, but also, more importantly, by the constitutional underpinnings of the First Amendment and the Copyright and Patent Clause, as well as economic theory.Edward Lee, Technological Fair Use, 83 S. Cal. L. Rev. 797 (2010). Link
Sag, MatthewThe Google Book Settlement and the Fair Use Counterfactual2010The sprawling Google Book litigation began as a dispute between the search engine colossus and a variety of authors and publishers over the legality of book digitization for the purpose of indexing paper collections and making them searchable. However, through the metamorphic power of class-action litigation, a dispute over mere indexing and search has been transformed into a comprehensive agreement over the future of the book as a digital commodity. Understanding this transformation and its implications is the central ambition of this Article. It does so by comparing the pending (now amended) Google Book settlement to the most likely outcome of the litigation the settlement resolves. This counterfactual provides a useful benchmark by which to assess the effects, and thus the merits, of the Google Book Search settlement.
The Settlement differs from the predicted fair use ruling in four critical areas. First, the Settlement permits Google to engage in a significant range of uses, including the complete electronic distribution of books that go well beyond fair use. Second, the Settlement provides for initial cash payments by Google to the copyright owners and a fairly generous revenue sharing agreement, neither of which would have been required under a fair use ruling. Third, the agreement creates a new set of institutional arrangements that will govern the relationship between Google and the copyright owners covered by the Settlement. The foundations of this new institutional framework are the Settlement agreement itself, the creation of a collective rights management organization called the “Book Rights Registry” and the “Author-Publisher Procedures.” The fourth area in which the Settlement differs from the likely fair use outcome relates to the accessibility, commoditization, and control of orphan works.
Sag, Matthew, The Google Book Settlement and the Fair Use Counterfactual (October 9, 2010). New York Law School Law Review, Vol. 55, 2010, The DePaul University College of Law, Technology, Law & Culture Research Series Paper No. 10-001. Link
Samuelson, PamelaGoogle Book Search and the Future of Books in Cyberspace2010The Google Book Search (GBS) initiative once promised to test the bounds of fair use, as the company started scanning millions of in-copyright books from the collections of major research libraries. The initial goal of this scanning was to make indexes of the books’ contents and to provide short snippets of book contents in response to pertinent search queries. The Authors Guild and five trade publishers sued Google in the fall of 2005 charging that this scanning activity was copyright infringement. Google defended by claiming fair use. Rather than litigating this important issue, however, the parties devised a radical plan to restructure the market for digital books, which was announced on October 28, 2008, by means of a class action settlement of the lawsuits. Approval of this settlement would give Google – and Google alone – a license to commercialize all out-of-print books and to make up to 20 per cent of their contents available in response to search queries (unless rights holders expressly forbade this).
This article discusses the glowingly optimistic predictions about the future of books in cyberspace promulgated by proponents of the GBS settlement and contrasts them with six categories of serious reservations that have emerged about the settlement. These more pessimistic views of GBS are reflected in the hundreds of objections and numerous amicus curiae briefs filed with the court responsible for determining whether to approve the settlement. GBS poses risks for publishers, academic authors and libraries, professional writers, and readers as well as for competition and innovation in several markets and for the cultural ecology of knowledge. Serious concerns have also been expressed about the GBS settlement as an abuse of the class action process because it usurps legislative prerogatives. The article considers what might happen to the future of books in cyberspace if the GBS deal is not approved and recommends that regardless of whether the GBS settlement is approved, a consortium of research libraries ought to develop a digital database of books from their collections that would enhance access to books without posing the many risks to the public interest that the GBS deal has created.
Pamela Samuelson, Google Book Search and the Future of Books in Cyberspace, 94 Minn. L. Rev. 1308 (2010)Link
Samuelson, PamelaAcademic Author Objections to the Google Book Search Settlement?2010This Article explains the genesis of the Google Book Search (GBS) project and the copyright infringement lawsuit challenging it that the litigants now wish to settle with a comprehensive restructuring of the market for digital books. At first blush, the settlement seems to be a win-win-win, as it will make millions of books more available to the public, result in new streams of revenues for authors and publishers, and give Google a chance to recoup its investment in scanning millions of books. Notwithstanding these benefits, a closer examination of the fine details of the proposed GBS settlement should give academic authors some pause. The interests of academic authors were not adequately represented during the negotiations that yielded the proposed settlement. Especially troublesome are the proposed settlement’s lack of meaningful constraints on the pricing of institutional subscriptions and its plan for disposing of revenues derived from the commercialization of “orphan” and other unclaimed books. The Article also raises concerns about whether the parties’ professed aspirations for GBS to be a universal digital library are being undermined by their own withdrawals of books from the regime the settlement would establish. Finally, the Article suggests changes that should be made to the proposed settlement to make it fair, reasonable, and adequate to the academic authors whose works make up a substantial proportion of the GBS corpus. Even with these modifications, however, there are serious questions about whether the class defined in the PASA can be certified consistent with Rule 23, whether the settlement is otherwise compliant with Rule 23, whether the settlement is consistent with the antitrust laws, and whether approval of this settlement is an appropriate exercise of judicial power.Pamela Samuelson, Academic Author Objections to the Google Book Search Settlement?, J. Telecomm. & High Tech. L. (2010). Link
Weiss, Sholom; with Nitin Indurkhya and Tong ZhangFundamentals of Predictive Text Mining2010This successful textbook on predictive text mining offers a unified perspective on a rapidly evolving field, integrating topics spanning the varied disciplines of data science, machine learning, databases, and computational linguistics. Serving also as a practical guide, this unique book provides helpful advice illustrated by examples and case studies. This highly anticipated second edition has been thoroughly revised and expanded with new material on deep learning, graph models, mining social media, errors and pitfalls in big data evaluation, Twitter sentiment analysis, and dependency parsing. The fully updated content also features in-depth discussions on issues of document classification, information retrieval, clustering and organizing documents, information extraction, web-based data-sourcing, and prediction and evaluation. Features: includes chapter summaries and exercises; explores the application of each method; provides several case studies; contains links to free text-mining software.Sholom M. Weiss, Nitin Indurkhya, and Tong Zhang. Fundamentals of Predictive Text Mining 15 (David Gries & Fred B. Schneider eds., 2010). Link
Geiger, ChristopheCopyright’s Fundamental Rights Dimension at EU Level2009Christophe Geiger, Copyright’s Fundamental Rights Dimension at EU Level, in Research Handbook on the Future of EU Copyright 27 (Estelle Derclaye ed., 2009). Link
Sag, MatthewCopyright and Copy-Reliant Technology2009This article studies the rise of copy-reliant technologies - technologies such as Internet search engines and plagiarism detection software that, although they do not read, understand or enjoy copyrighted works, necessarily copy them in large quantities. This article provides a unifying theoretical framework for the legal analysis of topics that tend to be viewed discretely. Search engines, plagiarism detection software, reverse engineering and Google's nascent library cataloging effort are each part of a broader phenomenon brought about by digitization, that of copy-reliant technologies. These technologies raise two novel, yet central, questions of copyright law. First, whether a non-expressive use that nonetheless requires copying the entirety of a copyright work should be found to infringe the exclusive rights of the copyright owner. Second, whether the transaction costs associated with copy-reliant technologies justify switching copyright's default rule that no copying may take place without permission to one in which copyright owners must affirmatively opt out of specific uses of their works.
This article explores the pivotal role of the fair use doctrine in adapting copyright law to new technology, and explains the role of expressive substitution in fair use doctrine generally and the application of fair use in the context of non-expressive use in particular. Furthermore, this Article explores the application of fair use in situations where the alleged infringer has provided copyright owners with the ability to opt out. The Article is timely in light of the pending Google Book Settlement.
Matthew Sag, Copyright and Copy-Reliant Technology, 103 Nw. U. L. Rev. 1607-1682 (2009). Link
Schonwetter, Tobias; with Jeremy de Beer, Dick Kawooya, and Achal PrabhalaCopyright and Education: Lessons on African Copyright and Access to Knowledge2009The African Copyright and Access to Knowledge (ACA2K) project is a pan-African research network of academics and researchers from law, economics and the information sciences, spanning Egypt, Ghana, Kenya, Morocco, Mozambique, Senegal, South Africa and Uganda. Research conducted by the project was designed to investigate the extent to which copyright is fulfilling its objective of facilitating access to knowledge, and learning materials in particular, in the study countries. The hypotheses tested during the course of research were that: (a) the copyright environments in study countries are not maximising access to learning materials, and (b) the copyright environments in study countries can be changed to increase access to learning materials. The hypotheses were tested through both doctrinal legal analysis and qualitative interview-based analysis of practices and perceptions among relevant stakeholders. This paper is a comparative review of some of the key findings across the eight countries.
An analysis of the legal research findings in the study countries indicates that national copyright laws in all eight ACA2K study countries provide strong protection, in many cases exceeding the terms of minimum protection demanded by international obligations. Copyright limitations and exceptions to facilitate access to learning materials are not utilised as effectively as they could be, particularly relating to the digital environment. Distance learning, the needs of disabled people, the needs of students, teachers, educational institutions, libraries and archives are inadequately addressed. To the extent that copyright laws address the Internet and other information and communication technologies (ICTs), they do so primarily in a manner that further restricts access to learning materials. In summary, national copyright frameworks in the study countries are not geared for maximal access to learning materials, and are in need of urgent attention.
An analysis of qualitative research findings, gathered from the field in stakeholder interviews, suggests that a substantial gap exists between copyright law and copyright practice in each country studied. Many users who are aware of the concept of copyright are unable or unwilling to comply with it or to work within the user rights it offers because of their socioeconomic circumstances. In everyday practice, with respect to learning materials, vast numbers of people act outside legal copyright structures altogether, engaging (knowingly or unknowingly) in infringing practices in order to gain the access they need to learning materials.
In conclusion, evidence from the ACA2K project suggests that the copyright environments in the study countries can and must be improved by reforms that will render the copyright regimes more suitable to local developing country realities. Without such reform, equitable and non-infringing access to learning materials will remain an elusive goal in these countries.
J. de Beer, T. Schonwetter, D. Kawooya & A. Prabhala, Copyright and Education: Lessons on African Copyright and Access to Knowledge, 10 The African Journal of Information and Communications 37-52 (2009-2010). Link
Carroll, MichaelCreative Commons as Conversational Copyright2007Copyright law's default settings inhibit sharing and adaptation of creative works even though new digital technologies greatly enhance individuals' capacity to engage in creative conversation. Creative Commons licenses enable a form of conversational copyright through which creators share their works, primarily over the Internet, while asserting some limitations on users' rights with respect to works in the licensed commons. More specifically, this chapter explains the problems in copyright law to which Creative Commons licenses respond, the methods chosen, and why the machine-readable and public aspects of the licenses are specific examples of a more general phenomenon in digital copyright law that will grow in importance in the coming years.Carroll, Michael W., Creative Commons as Conversational Copyright, in Intellectual Property and Information Wealth: Issues and Practices in the Digital Age, Vol. 1, pp. 445-61 (Peter K. Yu ed., Praeger, 2007). Villanova Law/Public Policy Research Paper No. 2007-8. Link
Carroll, MichaelThe Movement for Open Access Law2006My claim in this symposium contribution is that the law and legal scholarship should be freely available on the Internet and that copyright law and copyright licensing practices should facilitate achievement of this goal. This claim reflects the combined aims of those who support the movement for open access law. This nascent movement is a natural extension of the well-developed movement for free access to primary legal materials and the equally well-developed open access movement, which seeks to make all scholarly journal articles freely available on the Internet. Legal scholars have only general familiarity with the first movement and very little familiarity with the second. In this contribution, I demonstrate the linkages between these movements and briefly outline the argument for open access law.Carroll, Michael W., The Movement for Open Access Law - Symposium, Lewis & Clark Law Review, Vol. 10, 2006. Villanova Law/Public Policy Research Paper No. 2006-11. Link
Samuelson, PamelaThe Story of Baker v. Selden: Sharpening the Distinction between Authorship and Invention2005This Story grows out of a study of the Supreme Court Record and other historical materials about the well-known 1880 copyright case of Baker v. Selden. Among the surprises the Story reveals are that Selden was not, as some have surmised, the author of a treatise on bookkeeping, nor was he the inventor of the now universally used T-account system of bookkeeping. Selden's books are better described as minor variants on one another, consisting of 20-some pages of bookkeeping forms with sample entries, a short preface, and an introduction. Most of the 650 words of text in the last book puff the merits of his system rather than explaining how to use it. Baker, not Selden, is mentioned in works on the history of bookkeeping, and Baker's books on bookkeeping (but not Selden's) are still available in various public and university libraries. Though burdened with thousands of dollars of debt, Selden's widow hired a prominent intellectual property lawyer to represent her in the lawsuit against Baker which charged him with pirating the Selden system. She believed she was owed damages (in today's dollars) of a quarter-million dollars a year from Baker and his customers. Baker probably lost at the trial court level because he hired an inexperienced young lawyer; Baker won before the Supreme Court in part because he was represented by a team of supple heavy-hitters. The most important lesson of this Story concerns the legal principle the Court was trying to promulgate. Although Baker v. Selden is widely cited as the genesis of the "idea/expression" distinction in copyright law, the Story shows that this distinction predated Baker. Nor is Baker the genesis of the "merger" doctrine (which holds that if an idea can only be expressed in one or a small number of ways, copyright law will not protect the expression because it has "merged" with the idea). The main objective of the Supreme Court's decision was to sharpen the distinction between authorship and invention. The complaint spoke of Selden as the author and inventor of several books and of a bookkeeping system. His lawyer kept speaking about its novelty in the state of the art. Selden's widow claimed exclusive rights not only to stop Baker from publishing competing books, but also to collect damages from all of Baker's customers for their use of the infringing system. That Selden had sought, but apparently not obtained, a patent on his bookkeeping system seems to have affected the Court. To clarify the proper roles of patent and copyright in protecting the fruits of intellectual labor, the Baker opinion introduced a new framework for analyzing copyright claims. It directed courts to consider whether the defendant had copied the author's description, explanation, illustration, or depiction of a useful art (such as a bookkeeping system) or ideas, or had only copied the useful art or ideas themselves. In the absence of a patent, the useful art depicted in a work, along with its ideas, could be used and copied by anyone, even in directly competing works. Any necessary incidents to implementing the art (e.g., blank forms illustrating use of the system) could likewise be used and copied by second comers without fear of copyright liability. The Baker opinion's rich analysis of the roles of copyright and patent in protecting intellectual creations has, over the past 125 years, spawned at least eight significant copyright doctrines, including four codified in the Copyright Act of 1976, as well as a few enduring controversies.Pamela Samuelson, The Story of Baker v. Selden: Sharpening the Distinction between Authorship and Invention, in Intellectual Property Stories 159–193 (Jane C. Ginsburg and Rochelle C. Dreyfuss eds., 2005)Link
Elkin-Koren, NivaLet the Crawlers Crawl: On Virtual Gatekeepers and the Right to Exclude Indexing2001Niva Elkin-Koren, Let the Crawlers Crawl: On Virtual Gatekeepers and the Right to Exclude Indexing, 26 U. Dayton L. Rev. 179 (2001).Link
Karjala, DennisCopyright Protection Of Computer Program Structure1998Copyright plays an important role in the protection of computer software, but beyond the protection of program code, courts are extremely confused about the scope of copyright protection in a computer program. This judicial confusion is nicely exemplified by the Second Circuit's 1997 decision in Softel, Inc. v. Dragon Medical & Scientific Communications, Inc. Although the Second Circuit, in Computer Associates International, Inc. v. Altai, had earlier established the most widely accepted test for separating protected from unprotected elements in computer programs, the Softel panel understood neither the correct technical application of the Computer Associates test nor its implicit underlying policy basis. As a result, the court resorted to metaphysics to determine what is protected expression in a computer program and what is not. Indeed, much of the language of the Softel opinion harkens back to the approach of the Third Circuit in Whelan Associates, Inc. v. Jaslow Dental Laboratory, Inc., which was expressly rejected in Computer Associates. Softel is not a step forward but a retrogression. This Comment discusses the facts of Softel, articulates a test for separating protected expression from unprotected idea in a computer program, applies that test to program structure, sequence, and organization, and then returns to the Softel decision to illustrate how and where the Second Circuit panel lost its bearings and to speculate briefly on the implications of the decision for the future. It argues that copyright should protect computer program code from verbatim copying or slavish mechanical or electronic translations. Other program elements, such as structure, sequence, and organization and elements of software interfaces, should not be protected by the copyright in the program code, that is, by the computer program copyright.Dennis S. Karjala, Copyright Protection Of Computer Program Structure, 64 Brook. L. Rev. 519, 532 (1998).Link
Reichman, Jerome; with Pamela SamuelsonIntellectual Property Rights on Data?1997[Summary] The international intellectual property system founded on the Paris and Berne Conventions in the late nineteenth century has been dominated by the patent and copyright paradigms, which articulate the legal protection of technological inventions and of literary and artistic works, respectively. ... Data compilers in the United States and the United Kingdom had, in the past, experienced some success in protecting their investments in publicly distributed compilations by means of copyright law. ... The copyright laws of most developed countries exclude functionally determined databases and do not protect disparate data even when a given compilation as a whole happens to satisfy the eligibility requirements of those laws. ... To be sure, data providers, including members of the scientific community, could decide not to exercise proprietary rights in certain databases, for example, those funded by government agencies. ... For example, if the data extracted by the user are the data responsive to his or her online query, one can always argue that the extraction was qualitatively substantial. ... The possibility therefore exists that publishers may assert the right to control uses of noncopyrightable components of databases that would otherwise have been subsumed within the general right to use the same database had it qualified for copyright protection. ... As applied to traditional scientific works covered by copyright law, such an exception made sense because only the author's individual style was protected, and not his or her data, findings, or ideas.Jerome H. Reichman & Pamela Samuelson, Intellectual Property Rights on Data?, 50 Vand. L. Rev. 51 (1997).Link
Litman, JessicaThe Exclusive Right to Read1994Jessica Litman, The Exclusive Right to Read, 13 Cardozo Arts & Ent. L.J. 29 (1994)Link
Samuelson, PamelaCONTU Revisited: The Case Against Copyright Protection for Computer Programs in Machine-Readable Form1984Professor Samuelson casts a critical eye on the Final Report of the National Commission on New Technological Uses of Copyrighted Works (CONTU), which recommended that copyright protection be extended to machine-readable versions of computer programs. CONTU appears to have misunderstood computer technology and misinterpreted copyright tradition in two significant respects. The Commission failed to take into account the historical importance of disclosure of the contents of protected works as a fundamental goal of both the copyright and patent laws. It also erroneously opined that the utilitarian character of a work was no bar to its copyrightability when both the statute and the case law make clear that utilitarian works are not copyrightable. Since computer programs in machine-readable form do not disclose their contents and are inherently utilitarian, copyright protection for them is inappropriate. Congress acted on CONTU's recommendation without understanding the significance of these conceptual flaws. Professor Samuelson recommends the creation of a new form of intellectual property law specifically designed for machine-readable programs.Pamela Samuelson, CONTU Revisited: The Case Against Copyright Protection for Computer Programs in Machine-Readable Form, 1984 Duke L.J. 663 (1984).Link
Caso, RobertoThe Darkest Hour: Private Information Control and the End of Democratic Science2018The evaluation of scientific research is based on data protected by secrecy and intellectual property (e.g., Elsevier Scopus or Clarivate Web of Science). The peer review process is essentially anonymous. While science has progressed thanks to public dialogue, the current evaluation system is centered on private control of information. This represents a fundamental shift from democratic to authoritarian science. Open Science may confront this change only if it is accepted as the heir, in the digital age, of the values and principles that public and democratic science has traditionally fostered in the age of printing, thus becoming the guardian of a democratic society.Roberto Caso, The Darkest Hour: Private Information Control and the End of Democratic Science (June 2, 2018). Trento Law and Technology Research Group Research Paper No. 35, http://dx.doi.org/10.2139/ssrn.3189519, in I. De Gennaro, H. Hofmeister, R. Lüfter (eds.), Academic Freedom in the European Context. Legal, Philosophical and Institutional Perspectives, in Palgrave Critical University Studies book series (PCU), Springer Nature, 2022, 259-288.Link
Caso, Roberto; with Rossana DucatoIntellectual Property, Open Science and Research Biobanks2014In biomedical research and translational medicine, the ancient war between exclusivity (private control over information) and access to information is playing out again on a new battlefield: research biobanks. The latter are becoming increasingly important (one of the ten ideas changing the world, according to Time magazine) since they make it possible to collect, store and distribute, in a secure and professional way, a critical mass of human biological samples for research purposes. Tissues and related data are fundamental for the development of biomedical research and the emerging field of translational medicine: they represent the “raw material” for every kind of biomedical study. For this reason, it is crucial to understand the boundaries of Intellectual Property (IP) in this thorny context. In fact, both data sharing and collaborative research have become an imperative in contemporary open science, whose development depends inextricably on the opportunities to access and use data, the possibility of sharing practices between communities, the cross-checking of information and results and, chiefly, interactions with experts in different fields of knowledge. Data sharing makes it possible both to spread the costs of analyses that researchers cannot achieve working individually and, if properly managed, to avoid the duplication of research. These advantages are crucial: access to a common pool of pre-competitive data and the possibility to undertake follow-on research projects are fundamental for the progress of biomedicine. This is why the "open movement" is also spreading in the biobank field. After an overview of the complex interactions among the different stakeholders involved in the process of information and data production, as well as of the main obstacles to the promotion of data sharing (i.e., the appropriability of biological samples and information, the privacy of participants, the lack of interoperability), we will first clarify some blurred terminology, in particular concepts that are often conflated, such as “open source” and “open access”. The aim is to understand whether and to what extent these concepts can be applied to the biomedical field. Afterwards, adopting a comparative perspective, we will analyze the main features of the open models - in particular, the Open Research Data model - which have been proposed in the literature for the promotion of data sharing in the field of research biobanks. Following this analysis, we will suggest some recommendations to rebalance the clash between exclusivity - the paradigm characterizing the evolution of intellectual property over the last three centuries - and the actual needs for access to knowledge. We argue that the key factor in this balance may come from the right interaction between IP, social norms and contracts. In particular, we need to combine the incentives and reward mechanisms characterizing scientific communities with the data sharing imperative.Roberto Caso, Rossana Ducato, Intellectual Property, Open Science and Research Biobanks (October 17, 2014). Trento Law and Technology Research Group Research Paper No. 22, http://dx.doi.org/10.2139/ssrn.2511602, in M. Macilotti, U. Izzo, G. Pascuzzi (eds.), Comparative Issues in the Governance of Research Biobanks: Property, Privacy, Intellectual Property and the Role of Technology, Springer, Berlin Heidelberg, 2013, 209-229.Link
Caso, R.; with G. DoreAcademic Copyright, Open Access and the “Moral” Second Publication Right2021The Green route to Open Access (OA), meaning the re-publication in OA venues of previously published works, can essentially be executed by contract and by copyright law. In theory, rights retention and contracts may allow authors to re-publish and communicate their works to the public, by means of license-to-publish agreements or specific addenda to copyright transfer agreements. But as a matter of fact, because authors lack bargaining power, they usually transfer all economic copyrights to publishers. Legislation, which overcomes the constraints of such contractual schemes, may deliver a (digital) second publication or communication right, which this paper discusses in the context of research publications. Outlining the historical and philosophical roots of the secondary publication right, the paper provocatively suggests that it has a “moral” nature that even makes it a shield for academic freedom as well as a major step forward in the overall development of OA.R. Caso, G. Dore, Academic Copyright, Open Access and the “Moral” Second Publication Right, Trento Law and Technology Research Group Research Paper No. 47 (forthcoming in European Intellectual Property Review - EIPR), https://doi.org/10.5281/zenodo.5764841.Link