Sean Flynn, Director, PIJIP
At this year’s Creative Commons (CC) Global Summit in Mexico City, a group of CC members “identified a set of common issues and values” on copyright and artificial intelligence. The ideas were published on the CC blog as a way “for further community discussion and to help CC and the global community navigate uncharted waters in the face of generative AI and its impact on the commons.” I reproduce the statement here with some comments of my own to further the deliberation. My perspective is shaped by my leadership of a PIJIP project on The Right to Research in International Copyright, supported by Arcadia. I cite some of the outputs of that work.
Statement: Making AI Work For Creators And The Commons | Comments |
Preliminary considerations 1. Recognizing that around the world the legal status of using copyrighted works to train generative AI systems raises many questions and that there are currently only a limited number of jurisdictions with relatively clear and workable legal frameworks for such uses. We see the need to establish a set of principles that address the position of creators, the people who build and use machine learning systems, and the commons, under this emerging technological paradigm. 2. Noting that there are calls from organized rights holders to address the issues raised by using copyrighted works to train generative AI models, including based on the principles of credit, consent and compensation. 3. Noting that the development and deployment of generative AI models can be capital intensive and therefore risks resembling (or exacerbating) the concentration of markets, technology and power in the hands of a small number of powerful for-profit entities largely concentrated in the United States and China, and that currently most of the (speculative) value accrues to these companies. 4. Noting further that while there are many benefits to everyone’s ability to take advantage of the global information commons, extracting value from the commons can also reinforce existing power imbalances and, in fact, may structurally resemble previous examples of colonialist accumulation. – Noting that this issue is especially urgent when it comes to the use of traditional knowledge materials as training data for AI models. – Pointing out that the development of generative AI reproduces patterns from the colonial era, with the countries of the Global South being consumers of algorithms and providers of data from the North. 
– Recognizing that some social impacts and risks resulting from the emergence of generative AI technologies must be addressed through public regulations other than copyright, or by other means, such as the development of standards and technical norms. The concerns of private rights holders are just one of a number of social concerns that have emerged in response to the rise of AI. – Noting that the development of generative AI models offers new opportunities for creators, researchers, educators and other professionals working in the public interest, in addition to providing benefits to a wide range of activities in other sectors of society. Noting further that generative AI models are a tool that enables new forms of creation, and that history has shown that new technological capabilities will inevitably be incorporated into artistic creation and information production. | 1. Very few legal systems provide clarity on (a) when copyrighted works can be used to train generative AI tools, or (b) when the outputs of those tools may violate copyright law. The input side is most important for the CC community, which is organized around tools to create and protect open information commons that are free to use for any purpose. One necessary input to AI is text and data mining (TDM) research, which uses computers to analyze digitized information. See Implementing User Rights for Research in the Field of Artificial Intelligence, EIPR 2020. A recent Right to Research project publication in Science shows “a patchwork of copyright laws across jurisdictions limits where and how TDM research can occur.” It should be noted that TDM encompasses many applications unrelated to artificial intelligence generally or generative AI in particular, including in the physical and social sciences and humanities. See Impact of Research Exceptions on Scientific Output, Joan-Josep Vallbé. 2. 
The call for credit, consent and compensation from content owners begs the question: it assumes that copyright grants exclusivity over the reproductions needed to train gen AI on the input side. This is the question policy must resolve. See point 1. 3. It is unclear how capital intensive AI tools are if the data used to train them is free to use. As the technology disseminates, the barriers to entry are lowering. There is a powerful machine learning project in Africa, for example, that is creating language translation tools. A key barrier is access to open text and data, not to computer hardware or programming know-how. See Prof Vukosi Marivate: NLP and TDM in Africa. 4. It is not clear to me that a primary impact of permitting “extracting value from the commons” is to “reinforce existing power imbalances” in a way that “reproduces patterns from the colonial era.” If you are an AI developer in the Global South, you need a massive free-to-use information commons. The situation of a local programmer is a bit like that of a local filmmaker. Try to license a clip from big content. They will not charge you a tenth of the fee in a developing country because you have a tenth of the budget. Global monopolies set global prices. Information commons are more, not less, important to the Southern creator. I am not saying that exploitation of global information systems does not happen, or that Big Tech does not benefit, or that they should not pay. But I do want to push back on the idea that the commons is the problem or that closing it is the right policy solution. – At the Summit, there were important stories about the need for traditional communities to safeguard their own data and AI tools to maintain local stewardship over their contents. But it is not clear to me that many companies are actively looking to mine traditional knowledge (TK), especially given that most TK is not openly published on the Internet or in journals or other sources where the miners scrape most frequently. 
– It is somewhat unfortunate, and perhaps a product of the moment, that the recognition of the creative value of generative AI is last among the preambular findings. For the CC community, which has “Creative” in its name, this principle might be moved up. How important is gen AI for creators? I know small filmmakers in South Africa, for example, who are using AI tools to radically improve their production quality at low cost, enabling them to compete with larger studios with bigger budgets. We need more work on this. |
Principles We have formulated the following seven principles to regulate generative AI models to protect the interests of creators, people who build on the commons (including through AI), and society’s interests in the sustainability of common goods: 1. It is important that people continue to have the ability to study and analyze existing works to create new ones. The law should continue to leave room for people to do so, including through the use of machines, while addressing societal concerns raised by the rise of generative AI. 2. All parties should work together to define ways for creators and rights holders to express their preferences regarding AI training for their copyrighted works. In the context of an enforceable right, the ability to “opt out” of such uses should be considered the legislative limit, as approaches based on voluntary acceptance and consent would block large swathes of the commons due to the excessive duration and scope of copyright protection, as well as the fact that most works are not being actively managed. 3. In addition, all parties should also work together to address implications for other rights and interests (e.g. data protection, use of a person’s image or identity). This would likely involve interventions through frameworks other than copyright. 4. Particular attention should be paid to the use of traditional knowledge materials to train AI systems, including ways for community custodians to provide or revoke authorization. 5. Any legal regime must ensure that the use of copyrighted works is permitted to train generative AI systems for non-commercial purposes in the public interest, including scientific research and education. 6. Ensure that generative AI results in widely shared economic prosperity: The benefits that developers of AI models derive from access to the commons and copyrighted works must be widely shared among those who contribute to the commons. 7. 
To counter the current concentration of resources in the hands of a small number of companies, these measures must be accompanied by public investment in public computing infrastructures that meet the needs of public interest users of this technology on a global scale. There is also a need to publicly invest in training data sets that respect the principles described above and are managed as commons. | I note that the statement was not presented as a complete reflection of the CC community. It was crafted through a process that sought to include as many voices as possible through individual consultations, including with me, and through inputs given at many AI panels and workshops. 1. It is good to see the first principle recognizing the value in rights to research: “that people continue to have the ability to study and analyze existing works to create new ones.” 2. I am not sure endorsing an opt-out should be a CC policy beyond its current enablement of creators to prevent unlicensed commercial uses. This statement appears broader, enabling “preferences regarding AI training for their copyrighted works” across the board. Does this principle justify creating new CC tools to block uses that CC licenses (e.g. CC BY) would otherwise allow? In the context of the unsettled law noted in the preamble, permitting opt-outs through CC licenses may threaten the principle that “CC licenses do not reduce, limit, or restrict any rights under exceptions and limitations to copyright.” 3. It is true that non-copyright regulations may be needed in various areas affected by generative AI. But perhaps this is a realm somewhat outside of CC’s expertise. A key consideration should be making sure that such regulations do not reinforce incumbent positions. 4. Surely all knowledge governance systems should respect, protect and promote the rights of TK stewards to manage access to and use of TK, including but not limited to text and data mining or gen AI research. 5. 
It is good to see a principle protecting scientific research purposes, but we lack a good model for treating “scientific” and “commercial” research differently. The EU applies the most permissive text and data mining rules to users in certain institutions. This approach disadvantages smaller or less officially established organizations and firms that may not be able to compete in licensing markets with the largest players. Distinctions based on the character of the materials mined (e.g. less regulation of uses of scientific publications, see The Access Principle) are also likely to be radically underinclusive. Developing a good principle or set of values in this area may be useful, but also very difficult. An alternative, championed by User Rights member Matt Sag, is to focus on the nature of the use (“non-expressive”) rather than the purpose of the user or nature of the materials used. 6. The principle that generative AI should result in widely shared prosperity is a good Rawlsian principle of justice that regulations should pursue. The principle that benefits to developers of AI models “must be widely shared among those who contribute to the commons” seems like a new principle that, outside of share-alike licenses, is not core to previous CC instruments. 7. The last principle reaffirms the role of information commons in combating the concentration of resources in the hands of a small number of companies that can dominate licensing markets. This is a useful principle that should form a key basis for a CC statement in this area. The statement promotes public investment in open data sets, which is likely to be beneficial. But the work of CC, including on this statement, should likely focus on the impact of its central and most successful tool, which is to create information commons through open licenses, including for commercial interests. The idea has been that small creators need information commons. My assumption is that this principle holds for AI creators too. 
|