In the recent LAION vs. Kneschke case, the Hamburg District Court addressed how Germany’s text and data mining (TDM) exceptions, which implement the EU Copyright Directive, apply to the creation of AI training datasets. It is one of the first rulings on copyright, AI, and the TDM exception. See recent analysis by IPKAT and COMMUNIA.

The court ruled that LAION, a non-profit organization that curates public datasets for AI training, did not infringe the copyright of Kneschke, a photographer, by including his image in its dataset. The court decided that LAION’s actions fell under the TDM exception for scientific research, as specified in Article 3 of the Copyright Directive and implemented in German law through Section 60d UrhG.

This ruling has raised controversy regarding the boundaries of TDM exceptions. The court’s interpretation could allow the unrestricted use of TDM-generated datasets for commercial AI training, blurring the line between permissible research activities and copyright infringement. The judgment suggested that even though LAION’s dataset is used by commercial entities for AI model training, this downstream usage is irrelevant to whether the TDM exception applies.

A key point of the ruling is that it distinguishes between the creation of a dataset through TDM and the subsequent training of AI models on that dataset. The court added that although creating the dataset is covered by the TDM exceptions, the subsequent training on that dataset might not be. This hints at how the court might treat future lawsuits against the companies that train these models, and sheds light on which parties might bear legal liability in such cases.

“…by distinguishing between the creation of data sets and the subsequent training of models, the court provides an analytically useful framework for increasing public understanding of AI models… it is in the public interest to allow non-profit scientific research organisations (in whatever form) to build public training datasets, even if those datasets can subsequently be used by for-profit entities.” –COMMUNIA Association

While the LAION ruling provides a framework for understanding the relationship between TDM and public dataset creation, it remains incomplete: it does not address the potential copyright infringement that may arise from commercializing AI models trained on such datasets.