Authors: Rossana Ducato and Alain Strowel

Abstract: Text and Data Mining (TDM) is a vital tool in the Big Data economy. TDM uses techniques from natural language processing, machine learning, information retrieval, and knowledge management for the automated analysis of digital content (structured and unstructured data), in order to extract information, identify patterns, and discover new trends, insights, or correlations.

The importance of TDM has been understood by the European legislator, which has introduced two specifically tailored exceptions in the Copyright in the Digital Single Market Directive. After a critical analysis of the new provisions, the paper argues that they still present several flaws that risk stifling AI developments in Europe. Thus, the contribution outlines an interpretative framework, based on the analysis of the infringement test, to rethink the rights of reproduction and extraction in line with the economic rationale of copyright and the database right. Furthermore, the paper makes suggestions to improve the TDM exceptions at the national level. In conclusion, it points out the remaining challenges of private ordering and trade secrets for research and AI innovation.

Citation: R. Ducato, A. Strowel, Ensuring Text and Data Mining: Remaining Issues With the EU Copyright Exceptions and Possible Ways Out, CRIDES Working Paper Series no. 1/2021; forthcoming in 43 European Intellectual Property Review, 2021/5, pp. 322-337. https://ssrn.com/abstract=3829858