In a recent Forbes article, Roomy Khan explored the ongoing legal debates around the use of copyrighted content for training artificial intelligence (AI) models. Courts in the United States have yet to clarify whether AI companies can use copyrighted data, such as images, books, and news articles, under the “fair use” doctrine. Major lawsuits are ongoing, with plaintiffs including companies such as Getty Images and authors such as Sarah Silverman challenging the unauthorized use of their content in AI training.

Some legal experts have argued that machine learning (ML) models use data to extract unprotected elements, such as particular facts and generalized patterns, which does not count as replicating a rights holder’s creative expression. Other advocates claim that “fair use” should not apply to AI. Still others have proposed entirely new intellectual property doctrines: Mark Lemley, a Professor of Law, Science, and Technology, suggests creating a new framework of “fair learning.” Fair learning would allow the use of copyrighted works to improve AI functionality, and would apply even when the traditional factors of fair use are not met.

Restricting AI’s access to data would stifle the performance of AI models, but unlimited and unrestrained access could leave creators dissatisfied with how their works are used and with the compensation they receive in exchange.

For more color on the legal arguments related to “fair use” in the context of AI model training, see Roomy Khan’s full article: https://www.forbes.com/sites/roomykhan/2024/10/04/ai-training-data-dilemma-legal-experts-argue-for-fair-use/