UK launches review of AI model training on copyrighted content

On December 9, OpenAI made its artificial intelligence video generation model Sora publicly available in the United States and other countries.

Cfoto | Future publication | Getty Images

The UK is drawing up measures to regulate the use of copyrighted content by tech companies to train their artificial intelligence models.

The British government on Tuesday launched a consultation that aims to increase clarity for both the creative industries and AI developers about how intellectual property is obtained and then used by AI firms for training purposes.

Some artists and publishers are unhappy with the way their content is being freely scraped by companies like OpenAI and Google to train their large language models, which are AI models trained on large quantities of data to generate humanlike responses.

Large language models are the underlying technology behind today's generative AI systems, including OpenAI's ChatGPT, Google's Gemini and Anthropic's Claude.

Last year, The New York Times filed a lawsuit against Microsoft and OpenAI, accusing the companies of infringing its copyright and abusing its intellectual property to train large language models.

In response, OpenAI disputed the NYT's allegations, stating that the use of open web data for training AI models should be considered "fair use" and that it provides an "opt-out" for rights holders "because it's the right thing to do."

Separately, image distribution platform Getty Images sued another generative AI firm, Stability AI, in the UK, accusing it of scraping millions of images from its websites without consent to train its Stable Diffusion AI model. Stability AI disputed the suit, noting that the training and development of its model took place outside the UK.

Proposals to be considered

First, the consultation will consider making an exception to copyright law for AI training for commercial purposes, while still allowing rights holders to reserve their rights so that they can control the use of their content.

Second, the consultation will put forward proposed measures to help creators license and be remunerated for the use of their content by AI model makers, as well as give AI developers clarity on what material can be used for training their models.

The government said more work needs to be done by both the creative industries and technology firms to ensure that any standards and requirements for rights reservation and transparency are effective, accessible and widely adopted.

The government is also considering proposals that would require AI model makers to be more transparent about their training datasets and how those datasets are obtained, so that rights holders can understand when and how their content was used to train AI.

That could prove controversial: tech firms are not particularly forthcoming about the data that powers their coveted algorithms or how they train them, given the commercial sensitivities involved in disclosing those secrets to potential competitors.

Previously, under former Prime Minister Rishi Sunak, the government tried to agree a voluntary code of practice on AI copyright.