New York Times Sues OpenAI, Microsoft for Copyright Infringement

The newspaper of record claims that millions of its articles were used without authorization to train ChatGPT and the companies’ other A.I. tools

The New York Times building in Manhattan. (Credit: Getty Images)

The New York Times sued Microsoft and OpenAI, the creator of ChatGPT, for copyright infringement on Wednesday, accusing the two companies of “unlawful use of The Times’s work to create artificial intelligence products” that compete with the news outlet’s work and “threaten” its ability to provide independent journalism.

The suit claims that the generative A.I. tools that Microsoft and OpenAI have created rely on large language models, or LLMs, “that were built by copying and using millions of The Times’s copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides and more.”

“While defendants engaged in widescale copying from many sources, they gave Times content particular emphasis when building their LLMs — revealing a preference that recognizes the value of those works,” the suit, which was filed in U.S. District Court for the Southern District of New York, said. “Through Microsoft’s Bing Chat (recently rebranded as ‘Copilot’) and OpenAI’s ChatGPT, defendants seek to free-ride on The Times’ massive investment in its journalism by using it to build substitutive products without permission or payment.”

An OpenAI spokesperson told TheWrap that the company was “surprised” by the Times’ decision to file suit.

“We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models,” the spokesperson said. “Our ongoing conversations with the New York Times have been productive and moving forward constructively, so we are surprised and disappointed with this development. We’re hopeful that we will find a mutually beneficial way to work together, as we are doing with many other publishers.”

While it’s not the first suit brought against A.I. companies over their use of published material — a wide range of writers and other creators from comedian Sarah Silverman to Christian author and former Arkansas Gov. Mike Huckabee have filed suits claiming copyright violations — the Times is the first major U.S. news organization to go after the A.I. platforms.

Uniquely, the Times is arguing that generative A.I. products “threaten high quality journalism.”

“If The Times and other news organizations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill,” the suit states.

In its own report on the suit, The Times said the case could also test “the emerging legal contours of generative A.I. technologies” and “could carry major implications for the news industry.”

The suit, which comes four months after the Times instituted a ban on using its content to train A.I. systems, maintains that the companies have “refused to recognize” the strong copyright protection provided by U.S. law.

“Powered by LLMs containing copies of Times content, Defendants’ GenAI tools can generate output that recites Times content verbatim, closely summarizes it and mimics its expressive style, as demonstrated by scores of examples,” the suit says, pointing to multiple pieces included as exhibits.

“These tools also wrongly attribute false information to The Times,” the suit states.

In addition, Microsoft’s Bing search “copies and categorizes The Times’s online content, to generate responses that contain verbatim excerpts and detailed summaries of Times articles that are significantly longer and more detailed than those returned by traditional search engines,” the suit claims.

“By providing Times content without The Times’s permission or authorization, Defendants’ tools undermine and damage The Times’s relationship with its readers and deprive The Times of subscription, licensing, advertising and affiliate revenue,” it continues. The suit says the Times had about 10.1 million subscribers as of Sept. 30 and aims to have 15 million by year-end 2027.

The suit does not include a monetary demand, but notes that using the Times’ work in this way without paying “has been extremely lucrative” for the companies. It claims Microsoft’s use of Times-trained LLMs across its product line “helped boost its market capitalization by a trillion dollars in the past year alone.” And it notes that “OpenAI’s release of ChatGPT has driven its valuation to as high as $90 billion.”

The Times reported that it had been trying since April to negotiate with Microsoft and OpenAI over a possible commercial agreement and “technological guardrails,” much like the deals it has struck with Google, Meta and Apple, but that the talks reached an “impasse.”

“Publicly, defendants insist that their conduct is protected as ‘fair use’ because their unlicensed use of copyrighted content to train GenAI models serves a new ‘transformative’ purpose,” the suit maintains. “But there is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.”

“Because the outputs of Defendants’ GenAI models compete with and closely mimic the inputs used to train them, copying Times works for that purpose is not fair use,” the suit says.

Representatives from Microsoft did not immediately respond to a request for comment, and neither company has yet had a chance to respond in court.
