Michael Hiltzik: Here's the number that could halt the AI revolution in its tracks
Published in Business News
The artificial intelligence camp loves big numbers. The sum raised by OpenAI in its latest funding round: $40 billion. Expected investments on AI by Meta, Amazon, Alphabet and Microsoft this year: $320 billion. Market value of Nvidia Corp., the supplier of chips for AI firms: $4.2 trillion.
Those figures are all taken by AI adherents as validating the promise and potential of the new technology. But here's a figure that points in the opposite direction: $1.05 trillion.
That's how much the AI firm Anthropic could be on the hook for if a jury decides that it willfully pirated 7 million copyrighted books in the course of "training" its AI bot Claude, and smacks the company with the maximum statutory damages of $150,000 per work.
That places Anthropic in "a legal fight for its very existence," reckons Edward Lee, an expert in intellectual property law at the Santa Clara University School of Law.
The threat arose July 17, when U.S. District Judge William Alsup certified a copyright infringement lawsuit brought by several published authors against Anthropic as a class action.
I wrote about the case last month. At that time, Alsup had rejected the plaintiffs' copyright infringement claim, finding that Anthropic's use of copyrighted material to develop its AI bot fell within a copyright exemption known as "fair use."
But he also found that Anthropic's downloading of copies of 7 million books, without permission, from online "shadow libraries" that included countless copyrighted works smelled like piracy.
"We will have a trial on the pirated copies ... and the resulting damages," he advised Anthropic, ominously. He put meat on those bones with his subsequent order, designating the class as copyright owners of books Anthropic downloaded from the shadow libraries LibGen and PiLiMi. (Several of my own books wound up in Books3, another such library, but Books3 isn't part of this case and I don't know whether my books are in the other libraries.)
The class certification could significantly streamline the Anthropic litigation. "Instead of millions of separate lawsuits with millions of juries," Alsup wrote in his original ruling, "we will have a single proceeding before a single jury."
The class certification adds another wrinkle — potentially a major one — to the ongoing legal wrangling over the use of published works to "train" AI systems. The training process involves feeding the systems enormous quantities of published material — some of it scraped from the web, some of it drawn from digitized libraries that can include copyrighted content as well as material in the public domain.
The goal is to provide AI bots with enough data to enable them to glean patterns of language that they can regurgitate, when asked a question, in a form that seems to be (but isn't really) the output of an intelligent entity.
Authors, musicians and artists have filed numerous lawsuits asserting that this process infringes their copyrights, since in most cases they haven't granted permission for the use or been compensated for it.
One of the most recent such cases, filed last month in New York federal court by authors including Kai Bird — co-author of "American Prometheus," which became the authorized source of the movie "Oppenheimer" — charges that Microsoft downloaded "approximately 200,000 pirated books" via Books3 to train its own AI bot, Megatron.
As in many of the other copyright cases, Bird and his fellow plaintiffs contend that the company could have trained Megatron using works in the public domain or obtained under licensing. "But either of those would have taken longer and cost more money than the option Microsoft chose," the plaintiffs state: to train its bot "without permission and compensation as if the laws protecting copyrighted works did not exist."
I asked Microsoft for a response, but haven't received a reply.
Among judges who have pondered the issues, the tide seems to be running in favor of regarding the training process as fair use. Indeed, Alsup himself came to that conclusion in the Anthropic case, ruling that use of the downloaded material for AI training was fair use — but he also heard evidence that Anthropic had held on to the downloaded material for another purpose, specifically to build a research library of its own. That's not fair use, he found, exposing Anthropic to accusations of copyright piracy.
Alsup's ruling was unusual, but also "Solomonic," Lee told me. His finding of fair use delivered a "partial victory" for Anthropic, but his finding of possible piracy put Anthropic in "a very difficult spot," Lee says. That's because the financial penalties for copyright infringement can be gargantuan, ranging from $750 per work to $150,000 — the latter if a jury finds that the user engaged in willful infringement.
As many as 7 million works may have been downloaded by Anthropic, according to filings in the lawsuit, though an undetermined number of those works may have been duplicated in the two shadow libraries the firm used, and may also have been duplicated among copyrighted works the firm actually paid for. The number of works won't be known until at least Sept. 1, the deadline Alsup has given the plaintiffs to submit a list of all the allegedly infringed works downloaded from the shadow libraries.
If, after subtracting the duplicates, the total of individual infringed works stands at 7 million, a $150,000 bill per work would total $1.05 trillion. That would financially swamp Anthropic: The company's annual revenue is estimated at about $3 billion, and its value on the private market is estimated at about $100 billion.
"In practical terms," Lee wrote on his blog, "ChatGPT is eating the world," class certification means "Anthropic faces at least the potential for business-ending liability."
Anthropic didn't reply to my request for comment on that prospect. In a motion asking Alsup to send his ruling to the 9th U.S. Circuit Court of Appeals or to reconsider his finding himself, however, the company pointed to the blow that his position would deliver to the AI industry.
If his position were widely adopted, Anthropic stated, then "training by any company that downloaded works from third-party websites like LibGen or Books3 could constitute copyright infringement."
That was an implicit admission that the use of shadow libraries is widespread in the AI camp, but also a suggestion that since it's the shadow libraries that committed the alleged piracy, the AI firms that used them shouldn't be punished.
Anthropic also noted in its motion that the plaintiffs in its case didn't raise the piracy issue themselves — Alsup came up with it on his own, by treating the training of AI bots and the creation of a research library as two separate uses, the former allowed under fair use, the latter disallowed as an infringement. That deprived Anthropic of an opportunity to respond to the theory in court.
The firm observed that a fellow federal judge in Alsup's San Francisco courthouse, Vince Chhabria, came to a contradictory conclusion only two days after Alsup, absolving Meta Platforms of a copyright infringement claim on similar facts, based on the fair use exemption.
Alsup's class certification is likely to roil both the plaintiff and defendant camps in the ongoing controversy over AI development. Plaintiffs who haven't made a piracy claim in their lawsuits may be prompted to add it. Defendants will come under greater pressure to forestall lawsuits by scurrying to reach licensing deals with writers, musicians and artists — especially if another judge accepts Alsup's argument about piracy. "That may well encourage other lawsuits," Lee says.
For Anthropic, the challenge will be "trying to convince a jury that the award of damages should be $750 per work," Lee says. Alsup's ruling makes this case one of the rare lawsuits in which "the plaintiffs have the upper hand," now that they have won class certification. "All these companies will have great pressure to negotiate settlements with plaintiffs; otherwise, they're at the mercy of the jury, and you can't bank on anything in terms of what a jury might do."
©2025 Los Angeles Times. Visit at latimes.com. Distributed by Tribune Content Agency, LLC.