The UK’s Data (Use and Access) Bill has now passed, without the amendment that would’ve required AI tools to declare the use of copyrighted material, or any provision for copyright holders to ‘opt-out’ of their work being used as training data. The whole thing has left me wondering if there’ll ever be something that AI can’t gobble up and regurgitate. Well, a legal case in the US against AI firm Anthropic has produced an absolutely perfect punchline to this bleak episode.

A federal judge has ruled that Anthropic didn’t break the law when it used copyrighted material to train the large language model Claude, as this counts as “fair use” under US copyright law, reports AP News. What’s keeping Anthropic submerged in legal hot water, though, is how the company may have acquired that copyrighted material—in this case, thousands of books not bought but ‘found’ online. Long legal story short, AI can scrape copyrighted content—it just can’t pirate it.

For context, this all began last summer, when authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson first brought their lawsuit against Anthropic.

That filing from August last year alleged, “Anthropic downloaded known pirated versions of Plaintiffs’ works.” The full complaint goes on to read, “An essential component of Anthropic’s business model—and its flagship ‘Claude’ family of large language models (or ‘LLMs’)—is the largescale theft of copyrighted works,” and that the company “seeks to profit from strip-mining the human expression and ingenuity behind each one of those works.”

A number of documents disclosed as part of legal proceedings unearthed concerns from Anthropic’s own employees about the use of pirated books to train Claude. Though the company pivoted to buying physical books in bulk and painstakingly digitising each page for the AI model to gobble up, the judge ruled that the earlier piracy still needs to be legally addressed. As such, the ruling made by San Francisco federal court Judge William Alsup on Monday means that Claude can keep being trained on the authors’ works—but Anthropic must return to court in December to be tried over the whole “largescale theft of copyrighted works” thing.


Judge Alsup wrote in this week’s ruling, “Anthropic had no entitlement to use pirated copies for its central library.” I’m no legal professional, but on this point I can agree. However, Alsup also described the output of AI models trained on copyrighted material as “quintessentially transformative,” and therefore not a violation of fair use under the law.

He went on to add, “Like any reader aspiring to be a writer, Anthropic’s (AI large language models) trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different.”

Again, I’m not any kind of lawyer, and I’m definitely not offering legal advice, but yeah, I’m not buying this argument. I’d argue that a truly transformative, creative synthesis requires at least some understanding of whatever material you’re imbibing. Large language models like Claude don’t ‘understand’ texts as we do, instead playing an extremely complex game of word association.

In other words, Claude isn’t creating; it’s just trying to string together enough words that its training data says go together, in order to fool a human into thinking the AI output they’re reading is coherent copy. But what do I know? I’m just a writer—and large language models may now enjoy the legal precedent set by this San Francisco case.
