To briefly recap, a group of authors sued the AI company Anthropic for pirating their books through illegal downloads and incorporating them into its AI training datasets, alleging piracy, copyright violation and theft. Which it clearly was. In an interesting twist, Anthropic then went out and bought quite literally tons of books, cut the spines off, scanned the pages, and trashed the scanned books, claiming that as the owner of those copies it could do what it wanted with them.
But that was a bit of ex post facto reasoning: they'd already committed the crime of stealing the contents of the books. Buying the books after already incorporating their contents into the datasets doesn't make it all better.
From the article:
"In June, U.S. District Judge William Alsup ruled that Anthropic’s use of the books in training models was “exceedingly transformative,” one of the factors courts have used in determining whether the use of protected works without authorization was a legal “fair use.” His decision was the first major decision that weighed the fair use question in generative AI systems.
Yet Alsup also ruled that Anthropic had to face a trial on the question of whether it is liable for downloading millions of pirated books in digital form off the internet, something it had to do in order to train its models for its AI service Claude. The books were obtained from datasets Library Genesis and Pirate Library Mirror.
“That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages,” the judge wrote." (emphasis mine)
The piracy issue was a huge one. In court, Anthropic IT staff testified that they used BitTorrent software to download vast troves of books at the direction of management. The problem is with BitTorrent itself. BitTorrent uses "seeds". When you download a file, you are downloading small pieces of it from many other computers around the world. And your computer becomes one more node in this network and starts serving up pieces of the files that you've downloaded to other people requesting those files.
As a general rule, companies don't go after people downloading pirated material if they're not downloading it 24/7/365. But they do go after people providing pirated material! And if you use BitTorrent software to download pirated material, you're downloading AND uploading material that shouldn't be shared! Eventually they're going to notice you, and their attorneys are going to dust off their giant mallets of loving correction.
I've used BitTorrent software before. But what I use it for is downloading books that I've bought from Humble Bundle. When I've got 20 large PDF books to download, it's the only practical way to do it, even with a fairly fast fiber-optic internet connection. And I leave my torrent connection open so other people who've bought the bundle can benefit from my PC having those books on it.
I have no idea how many books Anthropic downloaded. It's quite possible that Anthropic itself has no exact count of how many books it downloaded. And that's probably why they agreed to this settlement. They wanted to avoid a damages trial which would dig into exactly how many books they had stolen.
And let's take that one step further. This would have branded them - in court! - as the defendant in the world's largest piracy case. EVER. That's one thing that they definitely did not want to be branded with. A great big Scarlet P that they would wear forever. Much better to pay $1.5 billion and be rid of it.
Two additional things of interest. First, the settlement only covers their misdeeds through August 25, 2025. If they are found to have committed any additional piracy after this date, then all the court processes could get reset and everything starts over again. Second, and this is the most significant part:
"Anthropic also has agreed to destroy the datasets used in its models."
I'm not sure what this fully means. Since they bought all these books and scanned them, they presumably have an even better dataset on standby once this pirated set is destroyed, so it shouldn't affect them much. Perhaps this is purely a symbolic victory, but it is an important one. We shall see.
https://deadline.com/2025/09/anthropic-ai-lawsuit-settlement-1-5-billion-1236509423/
https://yro.slashdot.org/story/25/09/05/1941245/anthropic-agrees-to-pay-record-15-billion-to-settle-authors-ai-lawsuit