KotakuInAction2 The Official Gamergate Forum
Federal judge rules copyrighted books are fair use for AI training (archive.is)
posted 364 days ago by YesMovement +44 / -0
Comments (24)
– horstshort 27 points 364 days ago +27 / -0

"But, the judge ruled, AI companies shouldn’t be pirating the books they’re training on."

The judge has no idea how this shit works does he?

– DemolitionsPanda 9 points 364 days ago +9 / -0

Do you?

For example, do you think there is a practical difference under law between letting a text-to-speech application read you an e-book from Amazon and training your Large Language Model with an e-book from Amazon?

If there is a difference, in your mind, what do you think that it is?

– AtrociKitty 6 points 364 days ago +6 / -0

"If there is a difference, in your mind, what do you think that it is?"

The text-to-speech application is a transient means of communicating the book. It's no different from opening the e-book on a monitor to read the words. Meanwhile, the LLM is ingesting and storing the text of the book. It's an illegal copy permanently stored in the model's dataset.

That said, I'd rather this be resolved by fixing the issues with copyright. This ruling is just another example of the two-tiered system, where AI training is fair use, while you giving a copy to a friend is infringement.

– SR388-SAX 10 points 364 days ago +10 / -0

"Meanwhile, the LLM is ingesting and storing the text of the book"

That's not how it works.

– I_Miss_Imp 3 points 363 days ago +3 / -0

Is it not though? Are you saying after book X is used for training that you couldn’t then prompt the AI to “tell me word for word the exact text of book X”?

– DemolitionsPanda 1 point 362 days ago +1 / -0

No, the book isn't copied or stored. The LLM can't regurgitate it on command, because it isn't inside the model.

You can ask the LLM to write new, never before seen text in the style of that author.

Training an LLM is a lot more like reading a book to a toddler than it is like making a digital copy. Neither the toddler nor the LLM can repeat the words of the book.

– AtrociKitty 1 point 362 days ago +1 / -0

In terms of copyright, yes it is. It doesn't matter that the book isn't literally copy-pasted into a vector database. The text is used verbatim as training data, and from there isn't made into a sufficiently transformative work to constitute fair use (plus it's commercial). Training data, even if it cannot be recalled on demand and does not exist in whole form, has still been stored within the model's semantic memory.

– Gizortnik 5 points 364 days ago +5 / -0

You purchased a text to speech application, and then had to legally gain access to said book. The book is property which has restricted access that you have to either purchase or pay for the service of gaining temporary access to the book. The purpose of your use of the text to speech is for the private consumption of the book's material.

Using an LLM to read a book that you didn't purchase access to, and turning it into the basis for the training of your program, is still going to be theft of the material. If you're going to purchase it for training material, you can probably do that with the explicit recognition that the work and text of the book can't be redistributed through your algorithm. You might be able to get away with specific citations for quotes, but the work itself can't be redistributed, as profit for the book would be diminished by your algorithm's actions. If the LLM's work that comes from the book is transformative, then you're fine.

What would not be fine is for books to be protected by copyright and paywall, not purchased, and still be data mined for their material without compensation to the owner. This is especially true of something as vast as a for-profit AI business.

– DemolitionsPanda 2 points 362 days ago +2 / -0

Oh, Gizortnik. Here we go.

It isn't theft. Theft, by definition, requires that the criminal deprive someone of something. You can't steal a digital product.

For example, if you steal my car, I can't drive my car. That is theft. Downloading episodes of Magnum PI isn't theft.

It isn't copyright infringement, because that (by definition) covers the uncompensated reproduction and sale of an intellectual property. Training an LLM with an e-book doesn't copy or sell the book.

I think that you need to go and read some definitions, because you are groping in the dark for reasons to object and using words you don't actually understand.

If you want to advocate for new laws that govern how people can create and run computer programs on hardware they own, then advocate for that. You will certainly have a lot of company.

– Gizortnik 1 point 345 days ago +1 / -0

"You can't steal a digital product."

That's never been true. If you want to make a philosophical argument that digital property shouldn't be considered property, fine. But the law is clear that digital assets can be stolen.

– DemolitionsPanda 1 point 344 days ago +1 / -0

Look, guy, it is a matter of definition. Different words mean different things.

If the thief took every copy and the lawful owner could not use the digital product, then that would be theft. If the bad guy just made unauthorized copies or used the digital data (or whatever) without a license, then it isn't theft.

The distinction matters because harm must be assessed when the courts render legal judgement. If I download a copy of Magnum PI, I have not deprived anyone of the use or enjoyment of the episode. If I stole an episode with ninjas then no network in the world could use it until it was recovered, probably by men with guns.

These are not the same. The damages are not the same.

Gizortnik, this is well understood law. It isn't even debated in legal circles. It was debated to exhaustion at around the time of printed sheet music, more than 200 years ago.

You really are showing your profound ignorance in this specific issue.

– Gizortnik 1 point 344 days ago +1 / -0

You're trying to cover up that you just had to make a correction to what you're saying, while being dismissive of shit that is absolutely true.

You just admitted that, yes, digital assets can be stolen. Originally you said it wasn't possible. It is, and you admit it. Copying without a license isn't theft, and I know the difference. You conflated them.

"If I download a copy of Magnum PI, I have not deprived anyone of the use or enjoyment of the episode. If I stole an episode with ninjas then no network in the world could use it until it was recovered, probably by men with guns."

This is how you are conflating them. How did you get that download? You can steal files from a system. If you record it on your phone and download it, or if someone screen records it from their TV and sends you a copy, neither of those is theft.

Neither of those has anything to do with "deprivation". That's not an element of theft. Just because you stole a car from someone's driveway while they were asleep, and returned it before they woke up, it doesn't mean that it doesn't count as theft. The illegal and nonconsensual use of the thing is normally enough for theft.

– DemolitionsPanda 1 point 343 days ago +1 / -0

Bloke, I gave a definition and a case study in my original post. I've stuck to it consistently.

If you return a car, then you borrowed it, rather than stole it. Even if the judge rules that you did steal it (it was unusable by the owner while you had it) then the damages are significantly different. The car was stolen, but the damage has been lessened (remedied) by the return of the property. This will be reflected in sentencing. This is why there is a huge range of sentences available to a judge. All damages are not equal.

Here is the major difference between digital products and physical products, and why it is much more difficult to steal a digital product.

There is no scarcity of digital products. Everyone on earth can have a copy, and it doesn't deprive anyone else of anything.

COPYING something isn't the same as STEALING something. I'm sorry you are having trouble with this.

Seriously, guy. Perhaps my explanation was imperfect, but this is literally first-year law on torts.

To determine what offence was committed, an assessment of the damages must be made. In what way or ways was the offended party damaged or harmed? Specifically.

Depriving you of your work truck so you can't get transport to work and you can't do your job is different from making photocopies of your novel.

I don't know how to make it any more clear. I apologize that I don't have the ability to reach you with reason and convey basic, industry standard definitions that have been in use for more than a century.

– horstshort 2 points 364 days ago +2 / -0

My comment was more about the fact that companies 'shouldn't pirate' than how the technical side of LLM training works.

– Piroko 1 point 362 days ago +1 / -0

If you read into the judge's ruling...

The judge chews out the plaintiffs pretty ruthlessly for setting the bar of their argument so low that he basically had to side with Meta.

"This ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." -Judge Chhabria

He then went on to spell out exactly what argument the authors SHOULD have made, and that other, smarter plaintiffs should consider themselves fortunate that THIS batch of plaintiffs didn't get a class action certification.

– undecidedmask2 10 points 364 days ago +10 / -0

Idiot. Don't see how anyone could say this, but the judge is probably old and doesn't have any idea what AI does.

– Vivs3rdSock 18 points 364 days ago +18 / -0

Most people don't know what AI does. Hell, it's not even actual AI at this point, or even VI yet.

– Agenda47 7 points 364 days ago +7 / -0

What would make it not even VI? (if you mean like the Virtual Intelligence in Mass Effect) I'd argue what we have now is perfectly comparable to that, although not quite as sophisticated or feature-complete yet.

– deleted 1 point 364 days ago +1 / -0
– stalememes 4 points 364 days ago +4 / -0

Considering modern publishing, I just have to ask:

Are they trying to make their bots retarded?

– deleted 1 point 364 days ago +1 / -0
