Any human work of art is made by an artist synthesizing the past work of others that they’ve absorbed. If I write a murder-mystery crime novel set in 1930s England, it would be hard to avoid some influence from Agatha Christie. Does that mean I’m stealing from her?
There’s a separate issue of AI taking jobs, which is very real, but dealing with it will probably need something like Basic Income at a society-wide level.
@BaylorSwift3 this isn’t about its output. This is about its input.
LLM companies are basically capturing and copying these entire books to use as training materials. In every other sphere, that would ostensibly financially benefit authors and publishers.
For example, in my experience, if you want students to read a few chapters of a text (up to 3), that’s “fair use”, but if you want them to read the whole thing, then either they buy the textbook or your institution buys a digital license.
The point of the lawsuit is that having an AI does not legally allow companies to engage in what looks like pirating of training materials. It will be interesting to see the verdict.
Tagging @ogeist
In your example, since only one LLM is being trained, like one student, does OpenAI only need to buy one digital copy of the book at retail?
Even still, if I want to copy Agatha Christie’s style, I can borrow a book from the library, train myself with it, then return it at no cost to myself. A hundred other people can do the same with that book, and the only cost burden was on that library’s initial purchase. Does copyright protect the right to make copies, or does it dictate something else?
Does copyright protect the right to make copies
Among other things, yes. I think this is what this particular lawsuit is about.
It will be really interesting to see whether they define an LLM as singular or plural/corporate. Those files (with hundreds of texts) seem to have been doing the rounds so it doesn’t look like a single use to me. But I can also see the merit in your one AI = one student argument.
Re: your Agatha Christie example, I’m not sure how that works in the US, but in my country (New Zealand), if a book is in a library, then the author or publisher gets a certain yearly compensation payment based on how many copies are in how many libraries.
E-book licensing similarly has different costs based on how many “copies” or simultaneous sessions a library is authorised to have.
Not only that, the screenshot in the complaint of writing in the style of an author is not defensible IMO. If I, a real person, write in JK Rowling’s style, I’m in no copyright danger.
Now, if I ask ChatGPT to write pages 1 to 13 and I get the content, then that is another story.
@ogeist the point of the screenshot is simply to prove that the AI has clearly been fed the entire work.
This lawsuit isn’t about output.
Of course, this is about the grey area of AI. But how is the copyright infringed? Yes, the AI was fed the book, but how is that different from me reading the book and then writing my own, different story in his style because I have read it? As in the example given.
What if the AI was only fed 50% of the books?
@ogeist The copyright infringement (if any) will be at the point of copying and distributing the book.
For example if I went out and photocopied this guy’s entire novel and stapled it together and gave it to you, that’s technically copyright infringement.
It has nothing to do with what you or I subsequently write ourselves, except that if what we write proves that I must have copied the book and illegally distributed it, then that could be evidence.
Plagiarism is not the only kind of copyright infringement.
I mean, if you want to pump out nonstop copyright-free works that aren’t original, be my guest. If no human made it, there is no copyright law that will defend it.
James Patterson can’t join this because he created an LLM 25 years ago which is how he churns out a new book every 6 months
I think OpenAI was deeply irresponsible, and potentially even understood that they were crawling pirated works of art. I welcome innovation guided by people with ethics. We would be better off with a pressure to make more work public, rather than letting capitalists use piracy in order to develop a better product.
I don’t think you are going to be able to stop AI churning out material, be it pop songs, newspaper articles, books, art, etc.
However I think there might be a new market for “human generated art” that sells at a premium. We already have “artisan” products that are similar (or worse!) quality, but sell at a premium because humans made them.
A useful reminder that anyone who was a semi-active Redditor before 2016 has probably had more copyrighted content fed into these LLMs than the majority of authors in the suit.
Ultimately these are built using your data, not a handful of authors.
They are a drop in the ocean.
You (collectively) are the rest of the ocean.
Maybe you get the money, maybe you don’t.
But this ship has sailed.