The wording of the title is a bit weird, which makes me notice how legal cases are usually worded like “weaker party succeeds/fails to change the status quo”. The artists lost against the companies in this case?
Anyways, important bits here:
Orrick spends the rest of his ruling explaining why he found the artists’ complaint defective, which includes various issues, but the big one being that two of the artists — McKernan and Ortiz, did not actually file copyrights on their art with the U.S. Copyright Office.
Also, Anderson copyrighted only 16 of the hundreds of works cited in the artists’ complaint. The artists had asserted that some of their images were included in the Large-scale Artificial Intelligence Open Network (LAION) open source database of billions of images created by computer scientist/machine learning (ML) researcher Christoph Schuhmann and collaborators, which all three AI art generator programs used to train.
And then
Even if that clarity is provided and even if plaintiffs narrow their allegations to limit them to Output Images that draw upon Training Images based upon copyrighted images, I am not convinced that copyright claims based a derivative theory can survive absent ‘substantial similarity’ type allegations. The cases plaintiffs rely on appear to recognize that the alleged infringer’s derivative work must still bear some similarity to the original work or contain the protected elements of the original work.
Which eh, I’m not sure I agree with. This is a new aspect of technology that isn’t properly covered by existing copyright laws. Our current laws were developed to address a state of the world that no longer exists, and using those old definitions (which I think covered issues around parodies and derivative work) doesn’t make sense in this case.
This isn’t some individual artist drawing something similar to someone else. This is an AI that can take in all work in existence and produce new content from that without providing any compensation. This judge seems to be saying that’s an OK thing to do.
did not actually file copyrights on their art with the U.S. Copyright Office.
The way they’ve worded this isn’t really a sufficient explanation of how this works. An artist is automatically granted copyright upon the creation of a work, so it’s not that they don’t have the right to protect their work. It’s just that, without registration, you cannot file a lawsuit to protect your work.
Copyright exists from the moment the work is created. You will have to register, however, if you wish to bring a lawsuit for infringement of a U.S. work.
https://www.copyright.gov/help/faq/faq-general.html
However, if it’s within 5 years of initial publication, they can still be granted a formal registered copyright and bring the complaint again.
Judges don’t make laws, they interpret them. If the current laws don’t cover said new technology, it’s up to the govt to pass new laws.
Good to hear that people won’t be able to weaponize the legal system into holding back progress
EDIT, tl;dr from below: Advocate for open models, not copyright. It’s the wrong tool for this job
AI keeps getting cited as the next big thing that will shape the world. I think this is an appropriate time to use the legal system to make sure those changes happen in a way that won’t screw everything up.
The progress will happen whether we like it or not; taking a moment to clarify the rules is a good thing.
The rules I’ve seen proposed would kill off innovation, and allow other countries to leapfrog whatever countries tried to implement them.
What rules do you think should be put in place?
If any commercial use of AI generated art required some transfer of money from the company using it to the artists whose work was included in training the models, it’d probably be a step in the right direction.
Why?
Well, why shouldn’t they have to pay artists a license to use their work, especially in ways that could drastically affect the market?
There is a thing called copyright, and the exception to that rule is called fair use.
Artwork is copyrighted by default and, under the law, to use someone else’s copyrighted works requires a license (that is usually bought). Whether AI training counts as fair use is the question and ultimately that is the point that will need to be proved/justified.
So again, what makes AI “fair use” and why shouldn’t companies have to pay a license for their use of copyrighted artwork?
If you look at a hundred paintings of faces and then make your own painting of a face, you’re not expected to pay all the artists that you used to get an understanding of what a face looks like.
Even if AI companies were to pay the artists and had billions of dollars to do it, each individual artist would receive a tiny amount, because these datasets are so large.
Much more realistically, they would just retrain their models using data they can use for free.
Btw, I don’t think this is a fair use question, it’s really a question of whether the generated images are derivatives of the training data.
Even if AI companies were to pay the artists and had billions of dollars to do it, each individual artist would receive a tiny amount, because these datasets are so large.
I don’t really think that’s a problem. If a company is generating $X.00 in revenue using AI generated work, some percentage of that revenue should probably be going to the artists whose work was used in training that model, even if it’s a fraction of a fraction of a cent per image generated.
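To put rough numbers on the “fraction of a fraction of a cent” point, here is a minimal sketch that assumes an even pro-rata split across every work in the training set. The function name and all figures in the usage note (revenue, artist share, dataset size) are made up for illustration; none of them come from the case or from any real company.

```python
def per_work_payout(revenue, artist_share, works_in_training_set):
    """Split a share of revenue evenly across all works used in training.

    All three inputs are hypothetical; an even split is an assumption,
    not how any real licensing scheme is known to work.
    """
    if works_in_training_set <= 0:
        raise ValueError("works_in_training_set must be positive")
    if not 0.0 <= artist_share <= 1.0:
        raise ValueError("artist_share must be between 0 and 1")
    return revenue * artist_share / works_in_training_set


def artist_payout(revenue, artist_share, works_in_training_set, works_by_artist):
    """Total payout for one artist who contributed several works."""
    return works_by_artist * per_work_payout(
        revenue, artist_share, works_in_training_set
    )
```

For example, `per_work_payout(1_000_000, 0.10, 2_000_000_000)` comes out at a tiny fraction of a cent per work, and an artist with 300 works in the set would still get only a cent or two. Whether that is “not a problem” or “not worth it” is exactly what the two sides here disagree on.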
If you look at a hundred paintings of faces and then make your own painting of a face, you’re not expected to pay all the artists that you used to get an understanding of what a face looks like.
That’s because I’m a human being. I’m acting on my own volition and while I’ve observed artwork, I’ve also had decades of life experience observing faces in reality. Also importantly, my ability to produce artwork (and thus my potential to impact the market) is limited and I’m not owned by or beholden to any company.
“AI” “art” is different in every way. It is being fed a massive dataset of copyrighted artwork, and has no experiences or observations of its own. It is property, not a free or independent being. And also, it can churn out a massive amount of content based on its data in no time at all, posing a significant challenge to markets and the livelihood of human creative workers.
All of these are factors in determining whether it’s fair to use someone else’s copyrighted material, which is why it’s fine for a human being to listen to a song and play it from memory, but it’s not fine for a tape recorder to do the same (bootlegging).
Btw, I don’t think this is a fair use question, it’s really a question of whether the generated images are derivatives of the training data.
I’m not sure what you mean by this. Whether something is derivative or not is one of the key questions used to determine whether the free use of someone else’s copyrighted work is fair, as in fair use.
AI training is using people’s copyrighted work, and doing so almost exclusively without knowledge, consent, license or permission, and so that’s absolutely a question of fair use. They either need to pay for the rights to use people’s copyrighted work OR they need to prove that their use of that work is “fair” under existing laws. (Or we need to change/update/overhaul the copyright system.)
Even if AI companies were to pay the artists and had billions of dollars to do it, each individual artist would receive a tiny amount, because these datasets are so large.
The amount that artists would be paid would be determined by negotiation between the artist (the rights holder) and the entity using their work. AI companies certainly don’t get to unilaterally decide what people’s art licenses are worth, and different artists would be worth different amounts in the end. There would end up being some kind of licensing contract, which artists would have to agree to.
Take Spotify for example, artists don’t get paid a lot per stream and it’s arguably not the best deal, but they (or their label) are still agreeing to the terms because they believe it’s worth it to be on those platforms. That’s not a question of fair use, because there is an explicit licensing agreement being made by both parties. The biggest artists like Taylor Swift negotiate better deals because they make or break the platform.
So back to AI, if all that sounds prohibitively expensive, legally fraught, and generally unsustainable, then that’s because it probably is: another huge tech VC bubble just waiting to burst.
Why should they? Copyright is an artificial restriction in the first place, that exists “To promote the Progress of Science and useful Arts” (in the US, but that’s where most companies are based). Why should we allow further legal restrictions that might strangle the progress of science and the useful arts?
What many people here want is for AI to help as many people as possible instead of just making some rich fucks richer. If we try to jam copyright into this, the rich fucks will just use it to build a moat and keep out the competition. What you should be advocating for instead is something like a mandatory GPL-style license, where anybody who uses the model or contributed training data to it has the right to a copy of it that they can run themselves. That would ensure that generative AI is democratizing. It also works for many different issues, such as biased models keeping minorities in jail longer.
tl;dr: Advocate for open models, not copyright
Copyright is an artificial restriction
All laws are artificial restrictions, and copyright law is not exactly some brand new thing.
AI either has to work within the existing framework of copyright law OR the laws have to be drastically overhauled. There’s no having it both ways.
What you should be advocating for instead is something like a mandatory GPL-style license, where anybody who uses the model or contributed training data to it has the right to a copy of it that they can run themselves.
I’m a programmer and I actually spend most of my week writing GPLv3 code.
Any experienced programmer knows that GPL code is still subject to copyright. People (or their employer in some cases) own the rights to the code, and so they have the intellectual right to license that code under GPL or any other license that happens to be compatible with their code base. In other words I have the right to license my code under GPL, but I do not have the right to apply GPL to someone else’s code. Look at the top of just about any source code file and you’ll find various copyright statements for each individual code author, which are separate from the terms of their open source licensing.
I’m also an artist and musician and, under the current laws as they exist today, I own the copyright to any artwork or music that I happen to create by default. If someone wants to use my artwork or music they can either (a) get a license from me, which will likely involve some kind of payment, or (b) successfully argue that the way they are using my work is considered a “fair use” of copyrighted material. Otherwise I can publish my artwork under a permissive license like public domain or creative commons, and AI companies can use that as they please, because it’s baked into the license.
Long story short, whether it’s code or artwork, the person who makes the work (or otherwise pays for the work to be made on the basis of a contract) owns the rights to that work. They can choose to license that work permissively (GPL, MIT, CC, public domain, etc.) if they want, but they still hold the copyright. If Entity X wants to use that copyrighted work, they either have to have a valid license or be operating in a way that can be defended as “fair use”.
tl;dr: Advocate for open models, not copyright
TLDR: Copyright and open source/data are not at odds with each other. FOSS code is still copyrighted code, and GPL is a relatively restrictive and strict license, which in some cases is good and in other cases not, depending on how you look at it. This isn’t something I’m advocating for; it’s simply the current copyright framework that everything in the modern world is based on.
If you believe that abolishing copyright entirely to usher in a totally AI-driven future is the best path forward for humanity, then you’re entitled to think that.
But personally I’ll continue to advocate for technology which empowers people and culture, and not the other way around.
Because the training, and therefore the datasets are an important part of the work with AI. A lot of ppl are arguing that therefore, the ppl who provided the data (e.g. artists) should get a cut of the revenue or a static fee or something similar for compensation. Because looking at a picture is deemed fine in our society, but copying it and using it for something else is seen more critically.
Btw. I am totally with you regarding the need to not hinder progress, but at the end of the day, we need to think about both the future prospects and the morality.
There was something about labels being forced to pay a cut of the revenue to all bigger artists for every CD they’d sell. I can’t remember what it was exactly, but something like that could be of use here as well maybe.
Let’s be clear. The AI does not in any way “copy” the picture it is trained on.
Yes.
And let’s also pin down that this is the exact issue we need more laws on. What makes an image copyrightable? When can a copyright get violated? And more specifically: whatever the AI model encompasses, can that inhibit fully copyrighted material? Can a copyrighted image be assumed by noting down all of its features?
This is the exact corner that we are fighting over currently.
Because LLMs need human-produced material to work with. If the incentive to produce such material drops, generative models will start producing garbage.
It has already started to be a problem with the current LLMs that have exhausted most easily reached sources of content on the internet and are now feeding off LLM-generated content, which has resulted in a sharp drop in quality.
“It has already started to be a problem with the current LLMs that have exhausted most easily reached sources of content on the internet and are now feeding off LLM-generated content, which has resulted in a sharp drop in quality.”
Do you have any sources to back that claim? LLMs are rising in quality, not dropping, afaik.
It’s still being researched but there are papers that show that, mathematically, generative models cannot feed on their own output. If you see an increase in quality it’s usually because their developers have added a new trove of human-generated data.
In simple terms, these models need two things to be able to generate useful output: they need external guidance about which input is good and which is bad (throughout the process), and they need both types of input to reach a certain critical mass.
Since the reliability of these models is never 100%, with every input-output cycle the quality drops.
If the model input is very well curated and restricted to known good sources it can continue to improve (and by improve I mean asymptotically approach a value which is never 100% but high enough, like over 90%). But if models are allowed to feed on generative output (being thrown back at them by social bots and website generators) their quality will take a dive.
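As a rough illustration of the compounding described above, here is a toy sketch. It is not taken from any of the papers mentioned; it just assumes that each retraining cycle multiplies the quality of self-generated data by a fixed reliability factor, and that curated human data has quality 1.0. The function name, the parameters and both assumptions are mine.

```python
def quality_after_cycles(initial_quality, reliability, cycles, curated_fraction=0.0):
    """Toy model of a generative model retraining on a mix of data.

    Assumptions (hypothetical, for illustration only):
    - self-generated data has the current quality scaled by `reliability`
    - curated human data always has quality 1.0
    - `curated_fraction` of each cycle's training data is curated
    Returns the quality after each cycle, starting with the initial value.
    """
    quality = initial_quality
    history = [quality]
    for _ in range(cycles):
        self_generated = quality * reliability
        quality = curated_fraction * 1.0 + (1.0 - curated_fraction) * self_generated
        history.append(quality)
    return history


if __name__ == "__main__":
    # Feeding purely on its own output: quality shrinks every cycle.
    print(quality_after_cycles(0.9, 0.9, cycles=10))
    # Half the data is curated: quality settles at a plateau below 1.0.
    print(quality_after_cycles(0.9, 0.9, cycles=50, curated_fraction=0.5))
```

With no curated input the quality in this toy decays geometrically toward zero. With a curated share `c` it levels off at `c / (1 - (1 - c) * reliability)`, which is below 1.0 whenever reliability is below 1.0. Real models are far messier than this, so treat it as intuition only.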
I want to point out that this is not an AI issue. Humans don’t have a 100% correct output either, and we have the exact same problem – feeding on our own online garbage. For us the trouble started showing much slower, over the last couple of decades or so, as talk about “fake news”, misinformation being weaponized etc.
AI merely accelerated the process, it hit the limits of reliability much sooner. We will need to solve this issue either way, and we would have needed to solve it even if AI weren’t a thing. In a way the appearance of AI helped us because it forces us to deal with the issue of information reliability sooner rather than later.
I think this is an appropriate time to use the legal system to make sure those changes happen in a way that won’t screw everything up.
Tell me which rules would definitely do that without screwing it up worse, for this obscenely complicated technology that’s only meaningfully existed for about a year. I could use a laugh.
I tend to agree with the judge’s assessment. He must make a decision based on existing law and the plaintiff’s claim/argument. You’re right that existing law doesn’t cover this aspect of technology, which is why there need to be new laws enacted by Congress. And the courts are put in a no-win situation here because we’ve failed to establish new rules and regulations for this new technology.
The plaintiff’s claim of derivative work doesn’t fit here because it has long been established what a derivative work looks like. AI generated images aren’t really derivative works.
I think rightfully, the court has told them to try again, which is ok.
This is why it is bad that this is happening in the US.
You don’t have the concept of the living tree doctrine in your body of law, or if you do, it’s not particularly well developed. It’s all about the writer’s intent down there.
Writer’s intent is sometimes enforced and sometimes not. Amendments 4-8 are all about criminal rights so it’s very clear that the founders were very concerned about people being accused/convicted of crimes, yet today you can’t be searched without a warrant unless the cop doesn’t like you and can come up with a lie saying he’s sure you were doing something illegal.
Ehhh. Originalism is mostly a lie that conservatives tell when making up what they want a law to mean.
Yes, but it took until an old white British guy codified it in the early 1930s for the living tree doctrine to be a thing.
And it was hard-coded in the Canadian Charter of Rights and Freedoms by Pierre Trudeau. It’s the primary reason why the Canadian fight for marriage equality was so open and shut compared to what the US is still going through.
Generating arbitrary new images is extremely transformative, and reducing a zillion images to a few bytes each is pretty minimal. It is really fucking difficult to believe “draw Abbey Road as a Beeple piece” would get a commissioned human artist bankrupted, if they openly referenced that artist’s entire catalog, but didn’t exactly reproduce any portion of it.
For language models, it’s even sillier. ‘The network learned English by reading books!’ Uh. Yeah. As opposed to what? If it’s in the library, anyone can read it. That’s what it’s for.