Well, why shouldn’t they have to pay artists a license to use their work, especially in ways that could drastically affect the market?
There is a thing called copyright, and the exception to that rule is called fair use.
Artwork is copyrighted by default and, under the law, to use someone else’s copyrighted works requires a license (that is usually bought). Whether AI training counts as fair use is the question and ultimately that is the point that will need to be proved/justified.
So again, what makes AI “fair use” and why shouldn’t companies have to pay a license for their use of copyrighted artwork?
If you look at a hundred paintings of faces and then make your own painting of a face, you’re not expected to pay all the artists that you used to get an understanding of what a face looks like.
Even if AI companies were to pay the artists and had billions of dollars to do it, each individual artist would receive a tiny amount, because these datasets are so large.
Much more realistically, they would just retrain their models using data they can use for free.
Btw, I don’t think this is a fair use question, it’s really a question of whether the generated images are derivatives of the training data.
If you look at a hundred paintings of faces and then make your own painting of a face, you’re not expected to pay all the artists that you used to get an understanding of what a face looks like.
That’s because I’m a human being. I’m acting on my own volition, and while I’ve observed artwork, I’ve also had decades of life experience observing faces in reality. Also importantly, my ability to produce artwork (and thus my potential to impact the market) is limited, and I’m not owned by or beholden to any company.
“AI” “art” is different in every way. It is being fed a massive dataset of copyrighted artwork, and has no experiences or observations of its own. It is property, not a free or independent being. And also, it can churn out a massive amount of content based on its data in no time at all, posing a significant challenge to markets and the livelihood of human creative workers.
All of these are factors in determining whether it’s fair to use someone else’s copyrighted material, which is why it’s fine for a human being to listen to a song and play it from memory, but it’s not fine for a tape recorder to do the same (bootlegging).
Btw, I don’t think this is a fair use question, it’s really a question of whether the generated images are derivatives of the training data.
I’m not sure what you mean by this. Whether something is derivative or not is one of the key questions used to determine whether the free use of someone else’s copyrighted work is fair, as in fair use.
AI training is using people’s copyrighted work, and doing so almost exclusively without knowledge, consent, license or permission, and so that’s absolutely a question of fair use. They either need to pay for the rights to use people’s copyrighted work OR they need to prove that their use of that work is “fair” under existing laws. (Or we need to change/update/overhaul the copyright system.)
Even if AI companies were to pay the artists and had billions of dollars to do it, each individual artist would receive a tiny amount, because these datasets are so large.
The amount that artists would be paid would be determined by negotiation between the artist (the rights holder) and the entity using their work. AI companies certainly don’t get to unilaterally decide what people’s art licenses are worth, and different artists would be worth different amounts in the end. There would end up being some kind of licensing contract, which artists would have to agree to.
Take Spotify, for example: artists don’t get paid a lot per stream and it’s arguably not the best deal, but they (or their label) are still agreeing to the terms because they believe it’s worth it to be on those platforms. That’s not a question of fair use, because there is an explicit licensing agreement being made by both parties. The biggest artists like Taylor Swift negotiate better deals because they make or break the platform.
So back to AI: if all that sounds prohibitively expensive, legally fraught, and generally unsustainable, then that’s because it probably is. It’s another huge tech VC bubble just waiting to burst.
Whether something is derivative or not is one of the key questions used to determine whether the free use of someone else’s copyrighted work is fair, as in fair use.
I think training an AI model is not fair use. It’s either derivative work and needs a license or it’s not derivative work and can be used without a license. In both cases it’s not fair use (in the legal sense of “fair use”).
I’m not sure if you’re making an argument about what the law currently says or what it should say. In my opinion the law should be updated to clarify if you need a license to use copyrighted material as training data.
The amount that artists would be paid would be determined by negotiation between the artist (the rights holder) and the entity using their work
Sure, my point is such an agreement will never be made. It’s a good deal for AI companies to use the data for free, but if they can’t do that, they will not be interested.
Either way, I think there is no way for artists to win this. It’s completely possible to train large image generators without copyrighted material. These datasets are so large that paying artists per image will never be feasible.
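To put rough numbers on the “tiny amount per image” point, here is a back-of-the-envelope sketch. The budget and dataset size are made-up placeholders, not figures from any real company or dataset, and the even split is an assumption too (a real deal would be negotiated, and different artists would get different rates).

```python
def payout_per_image(total_budget_usd: float, num_images: int) -> float:
    """Split a licensing budget evenly across every image in a training set."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return total_budget_usd / num_images


# Hypothetical inputs, chosen only for illustration.
budget_usd = 1_000_000_000      # assumed one-time licensing budget
dataset_size = 2_000_000_000    # assumed number of images in the training set

per_image = payout_per_image(budget_usd, dataset_size)
print(f"${per_image:.2f} per image")  # prints "$0.50 per image"
```

Swap in whatever numbers you think are realistic; the point is only that the per-image figure shrinks as the dataset grows.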
I mean, ML is just the tool someone used to study a work in a way that generated a new understanding (the model). You could theoretically accomplish the same thing with a mathematician measuring paintings with a variety of instruments and building a mathematical model themselves to represent the relationship between word descriptions and the images produced.
The work of thousands of devs, engineers and scientists has just made that process both more widely available and more widely applicable.
Even if AI companies were to pay the artists and had billions of dollars to do it, each individual artist would receive a tiny amount, because these datasets are so large.
I don’t really think that’s a problem. If a company is generating $X.00 in revenue using AI generated work, some percentage of that revenue should probably be going to the artists whose work was used in training that model, even if it’s a fraction of a fraction of a cent per image generated.
But then you need to factor in that the rights holders would need to agree to that. AI companies don’t get to simply decide what people’s work is worth; they need a licensing agreement. (Otherwise they need to successfully argue that what they’re doing is fair use.)
And when you add it up and realize that “AI” is a black box based on a training dataset of thousands (if not millions) of pieces of copyrighted artwork, all of a sudden you start to see the profit margins on your magical art machine (POOF!) disappear. Oh, won’t someone think about the tech venture capitalists?!
Why should they? Copyright is an artificial restriction in the first place, that exists “To promote the Progress of Science and useful Arts” (in the US, but that’s where most companies are based). Why should we allow further legal restrictions that might strangle the progress of science and the useful arts?
What many people here want is for AI to help as many people as possible instead of just making some rich fucks richer. If we try to jam copyright into this, the rich fucks will just use it to build a moat and keep out the competition. What you should be advocating for instead is something like a mandatory GPL-style license, where anybody who uses the model or contributed training data to it has the right to a copy of it that they can run themselves. That would ensure that generative AI is democratizing. It also works for many different issues, such as biased models keeping minorities in jail longer.
All laws are artificial restrictions, and copyright law is not exactly some brand new thing.
AI either has to work within the existing framework of copyright law OR the laws have to be drastically overhauled. There’s no having it both ways.
What you should be advocating for instead is something like a mandatory GPL-style license, where anybody who uses the model or contributed training data to it has the right to a copy of it that they can run themselves.
I’m a programmer and I actually spend most of my week writing GPLv3 code.
Any experienced programmer knows that GPL code is still subject to copyright. People (or their employer in some cases) own the rights to the code, and so they have the right to license that code under GPL or any other license that happens to be compatible with their code base. In other words, I have the right to license my code under GPL, but I do not have the right to apply GPL to someone else’s code. Look at the top of just about any source code file and you’ll find various copyright statements for each individual code author, which are separate from the terms of their open source licensing.
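For anyone who hasn’t looked at one, a typical file header is roughly like the sketch below. The author names, years, and the function are invented placeholders; the SPDX line is the standard short identifier for “GPL version 3 or any later version”.

```python
# Copyright (C) 2021 Alice Example   (hypothetical author)
# Copyright (C) 2023 Bob Example     (hypothetical later contributor)
#
# SPDX-License-Identifier: GPL-3.0-or-later
#
# The copyright lines state who owns which contributions. The license line
# states the terms those owners have chosen to offer everyone else.


def greet(name: str) -> str:
    """Placeholder function so the file has some content to license."""
    return f"Hello, {name}!"
```

The two parts are independent: swapping the license line does not change who holds the copyright, and only the copyright holders can make that swap.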
I’m also an artist and musician and, under the current laws as they exist today, I own the copyright to any artwork or music that I happen to create by default. If someone wants to use my artwork or music they can either (a) get a license from me, which will likely involve some kind of payment, or (b) successfully argue that the way they are using my work is considered a “fair use” of copyrighted material. Otherwise I can dedicate my artwork to the public domain or publish it under a permissive license like Creative Commons, and AI companies can use it within those terms, because the permission is baked into the license.
Long story short, whether it’s code or artwork, the person who makes the work (or otherwise pays for the work to be made on the basis of a contract) owns the rights to that work. They can choose to license that work permissively (GPL, MIT, CC, public domain, etc.) if they want, but they still hold the copyright. If Entity X wants to use that copyrighted work, they either have to have a valid license or be operating in a way that can be defended as “fair use”.
tl;dr: Advocate for open models, not copyright
TLDR: Copyright and open source/data are not at odds with each other. FOSS code is still copyrighted code, and GPL is a relatively restrictive and strict license, which in some cases is good and in other cases not, depending on how you look at it. This isn’t something I’m advocating for; it’s simply the current copyright framework that everything in the modern world is based on.
If you believe that abolishing copyright entirely to usher in a totally AI-driven future is the best path forward for humanity, then you’re entitled to think that.
But personally I’ll continue to advocate for technology which empowers people and culture, and not the other way around.
But personally I’ll continue to advocate for technology which empowers people and culture, and not the other way around.
You won’t achieve this goal by aiding the gatekeepers. Stop helping them by trying to misapply copyright.
Any experienced programmer knows that GPL code is still subject to copyright […]
GPL is a clever hack of a bad system. It would be better if copyright didn’t exist, and I say that as someone that writes AGPL code.
I think you misunderstood what I meant. We should drop copyright, and pass a new law where if you use a model, or contribute to one, or a model is used against you, that model must be made available to you. Similar in spirit to the GPL, but not reliant on an outdated system.
This would catch so many more use cases than trying to cram copyright where it doesn’t apply. No more:
Handful of already-rich companies building an AI moat that keeps newcomers out
Credit agencies assigning you a black box score that affects your entire life
Minorities being denied bail because of a black box model
Being put on a no-fly list with no way to know that you’re on it or why
Facebook experimenting on you to see if they can make you sad without your knowledge
I don’t really think that’s a problem. If a company is generating $X.00 in revenue using AI generated work, some percentage of that revenue should probably be going to the artists whose work was used in training that model, even if it’s a fraction of a fraction of a cent per image generated.
The problem is that it’s a fraction of a fraction of a cent per image used during training, over the lifetime of the model.