A bipartisan group of senators introduced a new bill to make it easier to authenticate and detect artificial intelligence-generated content and protect journalists and artists from having their work gobbled up by AI models without their permission.
The Content Origin Protection and Integrity from Edited and Deepfaked Media Act (COPIED Act) would direct the National Institute of Standards and Technology (NIST) to create standards and guidelines that help prove the origin of content and detect synthetic content, like through watermarking. It also directs the agency to create security measures to prevent tampering and requires AI tools for creative or journalistic content to let users attach information about their origin and prohibit that information from being removed. Under the bill, such content also could not be used to train AI models.
Content owners, including broadcasters, artists, and newspapers, could sue companies they believe used their materials without permission or tampered with authentication markers. State attorneys general and the Federal Trade Commission could also enforce the bill, which its backers say prohibits anyone from “removing, disabling, or tampering with content provenance information” outside of an exception for some security research purposes.
(A copy of the bill is in the article; here is the important part, imo:
Prohibits the use of “covered content” (digital representations of copyrighted works) with content provenance to either train an AI-/algorithm-based system or create synthetic content without the express, informed consent and adherence to the terms of use of such content, including compensation)
I respectfully disagree. I think small-time AI (read: pretty much all the custom models on Hugging Face) will get a giant boost out of this, since they can get away with training on “custom” data sets: they are too small to be held accountable.
However, those models will become worthless to enterprise-level models, since they wouldn’t be able to account for the legality. In other words, once you make big bucks off of AI you’ll have to prove your models were sourced properly. But if you’re just creating a model for small-time use, you can get away with a lot.
Removed by mod
I don’t think so either, but to me that is the purpose.
Somewhere between small time personal-use ML and commercial exploitation, there should be ethical sourcing of input data, rather than the current method of “scrape all you can find, fuck copyright” that OpenAI & co are getting away with.
Removed by mod
Why?
Once this passes, OpenAI can’t build ChatGPT on the same (“stolen”) dataset. How does that cement their position?
Taking someone’s creation (without their permission) and turning it into a commercial venture, without giving payment or even attribution is immoral.
If a creator (in the widest meaning of the word) is fine with their works being used as such - great, go ahead. But otherwise you’ll just have to wait until the work enters the public domain (which obviously does not mean publicly available).