Those same images have made it easier for AI systems to produce realistic and explicit imagery of fake children as well as transform social media photos of fully clothed real teens into nudes, much to the alarm of schools and law enforcement around the world.

Until recently, anti-abuse researchers thought the only way that some unchecked AI tools produced abusive imagery of children was by essentially combining what they’ve learned from two separate buckets of online images — adult pornography and benign photos of kids.

But the Stanford Internet Observatory found more than 3,200 images of suspected child sexual abuse in the giant AI database LAION, an index of online images and captions that’s been used to train leading AI image-makers such as Stable Diffusion. The watchdog group based at Stanford University worked with the Canadian Centre for Child Protection and other anti-abuse charities to identify the illegal material and report the original photo links to law enforcement.

  • Snot Flickerman
    15
    7 months ago

    I don’t think that’s what they’re suggesting at all.

    The question isn’t “Did you know there is child porn in your data set?”

    The question is “Why the living fuck didn’t you know there was child porn in your fucking data set, you absolute fucking idiot?”

    The answer is more mealy-mouthed bullshit from pussies who didn’t have a plan and are probably currently freaking the fuck out about harboring child porn on their hard drives.

    The point is it shouldn’t have happened to begin with and they don’t really have a fucking excuse and if all they can come up with is “well that’s not good” maybe they should go die in a fucking fire to make the world a better place. “Oopsie doodles I’m sowwy” isn’t good enough.

    • @ricecake@sh.itjust.works
      14
      7 months ago

      Wow, calm the fuck down dude.

      The reason they didn’t know is because the AI groups aren’t the ones scanning the Internet, different projects do that and publish the data, and yet a different project identifies images and extracts alt text from them.

      They’re probably freaking out about as much as any search engine is when they discover they indexed CSAM, and probably less because they’re not actually holding the images.

      I know the point you’re going for, and raging out at the topic only undermines your point.

      • Snot Flickerman
        12
        7 months ago

        “Other groups organized this data, but we couldn’t be fucked to check to make sure it was all fully legal and above board” said nobody who actually cared about such things ever.

        The fact that they don’t check because it would take too long and slow them down compared to competitors is literally the point. It’s all about profit motive over safety or even basic checking of things beforehand.

        It’s a really, really weak excuse.

        • @ricecake@sh.itjust.works
          8
          7 months ago

          Did you know that they actually do check? It’s true! There’s a big difference between what happened, which is that CSAM was found in the foundation data, and that CSAM actually being used for training.

          Stability AI on Wednesday said it only hosts filtered versions of Stable Diffusion and that “since taking over the exclusive development of Stable Diffusion, Stability AI has taken proactive steps to mitigate the risk of misuse.” “Those filters remove unsafe content from reaching the models,” the company said in a prepared statement. “By removing that content before it ever reaches the model, we can help to prevent the model from generating unsafe content.”

          Also, the people who maintain the foundational dataset do run checks, which was mentioned by the people who reported the issue. Their critique was that the checks had flaws, not that they didn’t exist.

          So if your only issue is that they didn’t check, well… You’re wrong.

        • bedrooms
          1
          7 months ago

          That’s 400 million images. Checking them all is impossible.

    • paraphrand
      6
      7 months ago

      But we had to indiscriminately harvest these images from the web. Otherwise we would not have collected enough images in a timely manner!