Those same images have made it easier for AI systems to produce realistic and explicit imagery of fake children as well as transform social media photos of fully clothed real teens into nudes, much to the alarm of schools and law enforcement around the world.

Until recently, anti-abuse researchers thought the only way some unchecked AI tools produced abusive imagery of children was by essentially combining what they had learned from two separate buckets of online images — adult pornography and benign photos of kids.

But the Stanford Internet Observatory found more than 3,200 images of suspected child sexual abuse in the giant AI database LAION, an index of online images and captions that’s been used to train leading AI image-makers such as Stable Diffusion. The watchdog group based at Stanford University worked with the Canadian Centre for Child Protection and other anti-abuse charities to identify the illegal material and report the original photo links to law enforcement.

  • Snot Flickerman · 11 months ago

    “Actually checking all the images we scraped the internet for is too hard, and the CSAM algorithms aren’t available to just anyone to check to make sure they don’t have child porn waaaaah”

    It’s all because of a “make money first and fuck any guardrails” ethos. It’s the same shit they hide behind when they claim it’s not piracy to train LLMs on books3, which is well known to be the entire catalog of a private ebook tracker that specializes in stripping DRM and distributes the tools to remove it. (Specifically, Bibliotik.)

    Literally, books3 was always pirated, and not just pirated, but easily provable to be a large-scale DMCA violation, since encryption was broken to strip DRM from the books. So how is any media produced from a pirated dataset not technically a copyright violation itself? Especially when the company in question is making oodles of money from it? The admins of The Pirate Bay went to prison for less.

    You can’t tell me that a media source that is KNOWN to be pirated material somehow becomes A-Okay for a private company to use for profit. That’s just bullshit. But I’ve seen plenty of people defend it. Apparently it’s okay for companies to commit piracy, as long as they make money or something? Makes no fucking sense to me.

    • gregorum@lemm.ee · 11 months ago

      “That’s pirated content!”

      “But we’re an AI company who used it to train our LLM and profited greatly from it!”

      • Snot Flickerman · 11 months ago

        But if you pirated it because you just liked Metallica and wanted to listen to their Black Album, and made no money from it, well, Lars Ulrich is coming to sue your ass, babay!