It’s all made from our data anyway, so it should be ours to use as we want.

  • FaceDeer@fedia.io
    arrow-up
    8
    ·
    20 hours ago

    Are you threatening me with a good time?

    First of all, whether these LLMs are “illegally trained” is still a matter before the courts. When an LLM is trained it doesn’t literally copy the training data, so it’s unclear whether copyright is even relevant.

    Secondly, I don’t think that making these models “public domain” would have the negative effects that people angry about AI think it would. When a company runs a closed model internally, like ChatGPT for example, the model is never available for download in the first place. It doesn’t matter whether it’s public domain or not, because you can’t get a copy of it. When a company releases an open-weight model for public use, on the other hand, it usually encumbers the model with some sort of license that makes it harder for competitors to monetize or build on. Making those models public domain would greatly increase their utility. It might make future releases less likely, but in the meantime it would greatly enhance AI development.

    • sem
      English
      arrow-up
      5
      ·
      19 hours ago

      The LLM does reproduce copyrighted data though.

      • ClamDrinker@lemmy.world
        English
        arrow-up
        2
        ·
        edit-2
        13 hours ago

        Not 1:1. Even overfitted images still differ considerably from their originals. If you chose “reproduce” to make that point, that’s why OP clarified that training doesn’t literally copy the training data; the actual data being in the model would be a different story. Because these models are (in simplified form) a bunch of really complex math that produces material, it’s a mathematical inevitability that they will sometimes produce copyrighted material, even in cases that have nothing to do with overfitting. Just like infinite monkeys on infinite typewriters will eventually reproduce every piece of copyrighted text.

        But then I would point you to the camera on your phone. If you take a picture of copyrighted material with it, you’re still infringing. But was the camera created with the intention of appropriating whatever the lens captures? No, which is why we don’t blame the camera for that; we blame the person who used it for that purpose. In the same way, AI users have an ethical obligation not to steer the AI towards generating infringing material.

        • catloaf@lemm.ee
          English
          arrow-up
          2
          ·
          3 hours ago

          And the easiest way to do that is to not include infringing material in the first place.

      • desktop_user
        English
        arrow-up
        2
        ·
        17 hours ago

        *it can produce data identical to data that has been copyrighted before