In its submission to the Australian government’s review of the regulatory framework around AI, Google said that copyright law should be altered to allow for generative AI systems to scrape the internet.

  • P1r4nha@feddit.de
    1 year ago

    Practically, you would have to separate the model architecture from the weights. The weights would be licensed for research use only, while the architecture is the actual scientific contribution. Maybe add some instructions on how best to train the model.

    The only problem is that you can’t really prove whether someone just retrained the research weights or trained from scratch, starting from randomly initialized weights. Certain alterations to the architecture are also possible, so that only the “headless” models are used.

    I think there’s some research into detecting retraining, but I can imagine it’s not foolproof.
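
    To make the retrained-versus-from-scratch distinction concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: the "model" is just a linear model with 50 weights, `released` stands in for the research weights, and `looks_retrained` is a hypothetical heuristic based on cosine similarity, not an established detection method.

```python
import math
import random


def init_weights(n, seed):
    """Random initialization, as when training from scratch."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def train(weights, data, steps=200, lr=0.01):
    """Per-example gradient descent on squared error for y = w . x."""
    w = list(weights)
    for _ in range(steps):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def cosine(a, b):
    """Cosine similarity between two weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def looks_retrained(released, candidate, threshold=0.5):
    """Hypothetical heuristic: flag weights that stay close to the release."""
    return cosine(released, candidate) > threshold


def demo():
    n = 50
    rng = random.Random(0)
    # A small made-up dataset for the "new" task (5 examples, 50 features).
    data = [([rng.gauss(0.0, 1.0) for _ in range(n)], rng.gauss(0.0, 1.0))
            for _ in range(5)]

    released = init_weights(n, seed=1)                # stand-in for research weights
    fine_tuned = train(released, data)                # retrained from the release
    scratch = train(init_weights(n, seed=2), data)    # trained from random init

    return cosine(released, fine_tuned), cosine(released, scratch)


if __name__ == "__main__":
    sim_fine_tuned, sim_scratch = demo()
    print("fine-tuned vs released:", round(sim_fine_tuned, 3))
    print("scratch vs released:   ", round(sim_scratch, 3))
```

    In this toy setup the retrained weights stay close to the released ones, because a short training run only moves them a little, while the from-scratch weights are essentially unrelated. A check this simple is easy to defeat in a real network, for example by retraining for long enough or by reordering hidden units, which changes the raw weights without changing what the model computes.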

    • frog 🐸@beehaw.org
      1 year ago

      As proofs of concept, I think the AI models are kind of interesting. I don’t much like the content they produce, because it is all so utterly same-y, and I haven’t yet seen anything that made me go “wow, that’s amazing”. But the actual architecture behind them is pretty cool.

      But at this point, they’ve gone beyond researching an interesting idea and become full-on commercial enterprises. If we don’t have an effective means of retraining the existing models to remove the data that isn’t licensed for commercial use (which is most of it), then it seems the only ethical way forward would be to start again with more selective training data, including only what is commercially licensed. Now that the research into how to create these models has been done, it should be quicker to build new ones with more ethically sourced training data.