• rumba@lemmy.zip
    English
    arrow-up
    62
    ·
    2 days ago

    Okay, I can work with this. Hey Altman, you can train on anything that’s public domain. Now go take that fuck ton of billions and fight the copyright laws to make public domain make sense again.

      • rumba@lemmy.zip
        English
        arrow-up
        1
        ·
        1 day ago

        Counter counterpoint: I don’t know, I think making an exception for tech companies probably gives a minor advantage to consumers at least.

        You can still go to Copilot and ask it for some pretty fucking off the wall Python and Bash. It’ll save you a good 20 minutes of writing something, and it’ll already be documented and generally best practice.

        Sure, the tech companies are the ones walking away with billions of dollars, and it presumably hurts the content creators and copyright holders.

        The problem is, feeding AI is not significantly different than feeding Google back in the day. You remember back when you could see cached versions of web pages? And hell, their book scanning initiative to this day is super fucking useful.

        Look at how we teach and train artists, and then at how those artists do their work. All digital art and most painting these days has reference art all over the place. AI is taking random noise and slowly making things look more like the reference art, and that’s not wholly different than what people are doing.

        We’re training AI on every book that people can get their hands on, but that’s how we train people too.

        I say that training an AI is not that different than training people, and the holders of all the copyrighted material people look at in their lives don’t get a chunk of the money when those people write a book or paint something that looks like the style of Van Gogh. They’re even allowed to generate content for private companies or for sale.

        What is different is that the AI is very good at this and has machine levels of retention and abilities. And companies are poised to get rich off of the computational work. So I’m actually perfectly down with AIs being trained on copyrighted materials as long as they can’t recite it directly and in whole, but I feel the models that are created using these techniques should also be in the public domain.

        • melpomenesclevage@lemmy.dbzer0.com
          English
          arrow-up
          8
          ·
          edit-2
          1 day ago

          giving an exception to tech companies gives an advantage to consumers

          No. shut the fuck up. these companies are anti human and only exist to threaten labor and run out the clock on climate change so we all die without a revolution and the billionaires flee to the bunkers they’re convinced will save them (they won’t, closed systems are doomed). it’s an existential threat. this is so obvious, I’m agreeing with fucking yudkowsky, of all fucking people; he is correct, if for entirely wrong nonsense reasons.

          good for writing code

          so, I have tried to use it for that. nothing I have ever asked it for was remotely fit for purpose, often referring to things like libraries that straight up do not exist. it might be fine if it can quote a long thing from stack exchange from a program anyone who’s been coding for a decade has ten versions of laying around in their home folder, but if you want a piece of code that does something particular, it’s worse than useless. not even as a guide.

          AI

          HOLY SHIT WE HAVE AI NOW!? WHEN DID THIS HAPPEN!? can I talk to it? or do you just mean large language models?

          there’s some benefit in these things regurgitating art

          tell me you don’t understand a single thing about how these models work, and don’t understand a single thing about the value, meaning or utility of art, without saying “I don’t understand a single thing about how these models work, and don’t understand a single thing about the value, meaning or utility of art.”

    • meathappening@lemmy.ml
      English
      arrow-up
      7
      ·
      2 days ago

      This is the correct answer. Never forget that US copyright law originally allowed for a 14-year term (renewable for 14 more years). Now copyright holders are able to:

      • reach consumers more quickly and easily using the internet
      • market on more fronts (merch didn’t exist in 1710)
      • form other business types to better hold/manage IP

      So much in the modern world exists to enable copyright holders, but terms are longer than ever. It’s insane.

  • fartsparkles@lemmy.world
    arrow-up
    156
    ·
    2 days ago

    If this passes, piracy websites can rebrand as AI training material websites and we can all run a crappy model locally to train on pirated material.

  • A_norny_mousse@feddit.org
    arrow-up
    61
    ·
    edit-2
    2 days ago

    Fuck Sam Altman, the fartsniffer who convinced himself & a few other dumb people that his company really has the leverage to make such demands.

    “Oh, but democracy!” - saying that in the US of 2025 is a whole 'nother kind of dumb.
    Anyhow, you don’t give a single fuck about democracy, you’re just scared because a Chinese company offers what you offer for a fraction of the price/resources.

    You’re scared for your government money and basically begging for one more handout “to save democracy”.

    Yes, I’ve been listening to Ed Zitron.

    • supersquirrel@sopuli.xyz
      arrow-up
      9
      ·
      2 days ago

      gosh Ed Zitron is such an anodyne voice to hear, I felt like I was losing my mind until I listened to some of his stuff

      • dylanmorgan@slrpnk.net
        arrow-up
        8
        ·
        2 days ago

        Yeah, he has the ability to articulate what I was already thinking about LLMs and bring in hard data to back up his thesis that it’s all bullshit. Dangerous and expensive bullshit, but bullshit nonetheless.

        It’s really sad that his willingness to say the tech industry is full of shit is such an unusual attribute in the tech journalism world.

        • supersquirrel@sopuli.xyz
          arrow-up
          2
          ·
          edit-2
          2 days ago

          It’s really sad that his willingness to say the tech industry is full of shit is such an unusual attribute in the tech journalism world.

          What is interesting is that if he didn’t pretty regularly say in so many words “why the fuck AM I the guy who is sounding the alarm here??” I would be much more skeptical of his points. He isn’t someone that is directly aligned with the industry, at least not in an “authoritative expert capable of doing a thorough takedown of a bubble/hype mirage” sense that you would expect someone sounding the alarm on a bubble to be. I mean, I can tell the guy likes the attention (not in a bad sense really), but he seems utterly genuine in the attitude of “wtf, well ok I will do it… but like seriously I AM the guy who is sounding the alarm here? This isn’t honestly my direct area of expertise? I will provide you a thorough explanation with proof… but my argument really isn’t complicated, it is just ‘business doesn’t make money, why will no one acknowledge that’ and it breaks my brain that people that are experts in directly adjacent/relevant things can’t see this…? am I high?”

          … cus yeah Ed Zitron, that is how a lot of us fucking feel right now.

          (these aren’t direct quotes, I was summarizing, go watch/listen to some of Ed Zitron’s stuff, none of his arguments hinge on anything unreasonable or especially complicated, which is the worrying part…)

      • Pennomi@lemmy.world
        English
        arrow-up
        29
        ·
        2 days ago

        It’s only theft if they support laws preventing their competitors from doing it too. Which is kind of what OpenAI did, and now they’re walking that idea back because they’re losing again.

      • masterspace@lemmy.ca
        English
        arrow-up
        25
        ·
        edit-2
        2 days ago

        No it’s not.

        It can be problematic behaviour, you can make it illegal if you want, but at a fundamental level, making a copy of something is not the same thing as stealing something.

        • pyre@lemmy.world
          arrow-up
          13
          ·
          edit-2
          2 days ago

          it uses the result of your labor without compensation. it’s not theft of the copyrighted material. it’s theft of the payment.

          it’s different from piracy in that piracy doesn’t equate to lost sales. someone who pirates a song or game probably does so because they wouldn’t buy it otherwise. either they can’t afford it or they don’t find it worth buying. so if they couldn’t pirate it, they still wouldn’t buy it.

          but this is a company using labor without paying you, something that they otherwise definitely have to do. he literally says it would be over if they couldn’t get this data. they just don’t want to pay for it.

          • masterspace@lemmy.ca
            English
            arrow-up
            8
            ·
            edit-2
            2 days ago

            That information is published freely online.

            Do companies have to avoid hiring people who read and were influenced by copyrighted material?

            I can regurgitate copyrighted works as well, and when someone hires me, places like Stack Overflow get fewer views to the pages that I’ve already read and trained on.

            Are companies committing theft by letting me read the internet to develop my intelligence? Are they committing theft when they hire me so they don’t have to do as much research themselves? Are they committing theft when they hire thousands of engineers who have read and trained on copyrighted material to build up internal knowledge bases?

            What’s actually happening is that the debates around AI are exposing a deeply and fundamentally flawed copyright system. It should not be based on scarcity and restriction but on rewarding use. Information has always been able to flow freely; the mistake was linking payment to restricting its movement.

            • pyre@lemmy.world
              arrow-up
              6
              ·
              2 days ago

              it’s ok if you don’t know how copyright works. also maybe look into plagiarism. there’s a difference between relaying information you’ve learned and stealing work.

              • Grimy@lemmy.world
                arrow-up
                7
                ·
                2 days ago

                Training on publicly available material is currently legal. It is how your search engine was built and it is considered fair use mostly due to its transformative nature. Google went to court about it and won.

                • pyre@lemmy.world
                  arrow-up
                  3
                  ·
                  2 days ago

                  can you point to the trial they won? I only know about a case that was dismissed.

                  because what we’ve seen from ai so far is hardly transformative.

      • Refurbished Refurbisher@lemmy.sdf.org
        arrow-up
        2
        ·
        2 days ago

        Only if it’s illegal to begin with. We need to abolish copyright. With the internet and digital media in general, the concept has become outdated, as scarcity isn’t really a thing anymore. This also applies to anything that can be digitized.

        The original creator can still sell their work and people can still choose to buy it, and people will if it is convenient enough. If it is inconvenient or too expensive, people will pirate it instead, regardless of the law.

        • kibiz0r@midwest.social
          English
          arrow-up
          17
          ·
          2 days ago

          Also true. It’s scraping.

          In the words of Cory Doctorow:

          Web-scraping is good, actually.

          Scraping against the wishes of the scraped is good, actually.

          Scraping when the scrapee suffers as a result of your scraping is good, actually.

          Scraping to train machine-learning models is good, actually.

          Scraping to violate the public’s privacy is bad, actually.

          Scraping to alienate creative workers’ labor is bad, actually.

          We absolutely can have the benefits of scraping without letting AI companies destroy our jobs and our privacy. We just have to stop letting them define the debate.

          • Grumuk@lemmy.ml
            English
            arrow-up
            3
            ·
            2 days ago

            Molly White also wrote about this in the context of open access on the web and people being concerned about how their works are being used.

            “Wait, not like that”: Free and open access in the age of generative AI

            The same thing happened again with the explosion of generative AI companies training models on CC-licensed works, and some were disappointed to see the group take the stance that, not only do CC licenses not prohibit AI training wholesale, AI training should be considered non-infringing by default from a copyright perspective.

          • Grimy@lemmy.world
            arrow-up
            2
            ·
            2 days ago

            Creators who are justifiably furious over the way their bosses want to use AI are allowing themselves to be tricked by this argument. They’ve been duped into taking up arms against scraping and training, rather than unfair labor practices.

            That’s a great article. Isn’t this kind of exactly what is going on here? Wouldn’t bolstering copyright laws make training unaffordable for everyone except a handful of companies? Then these companies, because of their monopoly, could easily make the highest level models only affordable by the owner class.

            People are mad at AI because it will be used to exploit them instead of the ones who exploit them every chance they get. Even worse, the legislation they shout for will make that exploitation even easier.

  • Zink@programming.dev
    arrow-up
    16
    ·
    2 days ago

    What I’m hearing between the lines here is the origin of a legal “argument.”

    If a person’s mind is allowed to read copyrighted works, remember them, be inspired by them, and describe them to others, then surely a different type of “person’s” different type of “mind” must be allowed to do the same thing!

    After all, corporations are people, right? Especially any worth trillions of dollars! They are more worthy as people than meatbags worth mere billions!

    • chicken@lemmy.dbzer0.com
      arrow-up
      4
      ·
      edit-2
      2 days ago

      I don’t think it’s actually such a bad argument, because to reject it you basically have to say that style should fall under copyright protections, at least conditionally, which is absurd and has obvious dystopian implications. This isn’t what copyright was meant for. People want AI banned or inhibited for separate reasons and hope the copyright argument is a path to that, but even if successful, it wouldn’t actually change much except to make the other large corporations that own most copyrights stakeholders in AI systems. That’s not actually a better circumstance.

      • tacobellhop@midwest.social
        English
        arrow-up
        4
        ·
        edit-2
        2 days ago

        Actually, I would just make the guard rails such that if the input can’t be copyrighted then the AI output can’t be copyrighted either. Making anything it touches public domain would rein in the corporations’ enthusiasm for using it to replace humans.

    • ArtificialHoldings@lemmy.world
      arrow-up
      5
      ·
      edit-2
      2 days ago

      This has been the legal basis of all AI training sets since they began collecting datasets. The US copyright office heard these arguments in 2023: https://www.copyright.gov/ai/listening-sessions.html

      MR. LEVEY: Hi there. I’m Curt Levey, President of the Committee for Justice. We’re a nonprofit that focuses on a variety of legal and policy issues, including intellectual property, AI, tech policy. There certainly are a number of very interesting questions about AI and copyright. I’d like to focus on one of them, which is the intersection of AI and copyright infringement, which some of the other panelists have already alluded to.

      That issue is at the forefront given recent high-profile lawsuits claiming that generative AI, such as DALL-E 2 or Stable Diffusion, are infringing by training their AI models on a set of copyrighted images, such as those owned by Getty Images, one of the plaintiffs in these suits. And I must admit there’s some tension in what I think about the issue at the heart of these lawsuits. I and the Committee for Justice favor strong protection for creatives because that’s the best way to encourage creativity and innovation.

      But, at the same time, I was an AI scientist long ago in the 1990s before I was an attorney, and I have a lot of experience in how AI, that is, the neural networks at the heart of AI, learn from very large numbers of examples, and at a deep level, it’s analogous to how human creators learn from a lifetime of examples. And we don’t call that infringement when a human does it, so it’s hard for me to conclude that it’s infringement when done by AI.

      Now some might say, why should we analogize to humans? And I would say, for one, we should be intellectually consistent about how we analyze copyright. And number two, I think it’s better to borrow from precedents we know that assumed human authorship than to invent the wheel over again for AI. And, look, neither human nor machine learning depends on retaining specific examples that they learn from.

      So the lawsuits that I’m alluding to argue that infringement springs from temporary copies made during learning. And I think my number one takeaway would be, like it or not, a distinction between man and machine based on temporary storage will ultimately fail maybe not now but in the near future. Not only are there relatively weak legal arguments in terms of temporary copies, the precedent on that, more importantly, temporary storage of training examples is the easiest way to train an AI model, but it’s not fundamentally required and it’s not fundamentally different from what humans do, and I’ll get into that more later if time permits.

      The “temporary storage” idea is pretty central for visual models like Midjourney or DALL-E, whose training sets are full of copyrighted works lol. There is a legal basis for temporary storage too:

      The “Ephemeral Copy” Exception (17 U.S.C. § 112 & § 117)

      U.S. copyright law recognizes temporary, incidental, and transitory copies as necessary for technological processes.
      Section 117 allows temporary copies for software operation.
      Section 112 permits temporary copies for broadcasting and streaming.
      
      • ArtificialHoldings@lemmy.world
        arrow-up
        3
        ·
        2 days ago

        BTW, if anyone was interested - many visual models use the same training set, collected by a German non-profit: https://laion.ai/

        It’s “technically not copyright infringement” because the set is just a link to an image, paired with a text description of each image. Because they’re just pointing to the image, they don’t really have to respect any copyright.

        • ArtificialHoldings@lemmy.world
          arrow-up
          2
          ·
          edit-2
          1 day ago

          Copyright law doesn’t cover recipes - it’s just a “trade secret”. But the approximate recipe for coca cola is well known and can be googled.

  • AfricanExpansionist@lemmy.ml
    arrow-up
    15
    ·
    2 days ago

    Obligatory: I’m anti-AI, mostly anti-technology

    That said, I can’t say that I mind LLMs using copyrighted materials that they access legally/appropriately (lots of copyrighted content may be freely available to some extent, like news articles or song lyrics).

    I’m open to arguments correcting me. I’d prefer to have another reason to be against this technology, not arguing on the side of frauds like Sam Altman. Here’s my take:

    All content created by humans follows consumption of other content. If I read lots of Vonnegut, I should be able to churn out prose that roughly (or precisely) includes his idiosyncrasies as a writer. We read more than one author; we read dozens or hundreds over our lifetimes. Likewise musicians, film directors, etc etc.

    If an LLM consumes the same copyrighted content and learns how to copy its various characteristics, how is it meaningfully different from me doing it and becoming a successful writer?

    • droplet6585@lemmy.ml
      English
      arrow-up
      6
      ·
      edit-2
      2 days ago

      and learns how to copy its various characteristics

      Because you are a human. Not an immortal corporation.

      I am tired of people trying to have iNtElLeCtUaL dIsCuSsIoN about/with entities that would feed you feet first into a wood chipper if they thought they could profit from it.

    • Pennomi@lemmy.world
      English
      arrow-up
      9
      ·
      2 days ago

      Right. The problem is not the fact it consumes the information, the problem is if the user uses it to violate copyright. It’s just a tool after all.

      Like, I’m capable of violating copyright in infinitely many ways, but I usually don’t.

      • SoulWager@lemmy.ml
        arrow-up
        5
        ·
        edit-2
        2 days ago

        The problem is that the user usually can’t tell if the AI output is infringing someone’s copyright or not unless they’ve seen all the training data.

    • ricecake@sh.itjust.works
      arrow-up
      7
      ·
      2 days ago

      Yup. Violating IP licenses is a great reason to prevent it. According to current law, if they get a license for the book they should be able to use it how they want.
      I’m not permitted to pirate a book just because I only intend to read it and then give it back. AI shouldn’t be able to either if people can’t.

      Beyond that, we need to accept that we might need to come up with new rules for new technology. There’s a lot of people, notably artists, who object to art they put on their website being used for training. Under current law if you make it publicly available, people can download it and use it on their computer as long as they don’t distribute it. That current law allows something we don’t want doesn’t mean we need to find a way to interpret current law as not allowing it, it just means we need new laws that say “fair use for people is not the same as fair use for AI training”.

    • kibiz0r@midwest.social
      English
      arrow-up
      6
      ·
      2 days ago

      If an LLM consumes the same copyrighted content and learns how to copy its various characteristics, how is it meaningfully different from me doing it and becoming a successful writer?

      That is the trillion-dollar question, isn’t it?

      I’ve got two thoughts to frame the question, but I won’t give an answer.

      1. Laws are just social constructs, to help people get along with each other. They’re not supposed to be grand universal moral frameworks, or coherent/consistent philosophies. They’re always full of contradictions. So… does it even matter if it’s “meaningfully” different or not, if it’s socially useful to treat it as different (or not)?
      2. We’ve seen with digital locks, gig work, algorithmic market manipulation, and playing either side of Section 230 when convenient… that the ethos of big tech is pretty much “define what’s illegal, so I can colonize the precise border of illegality, to a fractal level of granularity”. I’m not super stoked to come up with an objective quantitative framework for them to follow, cuz I know they’ll just flow around it like water and continue to find ways to do antisocial shit in ways that technically follow the rules.
    • A_norny_mousse@feddit.org
      arrow-up
      5
      ·
      2 days ago

      Except the reason Altman is so upset has nothing to do with this very valid discussion.

      As I commented elsewhere:

      Fuck Sam Altman, the fartsniffer who convinced himself & a few other dumb people that his company really has the leverage to make such demands.

      He doesn’t care about democracy, he’s just scared because a Chinese company offers what his company offers, but for a fraction of the price/resources.

      He’s scared for his government money and basically begging for one more handout “to save democracy”.

      Yes, I’ve been listening to Ed Zitron.

      • Bassman1805@lemmy.world
        arrow-up
        7
        ·
        2 days ago

        You can sue for anything in the USA. But it is pretty much impossible to successfully sue for “ripping off someone’s style”. Where do you even begin to define a writing style?

        • catloaf@lemm.ee
          English
          arrow-up
          1
          ·
          2 days ago

          There are lots of ways to characterize writing style. Go read Finnegans Wake and tell me James Joyce doesn’t have a characteristic style.

      • MrQuallzin@lemmy.world
        arrow-up
        2
        ·
        edit-2
        2 days ago

        Edited for clarity: If that were the case then Weird Al would be screwed.

        Original: In that case Weird Al would be screwed