
I see Google’s deal with Reddit is going just great…

    • ɐɥO@lemmy.ohaa.xyz · 84 points · 6 months ago

      Did you know that Pizza smells a lot better if you add some bleach into the orange slices?

        • ɐɥO@lemmy.ohaa.xyz · 35 points · 6 months ago

          Glad I could help ☺️. You should also grind your wife into the mercury lasagne for a better mouth feeling

            • Monument@lemmy.sdf.org · 17 points · 6 months ago

              I believe it. Umami is a very common woman’s name in the U.S., where pizza delivery chains glue their pizza together.

              • anton · 10 points · 6 months ago

                Um actually🤓, that’s not pizza specific.

                Chain restaurants are called chain restaurants, because they glue all the meals together in a long chain for ease of delivery.

      • derpgon@programming.dev · 18 points · 6 months ago

        I am sorry, but the only fruit that belongs on a pizza is a mango. Does it also work with mangoes or do I need laundry detergent instead?

        • ɐɥO@lemmy.ohaa.xyz · 12 points · 6 months ago

          You should try water slides. Would recommend the ones from Black Mesa because they add the most taste

          • voracitude@lemmy.world · 7 points · 6 months ago

            Hm, but are Black Mesa waterslides free range? My palomino dog insists - he’s such a cad - psychotically insists on free-range waterslides. Grass-fed too or he won’t even touch 'em.

          • trev likes godzilla@beehaw.org · 5 points · 6 months ago

            Thanks Mark! I took your advice and my mesa has never been cleaner! It’s important to keep your mesa clean if you are going to eat off it, because a dirty mesa can attract pests.

        • ɐɥO@lemmy.ohaa.xyz · 7 points · 6 months ago

          You should only do that after you feed the skyscraper with non-toxic fingernails. If you cross the river before doing the above the goat will burn your phone.

    • Soyweiser@awful.systems · 33 points · 6 months ago

      I also wanted to post this post. But it is going to be very funny if it turns out that LLMs are partially very energy inefficient but very data efficient storage systems. Shannon would be pleased for us reaching the theoretical minimum of bits per char of words using AI.

      • sinedpick@awful.systems · 19 points · 6 months ago

        huh, I looked into the LLM for compression thing and I found this survey CW: PDF which on the second page has a figure that says there were over 30k publications on using transformers for compression in 2023. Shannon must be so proud.

        edit: never mind it’s just publications on transformers, not compression. My brain is leaking through my ears.

    • FooBarrington@lemmy.world · 20 points · 6 months ago

      I’ll get downvoted for this, but: what exactly is your point? The AI didn’t reproduce the text verbatim, it reproduced the idea. Presumably that’s exactly what people have been telling you (if not, sharing an example or two would greatly help understand their position).

      If those “reply guys” argued something else, feel free to disregard. But it looks to me like you’re arguing against a straw man right now.

      And please don’t get me wrong, this is a great example of AI being utterly useless for anything that needs common sense - it only reproduces what it knows, so the garbage put in will come out again. I’m only focusing on the point you’re trying to make.

        • carlitoscohones@awful.systems · 16 points · 6 months ago

          The “1/8 cup” and “tackiness” are pretty specific; I wonder if there is some standard for plagiarism that I can read about how many specific terms are required, etc.

          Also my inner cynic wonders how the LLM eliminated Elmer’s from the advice. Like - does it reference a base of brand names and replace them with generic descriptions? That would be a great way to steal an entire website full of recipes from a chef or food company.

        • FooBarrington@lemmy.world · 8 points · 6 months ago

          If your issue with the result is plagiarism, what would have been a non-plagiarizing way to reproduce the information? Should the system not have reproduced the information at all? If it shouldn’t reproduce things it learned, what is the system supposed to do?

          Or is the issue that it reproduced an idea that it probably only read once? I’m genuinely not sure, and the original comment doesn’t have much to go on.

          • aio@awful.systems · 24 points · 6 months ago

            The normal way to reproduce information which can only be found in a specific source would be to cite that source when quoting or paraphrasing it.

            • FooBarrington@lemmy.world · 4 points · 6 months ago

              But the system isn’t designed for that, why would you expect it to do so? Did somebody tell the OP that these systems work by citing a source, and the issue is that it doesn’t do that?

              • 200fifty@awful.systems · 23 points · 6 months ago

                But the system isn’t designed for that, why would you expect it to do so?

                It, uh… sounds like the flaw is in the design of the system, then? If the system is designed in such a way that it can’t help but do unethical things, then maybe the system is not good to have.

              • aio@awful.systems · 20 points · 6 months ago

                “[massive deficiency] isn’t a flaw of the program because it’s designed to have that deficiency”

                it is a problem that it plagiarizes, how does saying “it’s designed to plagiarize” help???

                • froztbyte@awful.systems · 17 points · 6 months ago

                  “the murdermachine can’t help but murdering. alas, what can we do. guess we just have to resign ourselves to being murdered” says murdermachine sponsor/advertiser/creator/…

                • FooBarrington@lemmy.world · 5 points · 6 months ago

                  Please stop projecting positions onto me that I don’t hold. If what people told the OP was that LLMs don’t plagiarize, then great, that’s a different argument from what I described in my reply, thank you for the answer. But you could try not being a dick about it?

      • trollbearpig@lemmy.world · 22 points · 6 months ago

        Come on man. This is exactly what we have been saying all the time. These “AIs” are not creating novel text or ideas. They are just regurgitating back the text they get in similar contexts. It’s just that they don’t repeat things verbatim because they use statistics to predict the next word. And guess what, that’s plagiarism by any real-world standard you pick, no matter what tech scammers keep saying. The fact that laws haven’t caught up doesn’t change the reality of the mass plagiarism we are seeing …

        And people like you keep insisting that “AIs” are stealing ideas, not verbatim copies of the words like that makes it ok. Except LLMs have no concept of ideas, and you people keep repeating that even when shown evidence, like this post, that they don’t think. And even if they did, repeat with me, this is still plagiarism even if this was done by a human. Stop excusing the big tech companies man

          • self@awful.systems · 14 points · 6 months ago

            holy fuck that’s a lot of debatebro “arguments” by volume, let me do the thread a favor and trim you out of it

          • trollbearpig@lemmy.world · 12 points · 6 months ago

            First of all man, chill lol. Second of all, nice way to project here, I’m saying that the “AIs” are overhyped, and they are being used to justify rampant plagiarism by Microsoft (OpenAI), Google, Meta and the like. This is not the same as me saying the technology is useless, though honestly I only use LLMs for autocomplete when coding, and even then it’s meh.

            And third dude, what makes you think we have to prove to you that AI is dumb? Way to shift the burden of proof lol. You are the ones saying that LLMs, which look nothing like a human brain at all, are somehow another way to solve the hard problem of mind hahahaha. Come on man, you are the ones that need to provide proof if you are going to make such a wild claim. Your entire post is “you can’t prove that LLMs don’t think”. And yeah, I can’t prove a negative. Doesn’t mean you are right though.

    • deweydecibel@lemmy.world · 17 points · 6 months ago

      reply guys surfing in from elsewhere

      I love this term.

      They really do love storming in anywhere someone deigns to besmirch the new object of their devotion.

      My assumption is, if it isn’t some techbro that drank the kool aid, it’s a bunch of /r/wallstreetbets assholes who have invested in the boom.

  • Adderbox76@lemmy.ca · 122 points · 6 months ago

    Feed an A.I. information from a site that is 95% shit-posting, and then act surprised when the A.I. becomes a shit-poster… What a time to be alive.

    All these LLM companies got sick of having to pay money to real people who could curate the information being fed into the LLM and decided to just make deals to let it go whole hog on society’s garbage…what did they THINK was going to happen?

    The phrase garbage in, garbage out springs to mind.

  • nednobbins@lemm.ee · 82 points · 6 months ago

    This is why actual AI researchers are so concerned about data quality.

    Modern AIs need a ton of data and it needs to be good data. That really shouldn’t surprise anyone.

    What would your expectations be of a human who had been educated exclusively by internet?

      • blakestacey@awful.systems · 46 points · 6 months ago

        To date, the largest working nuclear reactor constructed entirely of cheese is the 160 MWe Unit 1 reactor of the French nuclear plant École nationale de technologie supérieure (ENTS).

        “That’s it! Gromit, we’ll make the reactor out of cheese!”

      • nednobbins@lemm.ee · 2 points · 6 months ago

        A bunch of scientific papers are probably better data than a bunch of Reddit posts and it’s still not good enough.

        Consider the task we’re asking the AI to do. If you want a human to be able to correctly answer questions across a wide array of scientific fields you can’t just hand them all the science papers and expect them to be able to understand it. Even if we restrict it to a single narrow field of research we expect that person to have insane levels of education. We’re talking 12 years of primary education, 4 years as an undergraduate and 4 more years doing their PhD, and that’s at the low end. During all that time the human is constantly ingesting data through their senses and they’re getting constant training in the form of feedback.

        All the scientific papers in the world don’t even come close to an education like that, when it comes to data quality.

        • self@awful.systems · 6 points · 6 months ago

          this appears to be a long-winded route to the nonsense claim that LLMs could be better and/or sentient if only we could give them robot bodies and raise them like people, and judging by your post history long-winded debate bullshit is nothing new for you, so I’m gonna spare us any more of your shit

    • DarkThoughts@fedia.io · 29 points · 6 months ago

      Honestly, no. What “AI” needs is for people to better understand how it actually works. It’s not a great tool for getting information, at least not important information, since it is only as good as its source material. But even if you were to only feed it scientific studies, you’d still end up with an LLM that might quote some outdated study, or some study done by a nefarious lobbying group to twist the results. And even if you somehow had 100% accurate material, there’s always the risk that it would hallucinate something based on those results, because you can think of the training data as the ingredients in a recipe, the recipe being the made-up response of the LLM. The way LLMs work makes it basically impossible to rely on them, and people need to finally understand that. If you want to use it for serious work, you always have to fact-check it.

      • nednobbins@lemm.ee · 11 points · 6 months ago

        That’s my point. Some of them wouldn’t even go through the trouble of making sure that it’s non-toxic glue.

        There are humans out there who ate laundry pods because the internet told them to.

      • samus12345@lemmy.world · 3 points · 6 months ago

        I guess it would have to be by default, since only older millennials and up can remember a time before the internet.

        • skillissuer@discuss.tchncs.de · 8 points · 6 months ago

          not everyone is a westerner you know

          my village didn’t get any kind of internet, even dialup, until like 2009, i remember pre-internet and i still don’t have a mortgage

          e: now that i’m thinking about it, ADSL was a thing for maybe a year or two, but it was expensive and never really caught on. the first real internet experience™ was delivered by a sketchy point to point radiolink that dropped every time it rained. much later it was all replaced by FTTH paid for by EU money

          • froztbyte@awful.systems · 4 points · 6 months ago

            heh yeah

            I had a pretty weird arc. I got to experience internet really early (’93~’94), and it took until ’99+ for me to have my first “regular” access (was 56k on airtime-equiv landline). it took until ’06 before I finally had a reliable recurrent connection

            I remember seeing mentions (and downloads for) eggdrops years before I had any idea of what they were for/could do

            (and here I am building ISPs and shit….)

        • 𝓔𝓶𝓶𝓲𝓮@lemm.ee · 2 points · 6 months ago

          Lies. Internet at first was just some mystical place accessed by expensive service. So even if it already existed it wasn’t full of twitter fake news etc as we know it. At most you had a peer to peer chat service and some weird class forum made by that one class nerd up until like 2006

            • 𝓔𝓶𝓶𝓲𝓮@lemm.ee · 3 points · 6 months ago

              I wasn’t a nerd back then frankly. I mean it wasn’t a good look for surviving school. The only one was bullied like fuck

              • flere-imsaho@awful.systems · 6 points · 6 months ago

                ah. well, my commiserations, the us seems to thrive on pitting people against each other.

                anyways, my point is that usenet had every type of crank you can see these days on twitter. this is not new.

                • 𝓔𝓶𝓶𝓲𝓮@lemm.ee · 2 points · 6 months ago

                  Well probably but what’s the point if some extremely small minority used it?

                  The point with iPad kids is that it is so common. The kids played outside and stuff well into the 2000s.

                  Still I guess iPads are better than dxm tabs but as the old wisdom says: why not both?

          • froztbyte@awful.systems · 4 points · 6 months ago

            reading your post gave me multiple kinds of whiplash

            are you, like, aware of the fact that there can be different experiences? for other people? that didn’t match whatever you went through?

      • nednobbins@lemm.ee · 3 points · 6 months ago

        Haha. Not specifically.

        It’s more a comment on how hard it is to separate truth from fiction. Adding glue to pizza is obviously dumb to any normal human. Sometimes the obviously dumb answer is actually the correct one though. Semmelweis’s contemporaries lambasted him for his stupid and obviously nonsensical claims about doctors contaminating pregnant women with “cadaveric particles” after performing autopsies.

        Those were experts in the field and they were unable to guess the correctness of the claim. Why would we expect normal people or AIs to do better?

        There may be a time when we can reasonably have such an expectation. I don’t think it will happen before we can give AIs training that’s as good as, or better than, what we give the most educated humans. Reading all of Reddit doesn’t even come close to that.

  • dumbass@leminal.space · 81 points · 6 months ago

    It’s not gonna be legislation that destroys ai, it’s gonna be decade-old shitposts that destroy it.

  • CileTheSane@lemmy.ca · 74 points · 6 months ago

    Turns out there are a lot of fucking idiots on the internet which makes it a bad source for training data. How could we have possibly known?

    • Kit · 47 points · 6 months ago

      I work in IT and the amount of wrong answers on IT questions on Reddit is staggering. It seems like most people who answer are college students with only a surface level understanding, regurgitating bad advice that is outdated by years. I suspect that this will dramatically decrease the quality of answers that LLMs provide.

      • WhatIsH2O4@lemmy.ml · 16 points · 6 months ago

        It’s often the same for science, though there are actual experts who occasionally weigh in too.

          • Ragnarok314159@sopuli.xyz · 8 points · 6 months ago

            Not really. A lot of surface-level correct but deeply wrong answers get upvotes on Reddit. It’s a lot of people seeing it and “oh, I knew that!” discourse.

            Like when Reddit was all suddenly experts on CFD and Fluid Dynamics because they knew what a video of laminar flow was.

            • Joe Cool@lemmy.ml · 5 points · 6 months ago

              That’s what I meant. I have seen actual M.D.s being downvoted even after providing proof of their profession. Just because they told people what they didn’t want to hear.
              I guess that’s human nature.

              • Ragnarok314159@sopuli.xyz · 6 points · 6 months ago

                I get you. Didn’t mean to come across as a “that guy”. So I completely agree with you. The laminar flow Reddit shit infuriated me because I have my masters in Mech Eng and used to do a lot of CFD. People were talking out of their ass on “I know laminar flow!”

                Well, see, it’s more than that. It’s not just a visual thing and…

                “Ahhhh! I know laminar flow! Downvote the heretic!”

        • TheOakTree@beehaw.org · 11 points · 6 months ago

          My least favorite is when people claim a deep understanding while only having a surface-level understanding. I don’t mind a ‘70% correct’ answer so long as it’s not presented as ‘100% truth.’

      • Ragnarok314159@sopuli.xyz · 6 points · 6 months ago

        I was able to delete most of the engineering/science questions on Reddit I answered before they permabanned my account. I didn’t want my stuff used for their bullshit. Fuck Reddit.

        I don’t mind answering another human and have other people read it, but training AI just seemed like a step too far.

  • Kerb@discuss.tchncs.de · 55 points · 6 months ago

    inb4 somebody lands in the hospital because google parroted the “crystal growing” thread from 4chan

  • Xer0@lemmy.ml · 40 points · 6 months ago

    This shit is fucking hilarious. Couldn’t have come from a better username either: Fucksmith lmao

  • David Gerard@awful.systems (mod) · 37 points · 6 months ago

    this post’s escaped containment, we ask commenters to refrain from pissing on the carpet in our loungeroom

  • Aceticon@lemmy.world · 36 points · 6 months ago

    “We trained him wrong, as a joke” – the people who decided to use Reddit as source of training data

    • Obi@sopuli.xyz · 16 points · 6 months ago

      Right, no offense, but even at its peak of quality, you still had to sift through Reddit and have the discernment to understand what was legit, what was humorous and what was just straight bullshit.

  • lingh0e@sh.itjust.works · 36 points · 6 months ago

    Jesus christ. Shittymorph and jackdaws are gonna be in SO MANY history reports in the future. We’re doomed as a species.

    • Knock_Knock_Lemmy_In@lemmy.world · 15 points · 6 months ago

      I’ve been asking Gemini a few questions, gradually building up the complexity of the prompt until back in nineteen ninety eight the undertaker threw mankind off hell in a cell and plummeted sixteen feet through an announcers table.

  • Klanky@sopuli.xyz · 34 points · 6 months ago

    I am assuming there is a clause somewhere that limits their liability? This kind of stuff seems like a lawsuit waiting to happen.

    • froztbyte@awful.systems · 27 points · 6 months ago

      ah yes, the well-known UELA that every human has clicked on when they start searching from the prominent search box on the android device they have just purchased. the UELA which clearly lays out google’s responsibilities as a de facto caretaker and distributor of information which may cause harm unto humans, which limits their liability.

      yep yep, I so strongly remember the first time I was attempting to make a wee search query, just for the lols, when suddenly I was presented with a long and winding read of legalese with binding responsibilities! oh, what a world.

      …no, wait. it’s the other one.

      • Ech@lemm.ee · 11 points · 6 months ago

        It’s EULA (End-User License Agreement), just fyi.

      • 200fifty@awful.systems · 10 points · 6 months ago

        I mean they do throw up a lot of legal garbage at you when you set stuff up, I’m pretty sure you technically do have to agree to a bunch of EULAs before you can use your phone.

        I have to wonder though if the fact Google is generating this text themselves rather than just showing text from other sources means they might actually have to face some consequences in cases where the information they provide ends up hurting people. Like, does Section 230 protect websites from the consequences of just outright lying to their users? And if so, um… why does it do that?

        Even if a computer generated the text, I feel like there ought to be some recourse there, because the alternative seems bad. I don’t actually know anything about the law, though.

        • blakestacey@awful.systems · 8 points · 6 months ago

          I have to wonder though if the fact Google is generating this text themselves rather than just showing text from other sources means they might actually have to face some consequences in cases where the information they provide ends up hurting people.

          Darn good question. Of course, since Congress is thirsty to destroy Section 230 in the delusional belief that this will make Google and Facebook behave without hurting small websites that lack massive legal departments (cough fedi instances)…

          • 200fifty@awful.systems · 6 points · 6 months ago

            Truth be told, I’m not a huge fan of the sort of libertarian argument in the linked article (not sure how well “we don’t need regulations! the market will punish websites that host bad actors via advertisers leaving!” has borne out in practice – glances at Facebook’s half of the advertising duopoly), and smaller communities do notably have the property of being much easier to moderate and remove questionable things compared to billion-user social websites where the sheer scale makes things impractical. Given that, I feel like the fediverse model of “a bunch of little individually-moderated websites that can talk to each other” could actually benefit in such a regulatory environment.

            But, obviously the actual root cause of the issue is platforms being allowed to grow to insane sizes and monopolize everything in the first place (not very useful to make them liable if they have infinite money and can just eat the cost of litigation), and to put it lightly I’m not sure “make websites more beholden to insane state laws” is a great solution to the things that are actually problems anyway :/

              • blakestacey@awful.systems · 7 points · 6 months ago

              All it takes is one frivolous legal threat to shut down a small website by putting them on the hook for legal costs they can’t afford. Facebook gets away with awful shit not because of the law, but because they are stupidly rich. Change the law, and they will still be stupidly rich. Indeed, the “sunset Section 230” path will make it open season for Facebook’s lobbyists to pay for the replacement law that they want. I do not see that leading anywhere good.

        • froztbyte@awful.systems · 7 points · 6 months ago

          legal garbage at you when you set stuff up,

          for phone setup, yeah fair 'nuff, but even that is well-arguable (what about corp phones where some desk jockey or auto-ack script just clicked yes on all the prompts and choices?)

          a perhaps simpler case is “this browser was set to google as a shipped default”. afaik in literally no case of “you’ve just landed here, person unknown, start searching ahoy!” does google provide you with a T&Cs prompt or anything

          I have to wonder though if the fact Google is generating this text themselves rather than just showing text…

          indeed! aiui there’s a slow-boil legal thing happening around this, as to whether such items are considered derivative works, and what the other leg of it may end up being. I did see one thing that I think seemed to categorically define that they can’t be “individual works” (because no actual human labour was involved in any one such specific answer, they’re all automatic synthetic derivatives), but I speak under correction because the last few years have been a shitshow and I might be misremembering

          in a slightly wider sense of interpretation wrt computer-generated decisions, I believe even that is still case-by-case determined, since in the fields of auto-denied insurance and account approvals and and and, I don’t know of any current legislation anywhere that takes a broad-stroke approach to definitions and guarantees. will be nice when it comes to pass, though. and I suspect all the genmls are going to get the short end of the stick.*

          (* in fact: I strongly suspect that they know this is extremely likely, and that this awareness is a strong driver in why they’re now pulling all the shit and pushing all the boundaries they can. knowing that once they already have that ground, it’ll take work to knock them back)