• IninewCrow@lemmy.ca
    2 months ago

    How many of these books will be total garbage, written just to fulfill a prearranged quota?

    Now the LLMs are filled with a good amount of nonsense.

      • runner_g
        2 months ago

        Someone’s probably already coined the term, but I’m going to call it LLM inbreeding.

        • adr1an@programming.dev
          2 months ago

          In computer science, garbage in, garbage out (GIGO) is the concept that flawed, biased or poor quality (“garbage”) information or input produces a result or output of similar (“garbage”) quality. The adage points to the need to improve data quality in, for example, programming.

          There was a research article applying this old computer science concept to LLMs. It was published in Nature and hit major news outlets. Basically, they kept training a language model on its own output for several generations, until the model degraded terribly. Sounded obvious to me, but seeing it happen on the web is painful nonetheless…
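
A toy sketch of that feedback loop, for anyone who wants to see it happen. This is not the setup from the article: it swaps the language model for a simple Gaussian that is refit each generation using only samples drawn from the previous fit. The function name `simulate_collapse` and its parameters are made up for illustration.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian on its own samples and track its spread.

    Hypothetical toy model: each "generation" sees only data produced
    by the previous generation's fit, never the original distribution.
    """
    rng = random.Random(seed)
    mean, stdev = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [stdev]
    for _ in range(generations):
        # Generate synthetic data from the current model.
        samples = [rng.gauss(mean, stdev) for _ in range(sample_size)]
        # Train the next model on that synthetic data only.
        mean = statistics.fmean(samples)
        stdev = statistics.pstdev(samples)
        history.append(stdev)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for generation in (0, 50, 100, 200):
        print(generation, history[generation])
```

With small samples, each refit tends to lose a little of the spread, and the losses compound, so the fitted distribution narrows toward a single point. It is only an analogy for the degradation described above, but it shows why feeding a model its own output is not a neutral operation.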

        • Benn@lemm.ee
          2 months ago

          It’s quite similar to another situation known as data incest.

    • Poplar?@lemmy.world
      2 months ago

      That would be terrible because they are both some of the best academic publishers in the humanities.