Companies are training LLMs on all the data that they can find, but this data is not the world, but discourse about the world. The rank-and-file developers at these companies, in their naivete, do not see that distinction…So, as these LLMs become increasingly but asymptotically fluent, tantalizingly close to accuracy but ultimately incomplete, developers complain that they are short on data. They have their general purpose computer program, and if they only had the entire world in data form to shove into it, then it would be complete.

  • AbouBenAdhem@lemmy.world
    English
    11 upvotes · 6 months ago

    “this data is not the world, but discourse about the world”

    To be fair, the things most people talk about are things they’ve read or heard of, not their own direct personal experiences. We’ve all been putting our faith in the accuracy of this “discourse about the world” since long before LLMs came along.

    • FaceDeer@fedia.io
      6 upvotes · 6 months ago

      Indeed. I’ve never been to Australia. I’ve never even left the continent I was born on. I am reasonably sure it exists, though, based on all the second-hand data that I’ve seen. I even know a fair bit about stuff you can find there, like the Crow Fishers and the Bullet Farm and the Sugartown Cabaret.

      • afraid_of_zombies@lemmy.world
        English
        2 upvotes · 6 months ago

        If you are interested, there is no direct evidence that Shakespeare ever went to Italy, but he knew plenty of people who did, and travel guides were popular at the time. Thirteen of his plays are at least partially set in Italy, so about a third of them.

        Pretty impressive.