• lemmyvore@feddit.nl
    7 months ago

    They don’t really have a choice. Classic website search will be useless in the near future because of the rapid rise of LLM-generated pages. Already for some searches 1 out of 3 results is generated crap.

    Their only hope is that somehow they’ll be able to weed out LLM pages with an LLM, which is something scientists say is impossible because LLMs cannot learn from LLM results, so they won’t be able to reliably tell which content is good.

    The fact that they’re even trying this shows how desperate they are.

    • wagoner@infosec.pub
      7 months ago

      If they can’t direct me to the right web site because they can’t tell what’s LLM junk, then how will they summarize an answer for me based on those same web sites they know about? It doesn’t seem like LLM summaries are a way to avoid that issue at all.

    • QuadratureSurfer@lemmy.world
      7 months ago

      Do you have a source for those scientists you’re referring to?

      I know that LLMs can be trained on data output by other LLMs, but you’re basically diluting your results unless you do a lot of work to clean up the data.

      I wouldn’t say it’s “impossible” to determine if content was generated by an LLM, but I agree that it will not be reliable.

    • Eager Eagle@lemmy.world
      7 months ago

      Well, it’s not exactly impossible for that reason; it’s just unlikely they’ll use a discriminator for the task, because a large part of generated content is effectively indistinguishable from human-written content, either because the model was prompted to avoid “LLM speak” or because the text was heavily edited. Thus they’d risk a high false positive rate.