Wikipedia is under assault: rogue users keep posting AI generated nonsense

ForgottenFlux@lemmy.world · edit-2 4 months ago

narc0tic_bird@lemm.ee · 4 months ago

Best case is that the model used to generate this content was originally trained on data from Wikipedia, so it “just” generates a worse, hallucinated “variant” of the original information. Goes to show how stupid this idea is.

Imagine this in a loop: AI trained on Wikipedia that then alters content on Wikipedia, which in turn gets picked up by the next model trained. It would just get worse and worse, similar to how re-encoding the same video over and over again yields progressively worse results.
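
For anyone who wants to poke at that loop, here is a toy sketch. It is not how any real language model is trained: the "model" is just a Gaussian fitted to a handful of numbers, and the function name, sample size, and generation count are all made up for illustration. Each generation is fitted only to samples drawn from the previous generation's fit, and the spread of the data tends to shrink away.

```python
import random
import statistics


def simulate_collapse(generations=300, sample_size=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Hypothetical toy model: returns the fitted standard deviation
    after each generation, starting from the "original" value of 1.0.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the original, human-written "distribution"
    history = [sigma]
    for _ in range(generations):
        # Generate a small synthetic dataset from the current model...
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # ...then train the next model on that synthetic data only.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    print(f"spread at start: {history[0]:.3f}")
    print(f"spread at end:   {history[-1]:.3e}")
```

The small sample size is deliberate: every refit loses a little of the tails, and with nothing fresh coming in the losses compound.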

huginn@feddit.it · 4 months ago

See also: model collapse

(Which is more or less just regression towards the mean with more steps)

Captain Aggravated@sh.itjust.works · 4 months ago

Eventually every article just reads “Delve delve delve delve delve delve delve.”

8uurg@lemmy.world · 4 months ago

A very similar situation to the one analysed in this recently published paper. The quality of what is generated degrades significantly.

That said, they mostly investigate replacing the data entirely with AI-generated data at each step, so I doubt the effect will be as pronounced in practice. Human writing will still be included, and even curation of AI-generated text by people can skew the distribution of the training data (as the review process these editors apply inevitably would, since reasonable-looking text can slip through the cracks).
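
To illustrate why keeping human data in the mix matters, here is a rough sketch under heavy assumptions. It is not the paper's setup: the "model" is a Gaussian fitted to a few numbers, and `final_spread`, `human_fraction`, and the default sizes are hypothetical names and values chosen for the example. Each generation trains on a pool that is part fresh "human" samples from the original distribution and part samples from the previous model.

```python
import random
import statistics


def final_spread(human_fraction, generations=300, sample_size=10, seed=0):
    """Return the fitted standard deviation after many refit generations.

    Hypothetical toy model: human_fraction is the share of each training
    pool drawn from the original distribution instead of the last model.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the original, human-written "distribution"
    n_human = round(human_fraction * sample_size)
    for _ in range(generations):
        # Fresh human data always comes from the original distribution.
        human = [rng.gauss(0.0, 1.0) for _ in range(n_human)]
        # Synthetic data comes from whatever the previous model learned.
        synthetic = [rng.gauss(mu, sigma) for _ in range(sample_size - n_human)]
        pool = human + synthetic
        mu = statistics.fmean(pool)
        sigma = statistics.pstdev(pool)
    return sigma


if __name__ == "__main__":
    print(f"all synthetic: {final_spread(0.0):.3e}")
    print(f"half human:    {final_spread(0.5):.3f}")
```

In this toy, the all-synthetic run loses its spread while the half-human run keeps being pulled back toward the original distribution. It says nothing about how much human data a real model would need.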

Blaster M@lemmy.world · edit-2 4 months ago

AI model makers are very well aware of this, and there is a move away from ingesting everything and toward curating datasets more aggressively. Many upstarts have no idea how critical data prep is, but everyone is learning, sometimes the hard way.

Zorque@lemmy.world · 4 months ago

Every article would end up being the philosophy page.