Reddit: 'We Are in the Early Stages of Monetizing Our User Base'

rinze@infosec.pub · 1 year ago

TheOneCurly@lemm.ee · 1 year ago

I wonder what the risks are of including deleted and pre-edit content in training data. Most edits are going to be typo and formatting fixes; do you really want 2-3 copies of the same message, typos included, in your training data? Similarly, deleted comments are mostly nonsense, unhelpful, duplicates, or highly controversial.

If someone wants to dig through and restore individual users' content, that's one thing, but I don't think I'd choose to train on that other data unless I had to.

nutomic@lemmy.ml · 1 year ago

It should be very easy to distinguish edits and deletes made within a few minutes or hours of writing a comment from those made months or years later, right around the Reddit blackout.
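
For illustration, here is a rough sketch of how that split might look, assuming each edit or delete comes with two timestamps. The function name, the 24-hour cutoff, and the date window around the 2023 blackout are all made-up assumptions, not anything Reddit is known to use:

```python
from datetime import datetime, timedelta, timezone

# Assumed values, not taken from the thread: tune them for real data.
QUICK_FIX_WINDOW = timedelta(hours=24)
# Assumed window around the June 2023 blackout.
PROTEST_START = datetime(2023, 6, 1, tzinfo=timezone.utc)
PROTEST_END = datetime(2023, 8, 1, tzinfo=timezone.utc)


def classify_change(created_at, changed_at):
    """Label an edit or delete by how long after posting it happened."""
    age = changed_at - created_at
    if age <= QUICK_FIX_WINDOW:
        # Typo and formatting fixes made shortly after posting.
        return "quick_fix"
    if PROTEST_START <= changed_at < PROTEST_END:
        # Old comment changed during the assumed protest window.
        return "protest_era"
    # Changed long after posting, but outside the protest window.
    return "late_other"
```

The quick-fix check runs first, so a comment that was both posted and corrected during the blackout still counts as an ordinary fix. A timestamp rule like this would only be a first pass; it says nothing about whether the older version is actually worth keeping.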