• MalReynolds@slrpnk.net
    7 months ago

    I see this a lot, but do you really think the big players haven’t backed up the pre-2022 datasets? Also, synthetic (LLM-generated) data is routinely used in fine-tuning to good effect, and it’s likely that architectures exist that can happily do primary training on synthetic data as well.