• Pennomi@lemmy.world · 1 day ago

    The open paper they published details the algorithms and techniques used to train it, and it’s been replicated by researchers already.

    • legolas@fedit.pl · 1 day ago

      So are these techniques really so novel and groundbreaking? Will we now have a burst of DeepSeek-like models everywhere? Because that’s what absolutely should happen if the whole story is true. I would assume there are dozens or even hundreds of companies in the USA, especially in the finance sector and those focused purely on AI research, that possess a similar number of chips (surely more) than the Chinese team claims to have trained their model on.

      • ArchRecord@lemm.ee · 1 day ago

        So are these techniques really so novel and groundbreaking?

        The general concept, no. (It’s reinforcement learning, something that’s existed for ages.)

        The actual implementation, yes. (Training a model to think in a separate XML section, and reinforcing with the highest-quality results from previous iterations, using reinforcement learning that naturally pushes responses toward the highest-rewarded outputs.) Most other companies simply didn’t expect this to work as well as throwing more data at the problem.

        This is actually how some of OpenAI’s newest models are believed to have been developed. The difference is that OpenAI assumed more data would be necessary for the improvements, and thus had to keep training the entire model on additional new information; they also assumed that directly training in thinking time was the best route, rather than arriving at it via reinforcement learning. DeepSeek simply scrapped that part altogether and went solely with reinforcement learning.
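        The core loop described above can be sketched in a few lines. This is a toy illustration, not DeepSeek’s actual code: the reward function, the `<think>` tag format bonus, and the candidate strings are all assumptions for demonstration. A real system would sample candidates from the model itself and use the selected outputs for the next round of training.

```python
def reward(response: str, answer: str) -> float:
    """Hypothetical reward: format bonus for a <think> section, plus correctness."""
    score = 0.0
    if "<think>" in response and "</think>" in response:
        score += 0.5  # reward keeping reasoning in a separate XML section
    if response.rstrip().endswith(answer):
        score += 1.0  # reward the correct final answer
    return score

def select_best(candidates: list[str], answer: str) -> str:
    """Keep the highest-rewarded candidate for the next fine-tuning round."""
    return max(candidates, key=lambda r: reward(r, answer))

# Stand-ins for model samples; a real pipeline would generate these with the LLM.
candidates = [
    "4",
    "<think>maybe 5?</think> 5",
    "<think>two plus two equals four</think> 4",
]
best = select_best(candidates, "4")
print(best)  # the candidate with both a think section and the right answer wins
```

        The point of the selection step is that no new labeled data is needed: the model’s own best outputs become the training signal, which is what makes the approach cheap compared to gathering ever more data.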

        Will we now have a burst of DeepSeek-like models everywhere?

        Probably, yes. Companies and researchers are already beginning to use this same methodology. Here’s a writeup about S1, a model that performs up to 27% better than OpenAI’s best model. S1 used supervised fine-tuning, and did something so basic that nobody had previously thought to try it: just making the model think longer by modifying the terminating XML tags.

        This was released days after R1, builds on R1’s initial premise, and produces better-quality responses. Oh, and of course, it cost $6 to train.
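        The “modify the terminating XML tags” trick mentioned above can be sketched as follows. This is a guess at the mechanism, not S1’s actual code: the idea is that if the model closes its thinking section before spending enough tokens, you suppress the closing tag and append a continuation cue so it keeps reasoning. The `"Wait"` cue and the token threshold are illustrative assumptions.

```python
def force_budget(partial: str, min_think_tokens: int, cue: str = "Wait") -> str:
    """If the model closes its think section too early, reopen it with a cue."""
    close = "</think>"
    if partial.endswith(close):
        body = partial[len("<think>"):-len(close)]
        if len(body.split()) < min_think_tokens:
            # Suppress the premature closing tag and nudge the model onward.
            return partial[:-len(close)] + " " + cue
    return partial

print(force_budget("<think>It is 4.</think>", min_think_tokens=10))
# prints "<think>It is 4. Wait" — thinking stays open, so generation continues
```

        In a real inference loop this check would run every time the model emits the end-of-thinking token, which is why the technique needs no retraining at all.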

        So yes, I think it’s highly probable that we see a burst of new models, or at least improvements to existing ones. (Nobody has a very good reason to build a whole new model under a different name when they can simply improve the one they’re already using and have implemented.)

        • Aatube@kbin.melroy.org · 1 day ago

          Note that s1 is transparently a distilled model instead of a model trained from scratch, meaning it inherits knowledge from an existing model (Gemini 2.0 in this case) and doesn’t need to retrain its knowledge nearly as much as training a model from scratch. It’s still important, but the training resources aren’t really directly comparable.
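          The reason distillation is so much cheaper can be seen in a minimal sketch of the standard distillation loss (a general illustration, not the s1 pipeline specifically): the student is trained to match the teacher’s softened output distribution, so it inherits the teacher’s knowledge instead of relearning it from raw data. The logit values and temperature below are made up for demonstration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature softens the distribution."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student): zero when the student exactly mimics the teacher."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, 1.0, 0.1]  # hypothetical teacher outputs for one token position
student = [1.9, 1.1, 0.2]  # student is close, so the loss is already small
print(round(distill_loss(teacher, student), 5))
```

          Because the soft distribution carries much more signal per example than a single hard label, the student converges with far fewer training steps, which is why distilled training costs aren’t comparable to from-scratch costs.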

          • ArchRecord@lemm.ee · 1 day ago

            True, but I believe we’ll probably see a continuation of the existing trend of building on and improving existing models, rather than always starting entirely from scratch. For instance, nearly every newly released model reports the performance of its Llama variant, because combining a new technique with Llama’s existing quality simply produces better results.

            I think we’ll see a similar trend now, just with R1 variants instead of Llama variants being the primary new type used. It’s just fundamentally inefficient to start over from scratch every time, so it makes sense that newer iterations would be built directly on previous ones.