A new paper suggests diminishing returns from larger and larger generative AI models. Dr Mike Pound discusses.

The Paper (No “Zero-Shot” Without Exponential Data): https://arxiv.org/abs/2404.04125

  • bamboo@lemm.ee
    7 months ago

    I think it’s incredibly naïve to assume that because we’ve hit a boundary on one particular aspect of LLMs, the technology as a whole has peaked. There are lots of ways to improve LLMs that aren’t just increasing the parameter size. For example, there’s been an uptick in smaller models that are optimized to run on client devices without large GPUs. There is probably a future where small 3-7B models are competitive with today’s best 70B models but can run in real time on any smartphone. We’ll have larger context windows, allowing LLMs to work on larger problems. And we’ll have better techniques for getting high-quality information out of LLMs: there are already adversarial methods, where two LLMs hold a debate on a subject, that have been shown to give more accurate and comprehensive answers. LLMs will also continue to be embedded into different places in software where they are more useful than a chatbot that lives in its own world.

    • barsoap@lemm.ee (OP)
      7 months ago

      “There are lots of ways to improve LLMs that aren’t just increasing the parameter size”

      The paper isn’t about parameter size but the need for exponentially more training data to get a mere linear increase in output performance.
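To make "exponentially more data for a linear gain" concrete, here is a minimal sketch of a log-linear relationship between training examples and a performance score. The function names, the intercept, and the gain of 0.1 per tenfold increase in data are made-up illustrative values, not numbers from the paper.

```python
import math


def accuracy(num_examples, intercept=0.0, gain_per_decade=0.1):
    """Hypothetical log-linear law: the score rises by `gain_per_decade`
    every time the amount of data is multiplied by 10."""
    return intercept + gain_per_decade * math.log10(num_examples)


def examples_needed(target_accuracy, intercept=0.0, gain_per_decade=0.1):
    """Invert the law above: how much data a given score would require."""
    return 10 ** ((target_accuracy - intercept) / gain_per_decade)


if __name__ == "__main__":
    # Each equal step in the score costs ten times more data than the last.
    for target in (0.3, 0.4, 0.5, 0.6):
        print(f"score {target:.1f} -> ~{examples_needed(target):,.0f} examples")
```

Under these assumed constants, every additional 0.1 of score multiplies the data requirement by ten, which is the shape of the trade-off being described.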

    • magic_lobster_party@kbin.run
      7 months ago

      Improvements are made all the time. You can’t feed a very large SVM the same data as transformer networks and expect it to perform the same. Transformers are used because they can more easily learn complicated patterns with less data.

      I think I’ve read somewhere that neural networks with only one hidden layer can theoretically approximate any continuous function if the hidden layer is large enough (the universal approximation theorem), but an incredible amount of data is required to train one to do so, so it’s not practical.
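As a rough illustration of the "large enough hidden layer" part, the sketch below builds a one-hidden-layer ReLU network by hand so that it interpolates a 1D function at evenly spaced points. Note the assumptions: the weights are constructed directly rather than learned from data, the target `math.sin` is an arbitrary choice, and the helper names are hypothetical. It only shows that the approximation error shrinks as the layer widens; it says nothing about how much data training would need.

```python
import math


def relu(z):
    return z if z > 0.0 else 0.0


def build_one_hidden_layer(f, lo, hi, width):
    """Construct (not train) a network with `width` ReLU hidden units
    whose output is the piecewise-linear interpolant of f on [lo, hi]."""
    step = (hi - lo) / width
    knots = [lo + i * step for i in range(width + 1)]
    values = [f(k) for k in knots]
    slopes = [(values[i + 1] - values[i]) / step for i in range(width)]
    # Hidden unit i computes relu(x - knots[i]); its output weight is the
    # change in slope at that knot, so the slopes accumulate left to right.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, width)]
    biases = knots[:width]
    offset = values[0]

    def net(x):
        return offset + sum(w * relu(x - b) for w, b in zip(out_weights, biases))

    return net


def max_error(f, net, lo, hi, samples=1000):
    """Largest absolute gap between f and the network on a dense grid."""
    grid = (lo + (hi - lo) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - net(x)) for x in grid)


if __name__ == "__main__":
    lo, hi = 0.0, 2.0 * math.pi
    for width in (4, 16, 64):
        net = build_one_hidden_layer(math.sin, lo, hi, width)
        print(width, max_error(math.sin, net, lo, hi))
```

The error drops as `width` grows, which matches the "if the hidden layer is large enough" caveat.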

      Over time other models will be discovered that can make better use of the training data.