As far as I know, the DeepMind paper (Chinchilla) was actually a challenge to the earlier OpenAI scaling-laws paper, arguing that large models are undertrained and therefore underperform for the amount of compute they consume. DeepMind trained a 70B-parameter model on far more data and, for roughly the same training compute, outperformed much larger models like Gopher and GPT-3, while also being much cheaper to run at inference. I don't think any general conclusion about a hard ceiling on LLM performance can be drawn from this.
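For a rough sense of what "same compute, more data" means here, a minimal sketch using the commonly cited C ≈ 6·N·D approximation for training FLOPs (the parameter and token counts are the published Gopher and Chinchilla figures; the formula ignores embedding and attention-specific costs, so treat the numbers as order-of-magnitude estimates only):

    def approx_training_flops(params: float, tokens: float) -> float:
        """Rule-of-thumb training compute: C ~ 6 * N * D."""
        return 6 * params * tokens

    gopher = approx_training_flops(280e9, 300e9)      # 280B params, ~300B tokens
    chinchilla = approx_training_flops(70e9, 1.4e12)  # 70B params, ~1.4T tokens

    print(f"Gopher:     ~{gopher:.2e} FLOPs")
    print(f"Chinchilla: ~{chinchilla:.2e} FLOPs")
    # Both land around 5e23 FLOPs: roughly the same training budget,
    # spent on a 4x smaller model trained on ~4.7x more data.

In other words, the point wasn't "more compute hits a wall", it was that the compute was being allocated badly (too many parameters, too few tokens).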
However, this does not change the fact that there are areas (the ones that depend on correctness) where this kind of model simply cannot serve as a replacement, and pursuing that is foolish.