They might try, but if their goal was to destabilize Western dominance in LLMs, making it completely open source was the best way.
This isn’t like TikTok. They have a server that hosts it, but anyone can take their model and run it, and there are going to be a lot of US companies besides the big AI ones looking at it. Even the big AI ones will likely try to adapt its methods to get the improvements they’ve spent too long brute forcing.
The thing is, it’s less about the actual model and more about the method. It does not take anywhere close to as many resources to train models like DeepSeek as what companies in the US have been using. It means that there is no longer going to be just a small group hoarding the tech and charging absurd amounts for it.
Running the model can be no more taxing than playing a modern video game, except the load is not constant.
The cat is out of the bag. They could theoretically ban the models released directly by the research team, but retrained variants are going to be hard to tell apart from models trained from scratch. And the original model is all over the place and has had people hacking away at it.
Blocking access to their hosted service right now would just be petty, but I do expect that from the current administration…
Running the model can be no more taxing than playing a modern video game, except the load is not constant.
This is not true; DeepSeek R1 is huge. There’s a lot of confusion between the smaller distillations based on Qwen 2.5 (some of which can run on consumer GPUs) and the “full” DeepSeek R1 based on DeepSeek-V3.
Your point mostly stands, but the “full” model is hundreds of gigabytes, and the paper mentioned something like a bank of 370 GPUs being optimal for hosting. It’s very efficient because it’s only like 37B active, which is bonkers, but still.
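For a rough sense of the gap, here is a back-of-envelope sketch in Python. It only counts memory for the weights (no KV cache, activations, or runtime overhead), and the numbers are assumptions for illustration: roughly 671B total parameters for the full model, 7B and 32B as example distill sizes, and 2, 1, or 0.5 bytes per parameter for FP16, 8-bit, and 4-bit weights.

```python
# Back-of-envelope estimate of the memory needed just to hold model weights.
# All parameter counts below are assumptions for illustration, and the
# estimate ignores KV cache, activations, and framework overhead.

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param

# Assumed model sizes, in billions of parameters.
MODELS = {
    "full R1 (assumed ~671B total)": 671.0,
    "distill (assumed 32B)": 32.0,
    "distill (assumed 7B)": 7.0,
}

# Assumed storage cost per parameter for common precisions.
PRECISIONS = {"fp16": 2.0, "8-bit": 1.0, "4-bit": 0.5}

for name, params in MODELS.items():
    sizes = ", ".join(
        f"{label}: {weight_memory_gb(params, nbytes):.1f} GB"
        for label, nbytes in PRECISIONS.items()
    )
    print(f"{name} -> {sizes}")
```

Note that having only a fraction of the parameters active per token cuts the compute per token, not the memory: all of the weights still have to sit somewhere, which is why the full model needs a bank of GPUs while a small distill can fit on one consumer card.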