New Mistral model is out

The Hobbyist@lemmy.zip · 8 months ago

Audalin@lemmy.world · 8 months ago

I thought MoEs had to be loaded entirely into (V)RAM, and that the inference speedup comes from only needing a fraction of the experts to compute the next token. But the choice of experts can be different for each token, so you need them all ready; otherwise you keep moving data between disk <-> RAM <-> VRAM and get reduced performance.
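
To make that concrete, here is a rough toy sketch of per-token routing. Everything in it is an assumption for illustration only: the 8 experts, the top-2 choice, and the `route` function (a seeded random stand-in for a learned router) are made up and are not taken from any Mistral model.

```python
import random

NUM_EXPERTS = 8  # assumed expert count, for illustration only
TOP_K = 2        # assumed number of experts activated per token


def route(token_id, num_experts=NUM_EXPERTS, top_k=TOP_K):
    """Hypothetical stand-in for a learned router.

    Returns the indices of the top_k highest-scoring experts for one token.
    Scores are pseudo-random (seeded by token_id), not learned.
    """
    rng = random.Random(token_id)
    scores = [rng.random() for _ in range(num_experts)]
    ranked = sorted(range(num_experts), key=lambda e: scores[e], reverse=True)
    return sorted(ranked[:top_k])


def experts_touched(token_ids):
    """Union of all experts that any token in the sequence was routed to."""
    used = set()
    for token_id in token_ids:
        used.update(route(token_id))
    return used


tokens = list(range(50))
touched = experts_touched(tokens)
print(f"experts computed per token: {TOP_K} of {NUM_EXPERTS}")
print(f"distinct experts needed over {len(tokens)} tokens: {len(touched)}")
```

Each token only computes `TOP_K` experts, which is where the speedup would come from, but over a sequence the set of experts touched grows well past `TOP_K`, which is why the weights all need to be resident (or get swapped in at a cost).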

x.com