• Fisch@discuss.tchncs.de
    1 month ago

    Being able to run 7B quality models on your phone would be wild. It would also make it possible to run those models on my server (which is just a mini pc), so I could connect it to my Home Assistant voice assistant, which would be really cool.

    • Smorty [she/her]
      1 month ago

      Something similar to this already kinda exists on HF with the 1.58-bit quantisation, which seems to get very similar performance to the original Llama 3 8B model. That’s essentially a two-bit quantisation with reasonable performance!
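
      For anyone curious where the 1.58 comes from: the weights are ternary (-1, 0, +1), and log2(3) ≈ 1.585 bits per weight. A minimal sketch of the absmean-style ternary quantisation described in the BitNet b1.58 paper (function name and sample values are just illustrative):

      ```python
      import numpy as np

      def absmean_ternary(w, eps=1e-6):
          # Scale weights by their mean absolute value, then round
          # and clip to {-1, 0, +1}. Three states per weight is where
          # the "1.58-bit" name comes from: log2(3) ~ 1.585.
          scale = np.mean(np.abs(w)) + eps
          q = np.clip(np.round(w / scale), -1, 1)
          return q, scale

      w = np.array([0.4, -1.2, 0.05, 0.9])
      q, scale = absmean_ternary(w)
      # q is ternary; q * scale gives the dequantised approximation of w
      ```

      In practice the ternary values still get packed into 2-bit storage (you can’t address 1.58 bits), which is why it behaves like a two-bit quant size-wise.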