I’d like to self-host a large language model (LLM).

I don’t mind if I need a GPU and all that. At least it will be running on my own hardware, and it will probably even be cheaper than the $20 per month everyone is charging.

What LLMs are you self-hosting? And what are you using to do it?

  • Smorty [she/her]
    1 month ago

    Please try the 4-bit quantisations of the models. They run a bunch faster while eating less RAM.

    Generally you want to use 7B or 8B models on the CPU, since anything larger will be hellishly sluggish.
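
To put rough numbers on the RAM point, here is a back-of-the-envelope sketch. It assumes memory is dominated by the weights and that every weight takes exactly the stated number of bits. Real 4-bit files usually come out somewhat larger, since quantisation formats store extra scaling data, and the runtime needs additional memory for context. The helper name `weight_memory_gb` is made up for this illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    Context/KV cache and runtime overhead are NOT included.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Compare unquantised 16-bit weights against a 4-bit quantisation.
    for name, n_params in [("7B", 7e9), ("8B", 8e9)]:
        fp16 = weight_memory_gb(n_params, 16)
        q4 = weight_memory_gb(n_params, 4)
        print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

Under those assumptions, a 7B model drops from about 14 GB to about 3.5 GB of weights, which is why the 4-bit versions fit comfortably in ordinary desktop RAM.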