• Pup Biru@aussie.zone · 2 days ago

    it’s actually pretty easy to run locally as well. obviously not as easy as just downloading an app, but it’s gotten relatively straightforward and the peace of mind is nice

    check out ollama, and find an ollama UI
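
    for a rough idea of how little is involved once it’s set up, here’s a minimal sketch using the ollama Python client (pip install ollama; the model tag below is just an example, use whatever you’ve pulled):

    ```python
    # minimal sketch: assumes the ollama server is running locally (`ollama serve`
    # or the desktop app) and a model has been pulled, e.g. `ollama pull llama3.2`
    import ollama

    response = ollama.chat(
        model="llama3.2",  # example tag; substitute any model you have locally
        messages=[{"role": "user", "content": "say hi in one sentence"}],
    )
    print(response["message"]["content"])
    ```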

    • MagicShel@lemmy.zip · 2 days ago

      That’s not the monster model, though. But yes, I run AI locally (barely on my 1660). What I can run locally is pretty decent in limited ways, but I want to see the o1 competitor.

      • Pup Biru@aussie.zone · 1 day ago

        figured i’d do this in a new comment since it’s been a bit since my last, but i just downloaded and ran the 70b model on my mac and it’s slower but running fine: ~15s to first word, and about half the generation speed after that, but it runs

        this matches what i’ve experienced with other models too: very large models still run, just much, much slower

        i’m not sure how it goes once you get up to the 168b models etc, because i haven’t tried, but it seems it just can’t load the whole model at once, so there’s a lot more loading and unloading, which makes it much slower
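
        if you want actual numbers rather than eyeballing it, something like this measures time-to-first-output and overall speed (assuming the ollama Python client; the model tag is just an example):

        ```python
        # rough sketch of measuring "time to first word" via streaming output;
        # assumes the ollama Python client and a model that's already been pulled
        import time

        import ollama

        start = time.time()
        first_chunk_at = None
        chunks = 0

        # stream=True yields pieces of the reply as they're generated
        for chunk in ollama.chat(
            model="llama3.1:70b",  # example tag; use whatever large model you're testing
            messages=[{"role": "user", "content": "explain quantisation briefly"}],
            stream=True,
        ):
            if first_chunk_at is None:
                first_chunk_at = time.time() - start  # latency before any output
            chunks += 1

        total = time.time() - start
        print(f"first output after {first_chunk_at:.1f}s, {chunks} chunks in {total:.1f}s")
        ```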

        • MagicShel@lemmy.zip · edited · 1 day ago

          You can look at the stats on how much of the model fits in VRAM. The lower the percentage, the slower it goes, although I imagine that’s not the only constraint. Some models are probably faster than others regardless, but I really haven’t done a lot of experimenting. It’s too slow on my card to even compare output quality across models. Once I have 2k tokens in context, even a 7B model is down to a token per second or slower. I have about the slowest card that ollama even says you can use. I think there is one card worse.
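
          For what it’s worth, `ollama ps` on the command line shows the CPU/GPU split for whatever is currently loaded. A rough Python version of the same check, assuming a recent client that exposes ps(), might look like:

          ```python
          # sketch: report what fraction of each loaded model is sitting in VRAM;
          # assumes a recent ollama Python client where ps() is available
          import ollama

          for m in ollama.ps().models:
              total = m.size or 0
              in_vram = m.size_vram or 0
              pct = 100 * in_vram / total if total else 0
              print(f"{m.model}: {pct:.0f}% of {total / 1e9:.1f} GB in VRAM")
          ```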

          ETA: I’m pulling the 14B Abliterated model now for testing. I haven’t had good luck running a model this big before, but I’ll let you know how it goes.

      • Pup Biru@aussie.zone · 1 day ago

        that’s true - i was running 7b and it seemed pretty much instant, so i was assuming i could do much larger - turns out it’s only 14b on a 64gb mac