Is there any computer program with AI capabilities (the generative kind seen in ChatGPT, online text-to-picture generators, etc.) that is actually standalone, i.e. able to run in a fully offline environment?

As far as I understand, the most popular AI technology right now consists of a bunch of matrix algebra, convolutions and parallel processing of many low-precision floating-point numbers, which works because of statistics and absurdly huge datasets. So if any such program existed, how would it even have a reasonable storage size if it needs the dataset?

  • Rhaedas@fedia.io
    5 months ago

    The AI, image, and audio models that can run on a typical PC have all been broken down from originally larger models. How this is done affects what the models can do and their quality, but the open source community has come a long way in making impressive stuff. The first question is more about hardware: do you have an Nvidia GPU that can support these types of generation? They can be done on the CPU alone, but it’s painfully slow.

    If so, then I would highly recommend looking into Ollama for running AI models (using WSL if you’re on Windows) and ComfyUI for image generation. Don’t let ComfyUI’s complicated workflows scare you; if you start from the basics, with the plenty of YouTube help out there, it will make sense. As for TTS, there’s a constant stream of “new stuff” out there, but for actual local processing in “real time” (it still takes a bit) I have yet to find anything to replace my Coqui TTS copy with Jenny as the model voice. It may take some digging and work to get that together, since it’s older and not supported anymore.
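For a sense of what "fully offline" looks like in practice, here is a minimal sketch of talking to a locally running Ollama server from Python. It assumes Ollama's default local endpoint (`http://localhost:11434/api/generate`) and a model that has already been pulled; the model name `llama3` and the helper names `build_request` and `ask` are placeholders for illustration, not anything from the posts above.

```python
import json
import urllib.request

# Assumption: Ollama is running on this machine on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3"):
    """Build (but do not send) a generate request for a local Ollama server."""
    # "stream": False asks for one JSON reply instead of a token stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="llama3"):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling something like `ask("Explain quantization in one sentence.")` only ever talks to `localhost`, so it keeps working with the network cable unplugged once the model file is on disk.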

    • hendrik@palaver.p3x.de
      5 months ago

      I don’t think they break them down. For most models the math requires starting at the beginning and training each model individually from the ground up.

      But sure, a smaller model generally isn’t as capable as a bigger one. And you can’t train them indefinitely. So for a model series you’ll maybe use the same dataset but feed more of it into the super big variant and not so much into the tiny one.

      And there is a technique, usually called distillation, where you use a big model to generate questions and answers and use them to train a different, small model. That model will then learn to respond like the big one.

      • Rhaedas@fedia.io
        5 months ago

        The breaking down I mentioned is the quantization that forms a smaller model from the larger one. I didn’t want to get technical because I don’t understand the math details myself past how to use them. :)

        • hendrik@palaver.p3x.de
          5 months ago

          Ah, sure. I think a good way to phrase it is to say they lower the precision. That’s basically what they do: convert the high-precision numbers to lower-precision formats. That makes the computations easier/faster and the files smaller.
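A toy sketch of that idea in plain Python, assuming simple symmetric 8-bit quantization with a single shared scale factor. Real tools use more elaborate block-wise schemes, and the function names and the 7-billion-parameter figure are made up for illustration.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] plus one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]


def model_size_gb(n_params, bits_per_weight):
    """Rough file size: parameter count times bits per weight."""
    return n_params * bits_per_weight / 8 / 1e9


weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)                                  # small integers instead of floats
print([round(r, 3) for r in restored])    # close to, but not exactly, the originals
print(model_size_gb(7e9, 16), model_size_gb(7e9, 4))
```

By this rough estimate a hypothetical 7-billion-parameter model shrinks from about 14 GB at 16 bits per weight to about 3.5 GB at 4 bits. It also shows why no dataset has to ship with the program: only the trained weights are stored, not the data they were trained on.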

          It also doesn’t apply equally to text, audio and images. As far as I know quantization is mainly used with LLMs. It’s also possible with image and audio models, but generally they don’t do that. As far as I remember it leads to degradation and distortions pretty fast. There are other methods, like pruning, used with generative image models. That brings down their size substantially.
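As a rough illustration of pruning, here is a sketch of magnitude pruning, one common variant, where the weights with the smallest absolute value are set to zero. This is an assumption chosen for illustration only; the pruning actually used on image models may instead remove whole channels or layers, and `magnitude_prune` is a hypothetical helper.

```python
def magnitude_prune(weights, fraction):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_drop = int(len(weights) * fraction)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_drop])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.12, -0.5, 0.33, 0.9, -0.07, 0.01]
pruned = magnitude_prune(weights, 0.5)
print(pruned)
```

Zeroing weights only saves space if the zeros are then stored compactly or the pruned parts are physically removed from the model, which is the step that actually brings the file size down.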