Hey I'm also a local LLM enthusiast and I must say that you've provided a very good overview.
I'd like to add that it really isn't too hard to start playing around with them nowadays; usability has improved quite a bit even over the last few months. I like to recommend koboldcpp because I feel like it's much more newbie friendly than ooba (plus I really enjoy its story mode).
Once you have kobold you only need a GGML model and you're good to go, but I guess picking the right model might still be a hurdle for many folks. I have an 8GB card myself and I like TheBloke's finetunes, so his 4-bit quantization of 13B Wizard-Vicuna-Uncensored (https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML) works pretty well for me. Looking forward to his finetunes of Llama 2, which are just now starting to drop!