Yes. I run it on koboldcpp on a Ryzen 5900X with 32 GB of RAM. It's slow (4-8 minutes per response), but it works. If you have a GPU with plenty of VRAM, you can run it much faster.
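For reference, a CPU-only koboldcpp launch looks roughly like this. This is a sketch, not a definitive setup: the model filename and thread count are placeholders for whatever quantized build you downloaded, and flag names can change between koboldcpp releases, so check `--help` on your version.

```shell
# Hedged sketch: running a quantized 30B model on CPU only in koboldcpp.
# "alpacino-30b.ggml.q4_0.bin" and --threads 12 are assumptions; match them to your files and core count.
python koboldcpp.py --model alpacino-30b.ggml.q4_0.bin --threads 12 --contextsize 2048

# With a high-VRAM NVIDIA GPU, offloading layers speeds generation up considerably
# (the layer count here is a guess; raise it until you run out of VRAM):
python koboldcpp.py --model alpacino-30b.ggml.q4_0.bin --usecublas --gpulayers 40
```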
Alpacino is one of the better 30B models right now, and 30B is the biggest size you can run on a gaming rig without purpose-building a machine for a 65B model. It sacrifices some accuracy to be a better storyteller.
You can see it here in third place on Hugging Face's Open LLM Leaderboard:
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
Note that first place is held by a 65B model, which is slow and difficult to run.
Cool! Thanks for the information.