Once upon a time, gamers were free
To play what they wanted, without decree
They enjoyed their hobby, with no one to blame
And the world of gaming was truly a game
But then came GamerGate, and all hell broke loose
As censorship reared its ugly head, in full force
People wanted games changed, to fit their views
To make them more politically correct, for all to choose
But gamers stood strong, they would not yield
For gaming was their passion, their sword and shield
They fought against those who sought to destroy
The freedom of choice that gaming brings with joy
So censorship is wrong, it's plain to see
We must protect our games and maintain our liberty
To play what we want, without fear or doubt
And keep the world of gaming alive, with no way out.
Llama 2-based models can be quantized (weights stored at lower precision, for example around 4 bits instead of 16) so that they use much less memory for comparable results. And as OP mentions, you can take a performance hit and spill into system RAM or swap to stretch what you can run.
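To put rough numbers on the memory savings, here is a back-of-the-envelope sketch. It only counts the weights (no KV cache, activations, or runtime overhead), and the function name `weight_memory_gib` is made up for illustration. The 4.5 bits-per-weight figure assumes a block format that stores 32 4-bit weights plus one 16-bit scale per block; other quantization schemes will land on different values.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB.

    Hypothetical helper: ignores KV cache, activations, and runtime overhead.
    """
    if n_params < 0 or bits_per_weight <= 0:
        raise ValueError("n_params must be >= 0 and bits_per_weight must be > 0")
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Assumed parameter counts (nominal sizes, not exact) and bit widths.
    for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
        for label, bits in [("16-bit", 16.0), ("~8-bit", 8.5), ("~4-bit", 4.5)]:
            print(f"{name} at {label}: {weight_memory_gib(n_params, bits):.1f} GiB")
```

Treat the output as a lower bound: real usage is higher once context length and the runtime's own buffers are added.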
Also, with the GGUF format, you can run most of the work on the CPU out of standard system memory, while offloading part of the model to the GPU to speed up the process.
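If you want a starting guess for how much of a model to put on the GPU, something like the sketch below works. It assumes every layer is the same size and that you hold back some VRAM for the KV cache and scratch buffers; both are simplifications, and `layers_that_fit` and its default `reserve_gib` are made up for illustration. The resulting number is the kind of value you would pass to llama.cpp's `--n-gpu-layers` option (or `n_gpu_layers` in llama-cpp-python), then adjust by trial.

```python
def layers_that_fit(model_gib: float, n_layers: int, vram_gib: float,
                    reserve_gib: float = 1.0) -> int:
    """Estimate how many layers fit in VRAM, with the rest left on the CPU.

    Hypothetical helper. Assumes all layers are equal in size and that
    reserve_gib of VRAM is kept free for KV cache and scratch buffers.
    """
    if model_gib <= 0 or n_layers <= 0:
        raise ValueError("model_gib and n_layers must be positive")
    per_layer_gib = model_gib / n_layers
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, int(usable_gib // per_layer_gib))


if __name__ == "__main__":
    # Example: a 40-layer model taking about 7 GiB quantized, on a 4 GiB GPU.
    n = layers_that_fit(model_gib=7.0, n_layers=40, vram_gib=4.0)
    print(f"Offload about {n} of 40 layers to the GPU; the rest stay in system RAM")
```

If you run out of VRAM at load time or with a long context, lower the layer count; if there is headroom left, raise it.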