Google's new breakthrough algorithm reduces LLM memory requirements by AT LEAST 6 times. Memory price collapse soon? - Kotaku In Action 2

Google's new breakthrough algorithm reduces LLM memory requirements by AT LEAST 6 times. Memory price collapse soon? (www.tomshardware.com)

posted 110 days ago by Mathdebate +7 / -0


Comments (6)


– BandageBandolier 4 points 110 days ago +4 / -0

Dunno about the memory prices yet. The AI players were already demanding more than there likely was capacity to produce. This may just mean they get to build more of the infrastructure they wanted rather than there being a surplus of memory yet.


– SophiesBoyfriend 1 point 109 days ago +1 / -0

You’re a mod. Why was this post not visible until it was about a day old?


– BandageBandolier 1 point 109 days ago +1 / -0

I approved it as soon as I saw it, at ~12 hours. It's been a busy month. Handshakes need manual approval for top level posts.


– SophiesBoyfriend 1 point 109 days ago +1 / -0

👍


– Mathdebate [S] 1 point 110 days ago +1 / -0

More from the paper itself:

TurboQuant proved it can quantize the key-value cache to just 3 bits without requiring training or fine-tuning and without any compromise in model accuracy, all while achieving a faster runtime than the original LLMs (Gemma and Mistral). It is exceptionally efficient to implement and incurs negligible runtime overhead. The following plot illustrates the speedup in computing attention logits using TurboQuant: specifically, 4-bit TurboQuant achieves up to 8x performance increase over 32-bit unquantized keys on H100 GPU accelerators.

https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
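
For a rough sense of what a 3-bit key-value cache means in memory terms, here is a back-of-the-envelope sketch. It is not TurboQuant itself, only arithmetic on bit widths. The function name `kv_cache_bytes` and the model shape (32 layers, 8 KV heads, head dimension 128, 32k context) are made-up assumptions for illustration, not figures from the blog post, and the estimate ignores any per-block scale or metadata overhead a real quantizer might add.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bits_per_value, batch_size=1):
    """Approximate KV cache size in bytes at a given bit width."""
    # Factor of 2: one tensor for keys and one for values per layer.
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return values * bits_per_value / 8


# Hypothetical model shape, chosen only to make the numbers concrete.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32768)

fp16 = kv_cache_bytes(bits_per_value=16, **shape)
q3 = kv_cache_bytes(bits_per_value=3, **shape)

print(f"16-bit cache: {fp16 / 2**30:.2f} GiB")  # 16-bit cache: 4.00 GiB
print(f"3-bit cache:  {q3 / 2**30:.2f} GiB")    # 3-bit cache:  0.75 GiB
print(f"ratio: {fp16 / q3:.2f}x")               # ratio: 5.33x
```

Under these assumptions the ratio depends only on the two bit widths: 16/3 is about 5.3x against a 16-bit cache, and 32/3 is about 10.7x against a 32-bit one. Model weights are not part of this estimate, only the cache.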


– SarcasticRidley 2 points 110 days ago +2 / -0

TurboQuant

This is my quant, my math specialist. Look at his face. Look at his eyes. He's asian!
