“They can take a really good, big model and use a process called distillation,” said Benchmark General Partner Chetan Puttagunta. “Basically you use a very large model to help your small model get smart at the thing you want it to get smart at. That’s actually very cost-efficient.”
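For anyone wondering what that actually looks like in code, below is a minimal PyTorch sketch of distillation in its simplest form: a small "student" model is trained to match a large "teacher's" output distribution. The two toy models, the temperature T, and the random batch are illustrative stand-ins, not DeepSeek's actual pipeline (which reportedly fine-tunes on the teacher's generated outputs rather than its logits).

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: a "big" teacher and a smaller student (hypothetical sizes).
teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
student = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution (assumed value)

x = torch.randn(32, 16)  # stand-in batch of inputs

teacher.eval()
with torch.no_grad():
    teacher_logits = teacher(x)  # the large model's "soft labels"

student_logits = student(x)
# Train the student to match the teacher's softened output distribution.
# Scaling by T*T keeps gradient magnitudes comparable across temperatures.
loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)

optimizer.zero_grad()
loss.backward()
optimizer.step()

The T*T scaling is the standard convention from Hinton et al.'s distillation paper; it keeps gradient magnitudes stable as the temperature changes.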
Those are just the small "distilled" models that you can actually run locally on your own GPU. They were trained on outputs from the much larger DeepSeek R1 model, which needs more than 700 GB of VRAM to run.
They used a larger LLM to train theirs.
So, in essence, their AI is stealing data from another AI. How very China.
Is that the one that answered someone with "My name is Claude"?