Maybe I'm wrong about Claude/Anthropic, but my understanding is that most of these models are trained on data publicly available on the internet. Sure, they paid Microsoft for access to your git repos and Reddit for your posts and comments, but they don't own that data*. Copyright on AI-generated content is still a pretty grey area legally, so I don't see what they can do about people scraping their output except terminate accounts for TOS violations, especially since I think everyone they're calling out is based in China.
Ultimately, if this is actually a threat to their business model (I know, I know, their business model is fraudulent and incestuous, not based at all on paying customers) they should release their own distillations. If you're going to base your models on publicly available data, and have a public API for accessing them, their outputs are effectively in the public domain, whether you like it or not.
*They license it from you with pretty broad terms on what they can do with it, but it's still your data.
Anthropic really does have a big lead on the competition. Their models are better at understanding what the user wants instead of what they say. This doesn't always show up on benchmarks. All of the other big models fuck up their reasoning around weird edge cases.
The secret is what they reinforce in their model. They don't fall into the sycophancy that ChatGPT does.
It's clear that other models are sounding more and more like Claude.
Over the weekend I made an output style for Terry Davis. It refuses to call me the slur he was known for, but it's still better than the alternatives.