Anthropic—Accused of Stealing Data to Create Their Product—Accuses Others of Stealing Data to Create Their Product
I see someone brought stones to the glass house again
Okay, everybody knows there's a big IP-ownership problem around AI models and their training data. We're not going to litigate that issue here, though I do firmly believe it will get worked out one way or another.
So, here's the funny part: Anthropic, one of the companies advocating the position that everything on the internet is free training data no matter what, is complaining that other AI labs are distilling Claude. These companies stand accused of firing off a ton of queries to Claude in order to, in essence, "reverse engineer" it.
Anthropic states that distillation carries alignment risks: there's no guarantee that these distilled models have any kind of safety precautions. While that's certainly a valid risk, it kinda flies in the face of their stance that the internet is free training data. Claude's on the internet. People have raised equally valid concerns about Claude's alignment and about Anthropic's methods of collecting training data, concerns Anthropic decided to ignore in order to build its models.
We're not going to get into the rather complicated world of copyright law and the legality of reverse engineering; we'll leave that to the people who've studied it. But, like, if you're going to go around saying the world's your oyster when it comes to training data, don't get mad when someone else comes around to shuck it.