This post is for our premium members only. If you are reading this in full, thank you for being an Interconnected Premium member! If you aren’t, I hope you become one by scrolling down and tearing down that paywall! 😎
During Christmas week, two noteworthy things happened to me: our son was born and DeepSeek released its newest open source AI model. While I struggled through the art of swaddling a crying baby (a fantastic benchmark for humanoid robots, by the way), AI Twitter lit up with discussions about DeepSeek-V3. Everyone chimed in: Andrej Karpathy, Jim Fan, oh, and me! Interestingly, the release was much less discussed in China, while the ex-China world of Twitter/X breathlessly pored over the model’s performance and implications.
Two major things stood out from DeepSeek-V3 that warranted the viral attention it received. First, it is (according to DeepSeek’s own benchmarking) as performant as, or more performant than, other state-of-the-art models like Claude 3.5 Sonnet and GPT-4o on a few major benchmarks. Second, it achieved this performance with a training regime that incurred a fraction of what it cost Meta to train its comparable Llama 3.1 405 billion parameter model. DeepSeek’s training cost roughly $6 million worth of GPU hours, using a cluster of 2048 H800s (the modified version of the H100 that Nvidia had to improvise to comply with the first round of US export controls, only to be banned by the second round). Meta’s training of Llama 3.1 405B used 16,000 H100s and would’ve cost roughly 11 times more than DeepSeek-V3!
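For readers who want to sanity-check that 11-times figure, here is a back-of-the-envelope sketch. The GPU-hour totals and the $2 per GPU-hour rental rate do not come from this post; they are assumptions taken from DeepSeek’s technical report and Meta’s Llama 3.1 model card, and pricing H100 hours at the same rate as H800 hours is a deliberate simplification.

```python
# Back-of-the-envelope comparison of training compute cost.
# All three constants are assumptions drawn from public reports, not from this post.
DEEPSEEK_V3_GPU_HOURS = 2.788e6   # H800 GPU hours (DeepSeek-V3 technical report)
LLAMA_405B_GPU_HOURS = 30.84e6    # H100 GPU hours (Meta's Llama 3.1 model card)
RENTAL_USD_PER_GPU_HOUR = 2.0     # rental price assumed in DeepSeek's report


def training_cost_usd(gpu_hours, usd_per_gpu_hour=RENTAL_USD_PER_GPU_HOUR):
    """Estimate compute cost as GPU hours times an hourly rental rate."""
    return gpu_hours * usd_per_gpu_hour


deepseek_cost = training_cost_usd(DEEPSEEK_V3_GPU_HOURS)
llama_cost = training_cost_usd(LLAMA_405B_GPU_HOURS)
ratio = llama_cost / deepseek_cost

print(f"DeepSeek-V3:    ${deepseek_cost / 1e6:.2f}M")
print(f"Llama 3.1 405B: ${llama_cost / 1e6:.2f}M")
print(f"Ratio:          {ratio:.1f}x")
```

Because the same hourly rate is applied to both models, the ratio is really just a ratio of GPU hours. It also covers only the final training run, not research, ablations, data, or salaries.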
Given DeepSeek’s impressive progress despite the export control headwinds and fierce global competition in AI, there has been, and will continue to be, plenty of discussion about whether the export control policy was effective and how to assess who is ahead and who is behind in the US-China AI competition. Before settling this debate, however, it is important to recognize three idiosyncratic advantages that make DeepSeek a unique beast.
These are idiosyncrasies that few, if any, leading AI labs from either the US or China or elsewhere share. Thus, understanding them is important, so we don’t over-extrapolate or under-estimate what DeepSeek’s success means in the grand scheme of things.