DeepSeek's secret sauce has a Silicon Valley flavor
How a Chinese AI startup cracked the global research code.
Not just another Chinese innovation story
DeepSeek, an artificial intelligence startup based in Hangzhou, China, has become an obsession for AI researchers and developers in Silicon Valley. Their latest language model, DeepSeek-V3, released in December 2024, has achieved what many thought impossible: trained for just $5.5 million on 2,000 NVIDIA H800 GPUs (a lower-spec version designed for the Chinese market), this open-source model has outperformed top-tier open-source models like Qwen2.5-72B and Llama-3.1-405B. More impressively, it stands shoulder-to-shoulder with world-leading closed-source models like GPT-4o and Claude 3.5 Sonnet, models that conservatively cost hundreds of millions of dollars and required hundreds of thousands of NVIDIA's most powerful H100 GPUs to train.
The impact on the AI community has been seismic, particularly in Silicon Valley, the epicenter of AI research, entrepreneurship, funding, computing power, and resources. Notable Silicon Valley figures have been effusive in their praise for DeepSeek, including OpenAI co-founder Andrej Karpathy and Scale.ai founder Alexandr Wang. While OpenAI CEO Sam Altman posted a tweet seemingly implying that DeepSeek had borrowed heavily from others' advances (quickly countered by someone asking whether he meant using Google's Transformer architecture), the acclaim DeepSeek has received has been both widespread and genuine, especially in the open-source community, where developers vote with their feet.
Many Chinese observers have hailed DeepSeek-V3 as a national triumph and a paradigm of Chinese innovation. Indeed, Chinese researchers and engineers excel at achieving ambitious goals efficiently and cost-effectively, often innovating technical methods under resource constraints. DeepSeek-V3's minimal dependence on high-performance computing, its systematic approach to training and inference, and its novel technical solutions reflect the engineering mindset that Chinese companies, teams, and researchers are known for. As Alexandr Wang observed: while Americans rest, Chinese developers forge ahead with lower costs, faster speeds, and stronger capabilities.
Interestingly, tech-friendly American figures, including Elon Musk, often attribute China's success in certain fields to intelligence, diligence, and methodological innovation. While this is true, it doesn't fully explain why other Chinese AI companies and talents, equally smart and hardworking with their own innovative technical methods (DeepSeek's distributed inference, for instance, recalls Moonshot AI's similar innovation with Mooncake), haven't achieved the same global impact. They might in the future, but why DeepSeek now?
Comparing DeepSeek to "the PDD of AI" misses the mark, as does reducing their success to mere cost-efficiency. Most Chinese AI companies face GPU shortages and pursue architectural innovations out of necessity. DeepSeek's Silicon Valley attention predates their recent success—their May 2024 release of DeepSeek-V2 caused a stir with its Multi-head Latent Attention (MLA) architecture innovation. The V2 paper sparked widespread discussion in the AI research community. Intriguingly, while X and Reddit's AI practitioners were discussing DeepSeek-V2's technical merits, Chinese media portrayed DeepSeek merely as an instigator of price wars in the large language model space—almost like parallel universes.
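For readers who haven't followed the V2 paper, the core idea behind MLA is worth spelling out: instead of caching full keys and values for every attention head during inference, the model caches a much smaller shared latent vector per token and reconstructs keys and values from it on the fly, shrinking the KV cache and with it the cost of serving long contexts. The sketch below is a deliberately simplified, single-head illustration of that compression idea in PyTorch; the class name, dimensions, and projection choices are illustrative assumptions, not DeepSeek's actual implementation (which adds decoupled rotary position embeddings and other refinements described in the V2 paper).

```python
import torch
import torch.nn as nn

class SimplifiedLatentAttention(nn.Module):
    """Toy single-head sketch of the MLA idea: cache a small latent instead of full K/V."""

    def __init__(self, d_model: int = 1024, d_latent: int = 128, d_head: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent)  # compress hidden states into the latent that gets cached
        self.up_k = nn.Linear(d_latent, d_head)   # reconstruct keys from the cached latent
        self.up_v = nn.Linear(d_latent, d_head)   # reconstruct values from the cached latent
        self.q = nn.Linear(d_model, d_head)       # queries are computed from the full hidden state

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        latent = self.down(x)                     # (batch, seq_len, d_latent) -- the only per-token state cached
        k = self.up_k(latent)
        v = self.up_v(latent)
        q = self.q(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        return attn @ v

# Quick shape check
layer = SimplifiedLatentAttention()
print(layer(torch.randn(2, 16, 1024)).shape)  # torch.Size([2, 16, 64])
```

The point of the exercise: caching 128 numbers per token here, rather than full per-head keys and values, is the kind of change that makes long-context inference dramatically cheaper, which is also what made V2's aggressive API pricing possible in the first place.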
This suggests DeepSeek shares more common ground with Silicon Valley than meets the eye. Their secret sauce might be more Silicon Valley than we think.
DeepSeek mirrors pre-2022 OpenAI and DeepMind
If we were to draw a parallel between DeepSeek and other global AI players, we'd need to add a specific timeframe: DeepSeek resembles OpenAI and DeepMind—but specifically before 2022.
What characterized pre-2022 OpenAI and DeepMind? They operated essentially as non-profit academic research institutions. Even after Microsoft's investment and the transition to a capped-profit structure, OpenAI's work approach, particularly under Chief Scientist Ilya Sutskever and co-founder Andrej Karpathy, remained research-focused. The company had no formal external products; their 2020 GPT-3 release was an academic research achievement, presented to the world as a paper rather than a product. DeepMind, though nominally a startup, functioned more like a research institution both during its independent London period and after Google's acquisition (before merging with Google Brain). Projects like AlphaGo and AlphaFold were research initiatives, not products.
Does DeepSeek have "products"? Not in the conventional sense, though users can chat with their models and developers can access relatively inexpensive APIs. They don't even have a mobile app, seem unconcerned with product operations, avoid advertising, and skip social media marketing. They don't provide users with carefully crafted prompt templates. A functional website that people can use suffices, which is very unlike a typical Chinese AI company. On the enterprise and developer front, beyond leveraging architectural innovations to dramatically reduce API prices, they've avoided typical industry practices like acceleration programs, developer competitions, or industry ecosystem funds. This suggests they're genuinely not focused on commercial success at present.
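That thinness of the "product" layer is easy to see from the developer side: the model is consumed through a plain chat-completions API rather than any elaborate platform. The snippet below is a minimal sketch of what such a call typically looks like; the base URL, model name, and OpenAI-compatible interface are assumptions made for illustration (check DeepSeek's own documentation for current values), not details drawn from this article.

```python
from openai import OpenAI

# Assumed endpoint and model name for illustration only; consult DeepSeek's docs for the current values.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize what Multi-head Latent Attention changes about the KV cache."}],
)
print(response.choices[0].message.content)
```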
Meanwhile, the density of researchers at DeepSeek is striking. Recent analysis of DeepSeek-V3's paper authors reveals a team dominated by recent PhD graduates from China's top universities (Tsinghua, Peking University, Beihang University), published researchers, and competition winners, even including current graduate students. The team is remarkably young. Founder Liang Wenfeng disclosed their hiring criteria to 36Kr's "Dark Surge": they prioritize capability over experience, focusing on recent graduates for core technical positions. This mirrors early OpenAI and DeepMind's talent structure: leveraging the youngest, brightest, most unconstrained minds to create unprecedented breakthroughs.
They've cultivated an environment where brilliant young minds enter what appears to be a company but continue their academic journey, with access to vastly more computational resources and research data than pure academic institutions like university laboratories can offer. Tech companies' research institutions have become "states within states," increasingly replacing universities as primary contributors to academic achievements. The less interference from commercial objectives, the greater the chance of breakthrough academic results. Google researchers proposed the Transformer architecture, the foundation of generative AI, in 2017, when Google's AI commercialization goals remained unclear. OpenAI's critical GPT-3 and GPT-3.5 breakthroughs occurred away from the spotlight, but once the company began operating more like a traditional business, complications arose.
This distinguishes DeepSeek from most Chinese AI startups, making it more akin to a research institution. While most AI startup founders in this wave are scientists and researchers, their venture capital funding obligations force them to focus on productization and commercialization (often not their strongest suit) rather than pure research and paper publishing. Tech giants can afford research institutions and scientists, but when research results must rapidly translate to products and commerce, teams become more complex, losing the simplicity and clarity of pure research teams. While some American tech giants maintain research institutions unencumbered by commercial goals, they often develop academic hierarchies over time. The sweet spot—commercial companies' research institutions staffed entirely by brilliant young minds—has only appeared at crucial moments: OpenAI and DeepMind a few years ago, and DeepSeek now.
One telling sign: DeepSeek's best "products" include not just their models but their papers. Both their V-2 and V-3 releases garnered careful reading, sharing, citations, and strong recommendations from global researchers. In contrast, OpenAI's GPT-4 paper barely qualified as academic work. While everyone races to top various benchmarks, few prioritize paper quality. A thorough, rigorous paper with rich experimental details still commands extra respect in the field.
Of course, this approach requires significant funding—ammunition comparable to tech giants and far beyond typical startups. But not all giants are willing to maintain their own DeepMind.
Open source is always right
In early 2023, The Information surveyed potential Chinese AI startup stars. Established players like Zhipu and Minimax made the list, along with newly founded ventures like Baichuan, Zero-One, and Light Year. They even mentioned Yang Zhilin, who was preparing another startup that had yet to surface publicly. DeepSeek wasn't mentioned.
As recently as eighteen months ago, few considered DeepSeek an AI insider. Even when industry rumors circulated that DeepSeek's parent company, the quantitative trading firm High-Flyer, possessed abundant NVIDIA high-performance GPUs, few believed their direct entry into large language models would make waves. Now everyone is discussing DeepSeek, which has found more success internationally than domestically.
From day one, DeepSeek chose a different battleground from other Chinese large model startups. They avoided funding rounds, didn't compete for rankings among China's AI startups, ignored domestic media attention (their sole interview with Dark Surge likely aimed at recruiting passionate young scientists), and skipped product marketing. Instead, they chose the path most aligned with research institutions—engaging with the global open-source community, sharing models, research methods, and results directly, gathering feedback, iterating, and self-improving.
The open-source community remains AI's most vibrant, thorough, free, and borderless space for academic research, sharing, and discussion—and its least internally competitive arena. DeepSeek's commitment to open source from day one was likely carefully considered. Their open-source approach is comprehensive, covering model weights, datasets, and pre-training methods, with high-quality papers as an integral part. Young, brilliant researchers gain high visibility through their open-source community appearances, sharing, and engagement. Their audience includes some of global AI's most influential drivers.
This combination—smart young AI researchers + research institution atmosphere (with big tech packages) + open-source community sharing and exchange—has elevated DeepSeek's global AI influence and prestige. For an organization primarily focused on AI research results rather than commercial products, Hugging Face and Reddit serve as the best launch venues, datasets and code repositories as the best demos, and papers as the best press releases. DeepSeek has followed this approach meticulously. So while DeepSeek's researchers and CEO rarely give media interviews or share technical insights at forums and events, you can't say they haven't marketed themselves. In fact, for their goals of proving Chinese AI original research can lead global trends and recruiting the smartest researchers, DeepSeek's "marketing" has been extremely precise and effective.
It's worth noting that over the past year, China's open-source large model players have gained considerable respect in global AI research and products. A growing perception is that Chinese open-source large models are more thoroughly open than some American and European counterparts, making them more accessible for researchers and developers to study or to use as the basis for their own models. DeepSeek exemplifies this, along with Alibaba's Qwen, widely regarded as genuinely open source. Mianbi's small model MiniCPM-Llama3-V 2.5 even gained unexpected popularity after being directly adapted by a Stanford undergraduate team.
Interestingly, while the international AI community, especially Silicon Valley, considers DeepSeek and Alibaba as China's representative large model players, domestically we focus on Douyin's Doubao, Keling, and the so-called AI "Little Six Dragons." Objectively speaking, DeepSeek and Alibaba have done more to foster fair, positive international recognition of Chinese AI innovation capabilities and contributions to the global community. Open source is always the right choice.
V-3 is DeepSeek's GPT-3 moment
V-3's release triggered an outsized international response, with CNBC reporting it as a sign of China's AI catching up with America's. Careful observation reveals that DeepSeek's journey from obscurity to prominence, and their three iterations from Coder to V-3, closely parallel OpenAI's progression from GPT-1 to GPT-3 in both pace and impact.
Let's first look at OpenAI:
In 2018, OpenAI released GPT-1, their first Transformer-based pre-trained model, demonstrating that language modeling is an effective pre-training objective, though generation quality and diversity were limited. It attracted some academic attention, but the overall response was modest.
In early 2019, OpenAI launched GPT-2, achieving major improvements in text generation quality and diversity, essentially validating the language model approach and sparking widespread AI community discussion and attention.
In June 2020, OpenAI released GPT-3, then the world's largest language model at 175 billion parameters, capable of text generation, translation, Q&A, sustained dialogue, and reasoning, marking a generative AI milestone. Even then, GPT-3 remained an experimental project.
Now DeepSeek:
In November 2023, DeepSeek released two open-source models—DeepSeek Coder and DeepSeek LLM—attracting limited attention and facing computational efficiency and scalability challenges.
In May 2024, DeepSeek released V-2, combining Mixture of Experts (MoE) and Multi-head Latent Attention (MLA) technologies to significantly reduce model training and inference costs while achieving performance comparable to world-leading models (a simplified sketch of the MoE idea follows this timeline). This sparked widespread discussion and recommendations among AI academics and developers, marking the start of DeepSeek's broader recognition.
In December 2024, DeepSeek released V-3, surpassing open-source models like Llama 3.1 and Qwen 2.5 and rivaling closed-source models like GPT-4o and Claude 3.5 Sonnet, at roughly one-hundredth the cost of comparable efforts from OpenAI, Anthropic, and Google. The release caused a sensation and marked a milestone in the worldwide development of language models.
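To make the cost claims in the V-2 and V-3 entries above a little more concrete: in a Mixture-of-Experts layer, each token is routed to only a few of many expert feed-forward networks, so the compute spent per token is a small fraction of what the total parameter count would suggest. The sketch below shows top-k routing in its most naive form; the dimensions, expert count, and routing loop are illustrative assumptions, not DeepSeek's design (which uses many fine-grained experts, shared experts, and load-balancing strategies described in their papers).

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Toy top-k Mixture-of-Experts layer: each token runs through only k of n expert FFNs."""

    def __init__(self, d_model: int = 512, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(d_model, n_experts)  # scores each token against each expert
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                          # (num_tokens, n_experts)
        weights, expert_idx = scores.topk(self.k, dim=-1) # keep only the k best experts per token
        weights = torch.softmax(weights, dim=-1)          # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e            # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Quick shape check: 16 tokens, each touching only 2 of the 8 experts
layer = TinyMoELayer()
print(layer(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```

A model built this way can carry a very large total parameter count while activating only a small fraction of it per token, which, together with MLA and systems-level optimizations, is the basic lever behind the training and inference cost reductions claimed for V-2 and V-3.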
V-3 represents DeepSeek's "GPT-3 moment"—a milestone achievement.
The key differences between DeepSeek and OpenAI in achieving these milestones:
OpenAI consistently focused on unlimited expansion of computational resources and costs, while DeepSeek pursued maximum efficiency with minimal computational resources.
OpenAI took two years to reach their GPT-3 moment, while DeepSeek achieved their V-3 breakthrough in one year.
OpenAI concentrated on pre-training advances along the GPT path, while DeepSeek balanced training and inference—aligned with global model technology development trends.
If V-3 truly represents DeepSeek's GPT-3 moment, what comes next? Will we see DeepSeek's GPT-3.5, their ChatGPT moment, or something else? Nobody knows, but interesting developments likely lie ahead. DeepSeek probably won't remain just a "Computer Science Pro" forever; they are likely to make greater contributions to humanity's AI endeavors.
Regardless, DeepSeek has become one of China's most globalized AI companies. Their formula for earning respect from global peers and even competitors carries that distinct Silicon Valley flavor.