The “Renaissance Technologies of China” is doing a great job in AI

Insiders suggest that High-Flyer Quant plans to spin off DeepSeek into an independent startup

Jun 19, 2024


In the competitive landscape of China’s AI model industry, a new player has emerged to challenge the dominance of established players such as ByteDance, Alibaba, Baidu, and Zhipu AI. This unexpected contender is DeepSeek, the AI division of High-Flyer Quant, a secretive quantitative hedge fund that many observers have called “the Renaissance Technologies of China.”

What’s going on here

The latest news is the release of DeepSeek Coder V2, an open-source mixture-of-experts (MoE) code LLM. It is said to excel at both coding and math tasks; notably, the company published benchmark scores showing it beating GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro.

Coder V2 is built on DeepSeek-V2, DeepSeek’s newest model. DeepSeek-V2 is not just a budget-friendly alternative; it is a genuinely strong model.

Featuring a distinctive Transformer architecture, DeepSeek-V2 combines an efficient Multi-head Latent Attention (MLA) mechanism with a high-performance Mixture-of-Experts architecture (DeepSeekMoE). MLA compresses the attention key-value cache into a compact latent vector, and DeepSeekMoE activates only a subset of its experts, about 21 billion parameters, for each token. Together, these choices improve both efficiency and performance, letting the model deliver strong results with a smaller memory footprint.
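To make the Mixture-of-Experts idea concrete, here is a minimal sketch of generic top-k expert routing for a single token, written in pure Python. It is not DeepSeekMoE itself: DeepSeekMoE additionally splits experts into finer-grained units and keeps some "shared" experts always active. The gating vectors, the toy experts, and the function name moe_forward are all hypothetical, chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def moe_forward(token: Vector,
                gate_weights: List[Vector],
                experts: List[Callable[[Vector], Vector]],
                k: int = 2) -> Tuple[Vector, List[int]]:
    """Hypothetical top-k MoE layer: route one token to k experts and mix outputs."""
    # Affinity score between the token and each expert's gating vector.
    scores = [dot(token, w) for w in gate_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated,
    # which is why only a fraction of the parameters is active per token.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Normalize the selected scores into mixing weights.
    mix = softmax([scores[i] for i in chosen])
    out = [0.0] * len(token)
    for weight, i in zip(mix, chosen):
        y = experts[i](token)
        out = [o + weight * v for o, v in zip(out, y)]
    return out, chosen


# Toy setup: four hypothetical experts that each scale the input by a constant.
experts = [lambda v, s=i + 1: [s * x for x in v] for i in range(4)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
out, chosen = moe_forward([1.0, 0.0], gate_weights, experts, k=2)
print(chosen, out)
```

With k=2, only two of the four experts run for this token (experts 0 and 2). Real MoE models apply the same principle at a much larger scale, with learned gates and feed-forward networks as experts.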

With 236 billion parameters, 8.1 trillion training tokens, and a 128K context window, DeepSeek-V2 rivals top-tier models such as GPT-4 Turbo and LLaMA3-70B. On top of that, DeepSeek-V2 is open source and available for commercial use.

Yet its API prices are strikingly low, at 1 yuan per million input tokens and 2 yuan per million output tokens, making it an attractive option for developers.

DeepSeek also set off a price war among Chinese AI companies. The low-key company slashed its API rates to roughly 1% of GPT-4 Turbo’s price, a bold move that not only undercut the competition but also forced industry leaders to follow suit, igniting a price revolution.
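To see where a figure like "about 1%" comes from, the sketch below prices a hypothetical workload on both APIs. The DeepSeek-V2 prices are the ones quoted above; the GPT-4 Turbo list prices ($10 and $30 per million input and output tokens) and the 7.2 CNY/USD exchange rate are assumptions added for illustration and should be checked against current price lists.

```python
# Prices quoted in the article for the DeepSeek-V2 API (yuan per million tokens).
DEEPSEEK_INPUT_CNY_PER_M = 1.0
DEEPSEEK_OUTPUT_CNY_PER_M = 2.0

# ASSUMPTIONS (not from the article): GPT-4 Turbo list prices in USD per
# million tokens as of mid-2024, and an approximate CNY/USD exchange rate.
GPT4_TURBO_INPUT_USD_PER_M = 10.0
GPT4_TURBO_OUTPUT_USD_PER_M = 30.0
CNY_PER_USD = 7.2


def deepseek_cost_cny(input_tokens: int, output_tokens: int) -> float:
    """Cost in yuan of a workload on the DeepSeek-V2 API."""
    return (input_tokens * DEEPSEEK_INPUT_CNY_PER_M
            + output_tokens * DEEPSEEK_OUTPUT_CNY_PER_M) / 1_000_000


def gpt4_turbo_cost_cny(input_tokens: int, output_tokens: int) -> float:
    """Same workload on GPT-4 Turbo, converted to yuan at the assumed rate."""
    usd = (input_tokens * GPT4_TURBO_INPUT_USD_PER_M
           + output_tokens * GPT4_TURBO_OUTPUT_USD_PER_M) / 1_000_000
    return usd * CNY_PER_USD


# Hypothetical workload: 10M input tokens and 2M output tokens.
ds = deepseek_cost_cny(10_000_000, 2_000_000)
gpt = gpt4_turbo_cost_cny(10_000_000, 2_000_000)
print(f"DeepSeek-V2: {ds:.2f} yuan, GPT-4 Turbo: {gpt:.2f} yuan, ratio: {ds / gpt:.1%}")
```

Under these assumptions, the workload costs 14 yuan on DeepSeek-V2 versus about 1,152 yuan on GPT-4 Turbo, or roughly 1.2%. The exact ratio shifts with the input/output mix and the exchange rate.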

DeepSeek has earned respect in the AI research community for the strong mathematical and reasoning capabilities of its models, and endorsements from notable figures such as Andrej Karpathy, formerly of OpenAI, underscore its growing influence.

Why it matters

The introduction of DeepSeek-V2 has solidified DeepSeek’s reputation and positioned it as a formidable contender in the AI market. Insiders suggest that High-Flyer Quant plans to spin off DeepSeek into an independent startup, leveraging the momentum generated by the ongoing price war. DeepSeek’s low prices are not temporary subsidies: they are based on the actual cost of serving the model at scale and still leave a profit margin of more than 50%.

However, as more well-funded competitors enter the market, DeepSeek faces increasing pressure. Meanwhile, High-Flyer Quant’s core quantitative finance business is also adapting, and the firm may shift from funding its AI efforts internally to raising venture capital.

High-Flyer Quant has a long history of applying advanced mathematics and computer science to financial investment, dating back to its early exploration of automated trading in 2008. By 2016, it had put its first deep learning model into live trading. In many respects, High-Flyer Quant is China’s answer to Renaissance Technologies.

Significant investments, such as the 1 billion yuan allocated to the “Fire-Flyer II” AI cluster, laid the groundwork for DeepSeek’s development. With over 10,000 A100 GPUs, High-Flyer has one of the most powerful computing clusters in China. The company’s founder, Wenfeng Liang, has a deep interest in generative AI, which has driven the firm’s Google-like focus on AI infrastructure and research.

On May 15th, DeepSeek Chat completed its registration and is now open to the public.