Chips, China, and a Lot of Money: The Factors Driving the DeepSeek AI Turmoil
Why is DeepSeek causing global technology shockwaves?
Matt Sheehan: DeepSeek is a Chinese AI startup that recently released a series of very impressive generative AI models. One of those models, DeepSeek R1, is a “reasoning model” that takes its time to think through an extended chain of logic before it gives an answer. This type of reasoning is a relatively new paradigm that was pioneered by OpenAI last year, and it is viewed by many as the most promising way forward for AI research. In terms of performance, DeepSeek’s new model is roughly on par with OpenAI’s o1 model from last September.
The shockwaves here come from how DeepSeek did it: quickly, cheaply, and openly. DeepSeek had finished an initial version of R1 just a couple of months after OpenAI’s release, far faster than Chinese companies were able to catch up to U.S. models in previous years. Perhaps most shocking was that DeepSeek was able to achieve this performance using far less computing power—a key input for training a model—than U.S. companies. That extraordinary efficiency is likely a knock-on effect of U.S. export controls on chips: Chinese companies have been forced to get very creative with their limited computing resources. And finally, DeepSeek released its model in a relatively open source way, allowing anyone with a laptop and an internet connection to download it for free. That has thrown into doubt many assumptions about business models for AI companies and led to the turmoil in U.S. stock markets.
Sam Winter-Levy: Just to give you a sense of DeepSeek’s efficiency, the company claims it trained its model for less than $6 million, using only about 2,000 chips. That’s an order of magnitude less money than what Meta, for example, spent on training its latest system, which used more than 16,000 chips. Now, DeepSeek’s estimate almost certainly captures only the marginal cost of the final training run: it ignores the company’s expenditures on building data centers, buying the chips in the first place, and hiring a large technical team. But regardless, it’s clear that DeepSeek managed to train a highly capable model more efficiently than its U.S. competitors.
Investors seem to be especially concerned about the prospects for leading chip companies, such as Nvidia. Why?
Matt Sheehan: DeepSeek has shown that it takes much less compute to train a leading AI model than was previously believed. If you assume that demand for AI will remain constant, then these lower compute needs would translate into less revenue than previously projected.
But it could end up having the opposite effect. If DeepSeek makes accessing these AI models much more affordable, then that could end up increasing total demand for AI services, leading to much more revenue for chip companies. The long-term impacts on compute demand remain deeply uncertain, but the valuation of companies like Nvidia has been growing at such an extraordinary rate for the past few years that a comedown isn’t too surprising.
Sam Winter-Levy: That’s right. The tech giants have been making extraordinary capital expenditures over the past couple of years. Just last week, for example, OpenAI, SoftBank, and Oracle announced a joint venture called Stargate to build at least $100 billion in computing infrastructure, and perhaps as much as $500 billion over four years. At some point they will need to generate the revenue to justify these levels of investment. DeepSeek’s efficiency and its open availability suggested that perhaps the position of the leading U.S. tech giants was less secure than the markets had thought, that Nvidia’s prospects as the provider of vast quantities of chips for the world were somewhat less rosy than anticipated, and that we might be witnessing an overbuilding of AI infrastructure.
As Matt said, in the long run lower costs could drive greater usage, which means we’ll still require vast quantities of computing power and data centers. But it’s not historically unusual for a technology revolution to be accompanied by a lot of turbulence in the stock market and by the incineration of capital, even as cost reductions unleash new waves of innovation. That explains some of what we’ve seen over the last few days with the reaction to DeepSeek.
In its last few months, the Biden administration rolled out a series of escalating export controls on AI chips, aimed particularly at China. Does this mean that export controls don’t matter anymore?
Sam Winter-Levy: Almost certainly not. Although DeepSeek has shown that you can use fewer chips than expected to train an impressive model, it would still benefit from having access to more computing power. The company’s CEO has explicitly said that access to computing power is its primary obstacle. The more chips you have, the more experiments you can run, the more data you can generate, and the more widely you can deploy your most capable models. DeepSeek has shown that with this new class of reasoning model, you can achieve impressive performance with a small amount of compute. But you can almost certainly achieve vastly more with a large amount of compute! So access to chips will remain a key driver of success in the AI race moving forward.
It’s worth emphasizing that DeepSeek acquired most of the chips it used to train its model back when selling them to China was still legal. Although the export controls were first introduced in 2022, they only began to have a real effect in October 2023, and the latest generation of Nvidia chips has only recently begun to ship to data centers. As these newer, export-controlled chips are increasingly used by U.S. companies, we could see a gap reopen in the United States’ favor. And while DeepSeek’s achievement does cast doubt on the most optimistic theory of export controls—that they could prevent China from training any highly capable frontier systems—it does nothing to undermine the more realistic theory that export controls can slow China’s attempt to build a robust AI ecosystem and roll out powerful AI systems throughout its economy and military. After all, the amount of computing power it takes to build one impressive model and the amount it takes to be the dominant AI model provider to billions of people worldwide are very different. So access to cutting-edge chips remains crucial.
Of course, this all depends on the U.S. government continuing to close loopholes in the export control regime that enable Chinese chip smuggling, which is one reason the Biden administration introduced its sweeping new diffusion framework to govern the sale of chips worldwide in its last weeks in office.