DeepSeek AI's Low-Cost Models Suspected to Use OpenAI Data, Sparking Online Irony

by Logan, Apr 27, 2025

The emergence of DeepSeek, a Chinese AI developer, has sparked significant controversy and concern within the U.S. tech industry. Suspicions that the company may have used OpenAI's data to train its models have drawn sharp reactions from industry leaders and political figures alike. President Donald Trump called DeepSeek a "wake-up call" for the U.S. tech sector after Nvidia's stock plunged 16.86%, erasing roughly $600 billion in market value. That was the largest single-day loss of market value for any company in Wall Street history. Other tech giants, including Microsoft, Meta Platforms, Alphabet (Google's parent company), and Dell Technologies, also saw their shares decline, reflecting broader market unease about the competitive threat DeepSeek poses.

DeepSeek claims its R1 model, built on the open-source DeepSeek-V3, offers a cost-effective alternative to Western AI models like ChatGPT. According to the company, it requires significantly less computing power, with training reportedly costing just $6 million. The claim has called into question the massive AI investments American tech companies are making, and buzz about the model's performance drove DeepSeek's app to the top of the U.S. free app download charts.

In response, OpenAI and Microsoft are investigating whether DeepSeek used OpenAI's API to collect outputs from OpenAI's models and train its own models on them, a practice known as distillation. In this technique, a model learns by training on the outputs of a more advanced one. OpenAI's terms of service prohibit using its outputs this way to build competing models. OpenAI has emphasized its commitment to protecting its intellectual property and says it is working with the U.S. government to safeguard its technology from such practices.

The situation has drawn sharp criticism and accusations of hypocrisy from some quarters. Tech PR executive and writer Ed Zitron highlighted the irony of OpenAI's complaints, given the company's own history of training ChatGPT on copyrighted internet content. OpenAI has previously argued that training AI models without copyrighted material is "impossible," a stance that has fueled ongoing debate about the ethics and legality of AI training data.

The controversy over AI training data has already escalated into legal action against OpenAI and Microsoft. In September 2023, a group of 17 authors, including George R. R. Martin, sued OpenAI, accusing it of "systematic theft on a mass scale." In December 2023, The New York Times filed its own lawsuit, alleging "unlawful use" of its content to develop AI products. These lawsuits underscore how contentious the use of copyrighted material in AI development has become, while OpenAI defends its practices as "fair use."

Separately, in August 2023, U.S. District Judge Beryl Howell upheld a U.S. Copyright Office decision that artwork generated by AI without human authorship cannot be copyrighted, emphasizing that human creativity is a prerequisite for copyright protection. The ruling adds another layer of complexity to ongoing discussions about AI, intellectual property, and the future of technology development.