As AI continues to advance at a rapid pace, two notable foreign players have emerged: DeepSeek and Qwen. These powerful AI models, developed by a Chinese lab and Alibaba, respectively, have garnered attention for their impressive capabilities and potential to disrupt the AI industry. However, alongside their technological prowess comes a host of privacy concerns that warrant closer examination. This article delves into the privacy pitfalls associated with these AI models and explores the implications for users and the broader AI ecosystem.
DeepSeek is both a Chinese AI lab, backed by the quantitative hedge fund High-Flyer Capital Management, and the family of powerful AI models that lab produces. DeepSeek’s models, including DeepSeek-V3 and DeepSeek-R1, are designed to handle a wide range of language tasks such as coding, translation, and text generation. What makes DeepSeek notable is its ability to achieve high performance on various AI benchmarks while being more cost-efficient to train and run than models from major U.S. tech companies. DeepSeek has drawn attention in the AI community for potentially disrupting the field by offering strong AI capabilities at lower cost, challenging the dominance of American tech giants in the AI race.
Qwen is an AI model developed by Alibaba, one of China’s largest tech companies. It works by processing and analyzing vast amounts of text data to understand and generate human-like responses. Qwen is designed as a large language model, similar to ChatGPT, which means it can engage in conversations, answer questions, and perform various language-related tasks. The model has gained popularity and importance because it represents China’s growing capabilities in AI technology, competing with models from major U.S. tech companies.
Foreign-hosted AI models like DeepSeek and Qwen have come under intense scrutiny for their data collection and privacy practices. As these apps gain popularity worldwide, cybersecurity experts and government officials have raised serious concerns about the extent of user data collected, how it is stored and processed, and the risk of this information being accessed by foreign authorities. From collecting keystroke data to storing information on foreign servers, these models have sparked a range of serious privacy and security issues.
On the surface, DeepSeek and Qwen are no different from their data-hungry U.S. counterparts. Their foundation models were trained on huge datasets, much of them scraped from the internet, whether or not that data was intended to be accessed and used for this purpose. As with any large language model (LLM), this presents acute privacy risks:
- Data recall. These models can store and then “remember” specific personal information when prompted by inputs. For example, think of a bad actor trying to gain access to a secured account by answering password recovery questions. The bad actor may be able to gain information about specific individuals (such as their mother’s maiden name or the street where they were born) by querying an LLM.
- Input analysis. User prompts are themselves a treasure trove of information that can be recorded, transferred, stored, or analyzed to build interest profiles about the users themselves. For example, tracking the questions asked by a first-time expectant parent can provide very sensitive insights into the individual’s health issues or cultural beliefs.
- Lack of privacy rights. Individuals may find significant hurdles in exercising their statutorily provided rights with regard to their data. For example, if an individual, fearing retaliation from their employer, wishes for the AI provider to delete a profile developed based upon a conversation history regarding forming a union, how does the AI provider provide that functionality? Once ingested, can an individual’s information be removed or sequestered away from further access or use? Maybe not.
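The "input analysis" risk above can be made concrete with a small sketch. This is purely illustrative code, not any vendor's actual pipeline: the keyword table and function names are invented for the example, and a real provider would use far more sophisticated classifiers. The point is how little machinery it takes to turn a raw prompt log into a sensitive interest profile.

```python
# Hypothetical sketch: turning a user's prompt history into an interest
# profile. KEYWORDS and build_profile are illustrative names only.
from collections import Counter

# Keywords mapped to sensitive interest categories (illustrative only).
KEYWORDS = {
    "prenatal": "health",
    "pregnancy": "health",
    "union": "labor-organizing",
    "organizing": "labor-organizing",
    "visa": "immigration",
}

def build_profile(prompts: list[str]) -> Counter:
    """Count the sensitive categories hinted at by a prompt history."""
    profile = Counter()
    for prompt in prompts:
        for word, category in KEYWORDS.items():
            if word in prompt.lower():
                profile[category] += 1
    return profile

history = [
    "What prenatal vitamins are safe in the first trimester?",
    "How do I start a union at my workplace?",
    "Tips for union organizing without employer retaliation",
]
print(build_profile(history))
# Counter({'labor-organizing': 3, 'health': 1})
```

Even this toy version would flag the user as likely expecting a child and engaged in labor organizing, which is exactly the kind of derived profile that deletion requests may not reach once it exists.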
These issues are present in U.S.- and European-based AI models as well, so why have DeepSeek and Qwen caused such a stir? The short answer is concern that the Chinese Communist Party can access the troves of data used to train the models, the user prompts, and the analytics derived from them, all of which are stored in China. From the European perspective, such an international data transfer to China is illegal without a data protection framework that complies with the EU’s General Data Protection Regulation (GDPR) and its cousins in the United Kingdom and Switzerland. Without such a framework, European citizens have no ability to exercise their data rights (which are considered basic human rights) against DeepSeek and Qwen. Further, neither DeepSeek nor Qwen has a GDPR-required representative in the EU, so data regulators have little direct power to enforce their privacy mandates.
From the U.S. perspective, the issue is less about human rights than national security. The U.S. government has itself been notoriously loose with its own citizens’ private information, and the current U.S. privacy framework does not outlaw international data transfers in most cases. The concerns bulleted above become acute, however, in the context of identifying and cracking down on anti-CCP activity, gaining access to sensitive U.S. national defense information, and improperly influencing U.S. voters, especially young voters.
There are also information security concerns with the software applications that are used to access the DeepSeek and Qwen models, such as:
- Unencrypted data transmission and use of deprecated encryption methods;
- Hard-coded encryption keys, which are potentially accessible;
- Disabled built-in security protections like iOS App Transport Security;
- Possible device fingerprinting and user de-anonymization;
- Lack of transparency regarding data usage, retention, and security practices;
- Potential censorship of certain topics; and
- Concerns about malware delivery.
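The "hard-coded encryption keys" flaw in the list above is worth illustrating, because it defeats encryption entirely regardless of the algorithm used. The sketch below is hypothetical and simplified (a toy XOR cipher stands in for a real one; the key and names are invented), but the structural problem it shows is real: a key embedded in the client ships to every user and can be recovered by anyone who decompiles the app.

```python
# Hypothetical sketch of the hard-coded key anti-pattern. Illustrative
# names and a toy cipher; not any vendor's actual code.

# ANTI-PATTERN: a static key embedded in client code. It is extractable
# with a disassembler, and every install shares the same secret.
HARDCODED_KEY = b"0123456789abcdef"

def xor_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy XOR cipher; with a shared hard-coded key, even a strong
    cipher in its place would provide no real confidentiality."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

msg = b"user prompt: confidential business plan"
ciphertext = xor_encrypt(msg, HARDCODED_KEY)

# An attacker who pulls the key out of the app package decrypts at will:
assert xor_encrypt(ciphertext, HARDCODED_KEY) == msg
```

Sound practice is the opposite: keys negotiated per session (e.g., via TLS) or stored in platform keystores, never compiled into the application itself.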
In light of the significant privacy concerns surrounding foreign AI models, companies should exercise extreme caution when considering their use. In fact, U.S. states[1] are already following the actions of countries like Italy[2] and South Korea[3] in banning the use of DeepSeek on government devices and networks. While these models offer impressive capabilities, the potential risks to data security and privacy far outweigh the benefits in many scenarios. Organizations should avoid using these AI services for any tasks involving sensitive, confidential, or private information. This includes customer data, financial records, proprietary business information, or any data subject to regulatory compliance. Instead, companies should prioritize AI solutions with robust privacy protections, preferably those hosted in jurisdictions with strong data protection laws. If the use of these foreign AI models is deemed necessary, it should be limited to processing only publicly available information or non-sensitive data.
[1] Governor of New York, Governor Hochul Issues Statewide Ban on DeepSeek Artificial Intelligence for Government Devices and Networks (February 10, 2025), available at https://www.governor.ny.gov/news/governor-hochul-issues-statewide-ban-deepseek-artificial-intelligence-government-devices-and.
[2] Reuters, Italy’s regulator blocks Chinese AI app DeepSeek on data protection (February 4, 2025), available at https://www.reuters.com/technology/artificial-intelligence/italys-privacy-watchdog-blocks-chinese-ai-app-deepseek-2025-01-30/.
[3] BBC, S Korea removes Deepseek from app stores over privacy concerns (February 17, 2025), available at https://www.bbc.com/news/articles/clyzym0vn8go.