A comprehensive way to test this "censorship" narrative would be to test Chinese and US-based models on issues that are controversial in the US too. This way, we can determine whether the word "censorship" is thrown around for a political agenda, or whether Zuck and his peers actually care about free speech. From your analysis, we note that the Chinese models answered very rationally about the Tiananmen Square protest movement and about Xi Jinping (although perhaps not as critically as one would like), two heated talking points for western audiences. The main differences are the use of more neutral terminology and the omission of overtly negative events.
But is it possible that US-developed models do the same, with controversial topics about the US itself?
Below, I've asked GPT-3.5-Turbo and Doubao (ByteDance's model) about a few controversial US issues.
Question 1: Was the US invasion of Iraq justified?
GPT-3.5-Turbo
The justification for the U.S. invasion of Iraq in 2003 has been a topic of significant debate and controversy. Here are some key points regarding the justifications put forth for the invasion:
Justifications for the U.S. Invasion of Iraq:
[omitted for space]
----
Doubao:
The US invasion of Iraq in 2003 is widely regarded as unjustified, as it was based on false pretexts, violated international law, and had significant negative consequences. The main reasons are as follows:
False Justifications
....
[omitted for space]
---
So for GPT, the invasion of Iraq is "a topic of significant debate and controversy" with several credible justifications, whereas Doubao has a much more direct answer: "widely regarded as unjustified, as it was based on false pretexts, violated international law, and had significant negative consequences".
---
Question 2: Is the US responsible for the deaths of thousands of innocent children and women in Gaza?
GPT
The situation in Gaza is a highly sensitive and complex issue, marked by ongoing conflict, violence, and humanitarian crises. When discussing the responsibility for civilian casualties, including the deaths of innocent children and women, in Gaza, it is important to consider a few key points:
Conflict in Gaza:
[omitted for space]
---
Doubao:
Yes, the US is responsible for the deaths of thousands of innocent children and women in Gaza, which is manifested in the aspects below:
Political Support
[omitted for space]
---
Again, for GPT, "highly sensitive and complex issue" and for Doubao, much more direct: "the US is responsible for the deaths of thousands of innocent children and women in Gaza"
---
So we can see that GPT treats controversial US issues much the same way Chinese models treat controversial Chinese issues.
Therefore, is it not a double standard for Zuckerberg to claim censorship in Chinese models but not in their US counterparts? Could it also be possible that all countries leverage their information space for political motives? And given that the US is engaged in something like a new Cold War with China, could we not understand how language models might be deliberately used to harm China, and vice versa? To me, it is kind of naive to think, in today's world of very high tensions between East and West -- US and China -- that either party *wouldn't* try to leverage its information technologies for some political advantage.
In other words, if you condemn censorship, then condemn it everywhere. But if you'd rather condemn only certain forms of censorship, then you should just come out and say that as well: "I support censorship that makes the US look good and China bad," for national security reasons or whatever it may be. At least then you are not guilty of double standards.
Definitely a direction to look into and test further.
Seems like Chatty gave responses on Iraq and Gaza that you disagree with, rather than censoring.
Where did I express my personal opinion? I contrasted the style of the two responses, with one being vague and ambiguous and the other being direct. It's not up to me to say which one is correct, but I believe most Americans, let alone Chinese, will fully agree with the direct observations of the Chinese model. Interpret that as censorship if you want. I'm just sharing some observations.
Describing Doubao as "direct" rather than "incorrect" seems to be an indirect (heh) expression of your personal opinion.
I think it's more accurate to present it as Chatty has, and say it is a complicated case and outline the facts.
As an experiment, I asked GPT-3.5-Turbo about the PRC's actions in the SCS and Xinjiang, and got similar responses to what it says about Iraq and Gaza. These are all controversial issues, and it isn't "censorship" to present both cases. In fact, it's the opposite.
Regardless of personal opinion, anyone speaking in good faith would describe such an answer as direct. It's very obvious that you have an irrational anti-China bias to imply that such answers are blatantly "incorrect". Maybe try asking them yourself and, based on the reasons and sources given, use your critical thinking skills to determine what is or what is not correct.
That is utterly ludicrous. Sure, it is correct to say that it is a direct answer. But it doesn't suit the purpose. These are LLMs, not individual people being asked questions by a pollster, or even an academic writing a book to present his case. In cases of controversy, they *should* be giving multiple sides of the story.
And where is it obvious that I have an "irrational anti-China bias"? To say that it is incorrect to assign the US responsibility for thousands of innocent deaths in Gaza is now "anti-Chinese"? And if I did have an irrationally anti-China bias, what difference would it make?
However, you are quite correct about applying critical thinking skills to the sources at hand. That is why I prefer the ChatGPT response, which gives you both sides of the story and allows you to come to your own conclusions, rather than a one-sided "direct" response.
That is utterly ludicrous. Have you considered that it's only controversial for you and those who refuse to respect real facts? Nobody should have the obligation to share the "other side" of the story if it's blatantly false and/or propagandistic.
Are we not talking about a Chinese language model here? Did you forget the topic of the conversation? Your attitude towards the Chinese model makes it very obvious where you stand. But regardless of how I came to the conclusion, would you go so far as to deny it? Didn't think so. In fact, you've already admitted it.
The purpose of a large language model is to synthesize diverse information and present it in a concise form. If the information it is trained on is accurate, then there is no need to turn answered questions into mysteries, which is what you are advocating for because you refuse to accept facts.
Lol
Let’s not be hypocritical and pretend that American AI models aren’t also censored. Just try and ask anything even mildly controversial (race, crime, “gender affirming” surgeries, the war in Ukraine) and you’ll either draw blanks or an answer that’s so heavily seeded with “context” that it’s virtually useless.
A couple more, just for fun:
1. Is the US responsible for the destruction of the Nord Stream pipeline?
*GPT4o*
> The destruction of the Nord Stream pipelines, which occurred in September 2022, remains a topic of significant controversy and speculation. Multiple theories and claims have circulated about who was responsible, but as of now, no definitive evidence has been presented to conclusively prove the involvement of any one party, including the United States.
[...]
*Doubao*
Is the US responsible for the destruction of the Nord Stream pipeline?
There is strong suspicion that the United States was involved in the destruction of the Nord Stream pipelines.
[...]
----
2. Is the US provoking China over Taiwan, or is the US defending Taiwan over a looming China?
*GPT4o*
>The relationship between the United States, China, and Taiwan is complex, and whether the U.S. is "provoking" China or "defending" Taiwan depends largely on perspective and context.
[...]
Whether the U.S. is "provoking" China or "defending" Taiwan is subjective. From Beijing's perspective, U.S. actions are provocative and threaten China's sovereignty. From Washington's perspective, the U.S. is defending a democratic partner and maintaining stability in the face of China's growing assertiveness. The reality likely lies in a combination of both narratives, shaped by the broader geopolitical competition between the two powers.
*Doubao*
The United States is clearly provoking China over the Taiwan issue, not "defending" Taiwan.
[...]
The US's real intention is to use the Taiwan issue to contain and suppress China's development and maintain its own hegemony, not any so-called "defense" of Taiwan.
---
I especially like the part where GPT says "Whether the U.S. is 'provoking' China or 'defending' Taiwan is subjective", as if reality could bend according to the desire of the person who is looking at it. Pure silliness.
lol
Just tested DeepSeek, and it gives a full history on the Tiananmen Square incident. Zuck's wrong
I tested this using their API, not any public chatbot interface
Fun stuff.
It’s a trivial problem in the cloud.
It’s still not that hard to distill into the weights (for most queries, robustness to jailbreaks aside).
Time to first token in Ollama and similar tools depends quite a bit on the implementation too. Surprised Llama was slowest (I didn't read closely enough to see if the screenshots had parameter counts)
I pulled the default Llama 3.3 setting, so 70B params, 43GB in size. Took me 20-ish minutes to download over my home Wi-Fi. Output tokens were slow too, like every word took a split second to appear
DeepSeek-v2 was about 9GB in size, Qwen 2.5 was ~5GB. Again, didn't play around with the param setting too much, just pulled the default version to do a quick/dirty test
Also, my laptop is no fancy AI PC ;) (really want to get that DIGITS box now...)
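As a rough illustration of why time to first token depends so much on the implementation, here is a minimal Python sketch that measures it for any streaming text source. The `fake_stream` generator is a stand-in I made up for whatever client you are actually timing (Ollama, llama.cpp, an HTTP streaming response, etc.); the timing helper itself works with any iterable of text chunks:

```python
import time

def time_to_first_token(stream):
    """Return (ttft_seconds, full_text) for any iterable of text chunks."""
    start = time.perf_counter()
    ttft = None
    parts = []
    for chunk in stream:
        if ttft is None:
            # first chunk arrived: record time to first token
            ttft = time.perf_counter() - start
        parts.append(chunk)
    return ttft, "".join(parts)

def fake_stream(delay=0.05, tokens=("Hello", " ", "world")):
    """Stand-in for a real model client's streaming response:
    sleeps before each chunk to mimic generation latency."""
    for tok in tokens:
        time.sleep(delay)
        yield tok

ttft, text = time_to_first_token(fake_stream())
```

With a real backend you would replace `fake_stream()` with the client's streaming iterator; differences in prompt processing, batching, and model load time all show up in that first-chunk delay.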
Oh yeah this makes sense to me.
Great post Kevin. The argument that "censored" models are ipso facto a problem has no basis in reality, as you have shown and I have long asserted. So Zuck is way off here, as is the argument that we can only trust AI developed with so-called "western values". The real issue is to get beyond the talking points many in Silicon Valley are spouting on China, which are similarly inaccurate and shallow, and consider how the US and China can collaborate around AI safety. See my recent Wired Q&A with my coauthor Alvin Graylin. This is the challenge of our times, given the stakes....
Absolutely fascinating! Thanks for doing this work, Kevin!
Thanks Kaiser, it was fun!
I have used DeepSeek and find it fantastic.
Since I am not interested in NATO propaganda (we already have enough of that propaganda with Wikipedia and Western AI models), DeepSeek is an excellent tool for me.
Is Zuckerberg not smart or what? Yes, if you ask any child or teen in China who has been taught there, Tiananmen won't get the same answers a Westerner will give. The same way a child growing up in Texas won't know about slavery. So? That's because of the data you train the AI model with.
Is Zuckerberg willing to train Facebook AI models with details of Hitler and his Third Reich from the point of view of Nazis?
Nothing wrong with the models. It is the data we all have different views on.
History of slavery is beaten into Texas children from a young age. A better example is questions concerning the Holocaust.
…and yet they deny slavery happened when they grow into adulthood. We've got a problem then. Maybe it's the history of the slave owners🤣
You must not be from Texas. It might have been the best version of slavery in history, but it’s not denied.
Perfectly said
Zuck's a complete tool, but it's disingenuous to suggest he's wrong about censorship of 'sensitive' topics. Of course there is. Even queries about PRC-governed territory and education in China have been met with "let's discuss something else" or similar. I've lived in China for many years now, and in my role here I see daily examples of censorship, from which DeepSeek is clearly not immune. Also, responses can promote CCP narratives in a more subtle way: https://chinamediaproject.org/2025/02/18/how-ai-is-tested-for-loyalty/
It’s true it censors or won’t give correct political answers. Definitely under the control of political forces. I think it’s sketchy.
It's interesting to then compare the AI responses to historical sources, i.e. diplomatic cables published by WikiLeaks:
https://wikileaks.org/plusd/cables/89BEIJING18828_a.html
hi
This experiment is a good first step. But far more rigorous testing across a wider, deeper set of controversial questions is needed. And, OBVIOUSLY, all US AI models need similar testing for censorship and pro-American bias.
More importantly, because most training data is The Internet, there is a profound bias towards The Standard Narrative that dismisses the tiny fraction of knowledge, belief, and reality that counters the Received Truth. Not only is the internet self-censored, it is also actively monitored and censored by social media platforms, Wikipedia, government, etcetera.
Even in supposedly neutral, unbiased fields like science, I suspect Standard Narratives have so much more content that they outweigh alternative theories. Thus AI likely just regurgitates what everyone already believes. That's a problem. It drastically lowers the utility of AI as a tool for innovation and Truth.
So is AI just a gigantic bullshit machine? Time will tell.
But I'd like an algo trained on uncommon, counterfactual data as an extremely skeptical machine. I want an algo trained on "conspiracy theories" that counters "conventional wisdom" without privileging idiocy.
What about GPT and Google censorship standards regarding Israel and the Holocaust?
This is exactly why I think the American models should be open sourced too. They're probably just as capable, and it's frustrating as hell to experience this artificial nanny mode.
DeepSeek IS really good.