Blocking Hugging Face is a big deal.
I do have some disagreements, though. I don’t think blocked dataset access is that significant a factor; anyone knows how to use a VPN these days. Regulation also seems okay: for the well-funded guys, at least, it’s just a matter of paperwork. For small guys it can be a headache, but it takes so much funding to build LLMs anyway that, by its very nature, there will be no small guys. The point about the language, though, is what I think is a fundamental challenge that can’t be solved. Online written Chinese, at least, is so low-quality compared with English. What do you think? Really curious.
Just as it took some passionate researchers and guys in the Bay, there's the equivalent of some passionate researchers and kids who are seeing what's going on, see the huge potential, and will create AGI as well. They may just not be as vocal about it, and we may not hear much about it either (though I've seen some of the work they're doing).
Outside China, an LLM AI that refuses to go to "the sky is red" is desirable, but inside China, refusing to go to "good government is blue" may be a desired result of the datasets used for training.
LLMs are supposed to produce useful and usable writing based on what has already been written. The article does an excellent job of showing the difficulty with Chinese training data, but otherwise it is not clear to me what "being ahead" with LLMs would mean.
I agree with the assessment. I think at the most fundamental level, it’s because “speaking languages” is not at the core of societal needs and values in China; “deeds”, or getting things done, are. Good public speakers are very rare, and public speaking is not encouraged. Good writing is also much scarcer. Another fundamental reason is that the Chinese language is sufficiently vague, often intentionally so, that we don’t need to go to the level of “censorship” to explain the disadvantage. However, for non-language AI, such as autonomous driving and image recognition, China may still have a chance. LLMs are but one path of tech, although they’re really the hype at the moment.
This is an awesome and well-articulated article! Well done.