I agree with the assessment. I think at the most fundamental level, it’s because “speaking languages”, as opposed to “deeds” or getting things done, is not at the core of societal needs and values in China. Good public speakers are very rare and not encouraged. Good writing is also much rarer. Another fundamental reason is that the Chinese language is sufficiently vague, often intentionally vague, so we don’t need to go to the level of “censorship” to explain the disadvantage. However, for non-language AI, such as autonomous driving and image recognition, China may still have a chance. LLMs are but one path of tech, although they’re really the hype at the moment.
I do have disagreements though. I don’t think blocked dataset access is that significant; anyone knows how to use a VPN these days. Also, regulation seems okay, as for the well-funded guys at least it’s just a matter of paperwork. For small guys it can be a headache, but it takes so much funding to build LLMs anyway that, by the very nature of it, there will be no small guys. But the point about the language is what I think is a fundamental challenge that can’t be solved. At least online, written Chinese is so low-quality compared with English. What do you think? Really curious.
Any tech company in China worth its salt has a decent VPN set up, so companies that must access HF can and will (the same goes for Twitter and Facebook, for marketing).
That being said, HF is more than just a place to download free models and datasets from. It is an entire collaborative toolchain, so not having reliable or fast access brings down AI teams' productivity and pace of development, which is painful when you are playing catch-up.
Whether blocking HF is significant or not, if developers (a.k.a. the people who are doing the work) are complaining about it, then it's significant, and they are certainly complaining about it.
just as it took some passionate researchers and guys in the bay.. there's the equivalent of some passionate researchers and kids that are seeing what's going on, see the huge potential, and will create agi as well.. they may just not be as vocal about it and we may not hear much about it (though i've seen some of the work they're doing)
Outside China, an LLM that refuses to go to "the sky is red" is desirable, but maybe inside China, refusing to go to "good government is blue" is a desired result of the datasets used for training.
LLMs are supposed to produce useful and usable writing based on what has already been written. The article does an excellent job of showing the difficulty with Chinese training data, but otherwise it is not clear to me what "being ahead" with LLMs would mean.
This is an awesome and well-articulated article! Well done.