This paper defines a random text model of Chinese phrases (the Monkey model), shows that the real Chinese phrase network is scale-free, and points out that the ratio of the average node degree to the total number of nodes (k/N) is essentially constant. Simulating the evolution of the phrase network reveals that selecting Chinese characters according to their frequency is an important cause of the power-law degree distribution of the phrase network. Suitably adjusting the selection probabilities of the characters lets the Chinese Monkey language exhibit statistical properties similar to those of natural language: a power-law degree distribution emerges at large k, with an exponent of about 6. A comparison between the Monkey language and real Chinese shows that humans make better use of the available characters and express their intent more concisely, which demonstrates that the organization of the Chinese phrase network obeys the least-cost principle that is ubiquitous in nature.
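The frequency-biased selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how phrases are assembled or how the network is wired, so everything below is an assumption made for illustration: characters are represented by integer ranks, selection probabilities follow a Zipf-like weight (`rank ** -exponent`), every phrase has a fixed length `phrase_len`, and two characters are linked when they are adjacent in some phrase. All function and parameter names are hypothetical, and the sketch does not attempt to reproduce the reported exponent of about 6.

```python
import random
from collections import defaultdict


def zipf_weights(n_chars, exponent=1.0):
    """Assumed selection weights: the character of rank r gets weight r**-exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_chars + 1)]


def monkey_phrases(n_chars, n_phrases, phrase_len=2, exponent=1.0, seed=0):
    """Draw distinct random phrases, choosing each character by its weight."""
    rng = random.Random(seed)
    chars = list(range(n_chars))          # characters are identified by rank index
    weights = zipf_weights(n_chars, exponent)
    phrases = set()
    attempts = 0
    # Cap the number of draws so the loop ends even if few distinct phrases exist.
    while len(phrases) < n_phrases and attempts < 50 * n_phrases:
        phrases.add(tuple(rng.choices(chars, weights=weights, k=phrase_len)))
        attempts += 1
    return phrases


def build_network(phrases):
    """Assumed wiring: link two characters if they are adjacent in a phrase."""
    neighbours = defaultdict(set)
    for phrase in phrases:
        for a, b in zip(phrase, phrase[1:]):
            if a != b:                    # ignore self-loops such as (3, 3)
                neighbours[a].add(b)
                neighbours[b].add(a)
    return neighbours


def degree_stats(neighbours):
    """Return (N, mean degree k, k / N) over characters that have at least one link."""
    degrees = [len(adjacent) for adjacent in neighbours.values()]
    n_nodes = len(degrees)
    mean_k = sum(degrees) / n_nodes
    return n_nodes, mean_k, mean_k / n_nodes


if __name__ == "__main__":
    network = build_network(monkey_phrases(n_chars=200, n_phrases=2000, seed=1))
    n_nodes, mean_k, ratio = degree_stats(network)
    print(f"N = {n_nodes}, mean degree = {mean_k:.2f}, k/N = {ratio:.4f}")
```

Varying `exponent` changes how strongly frequent characters are favoured, which is the kind of adjustment of selection probabilities the abstract refers to; repeating the run for several values of `n_phrases` gives a rough way to see how k/N behaves as the network grows under these assumptions.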