Chinese New Word Identification: A Latent Discriminative Model with Global Features

Source: Journal of Computer Science and Technology
Chinese new words are particularly problematic in Chinese natural language processing. With the rapid development of the Internet and the resulting information explosion, it is impossible to build a complete system lexicon for Chinese natural language processing applications, because new out-of-dictionary words are constantly being created. New word identification and POS tagging are usually performed as separate steps, so lexical features cannot be fully exploited. A latent discriminative model, which combines the strengths of the Latent Dynamic Conditional Random Field (LDCRF) and the semi-CRF, is proposed to detect new words of any type together with their POS tags in a single pass over Chinese text that has not been pre-segmented. Unlike a standard semi-CRF, the proposed latent discriminative model uses LDCRF to generate candidate entities, which speeds up training and reduces the computational cost. The complexity of the proposed hidden semi-CRF can be further adjusted by tuning the number of hidden variables and the number of candidate entities taken from the N-best outputs of the LDCRF model. A new-word-generating framework is proposed for model training and testing, under which the definitions and distributions of new words conform to those in real text. A global feature, called "Global Fragment Features", is adopted for new word identification. We tested our model on the corpus from SIGHAN-6. Experimental results show that the proposed method can detect even low-frequency new words together with their POS tags with satisfactory results. The proposed model performs competitively with state-of-the-art models.
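The two-stage idea in the abstract (a first-stage tagger proposes candidates, and a segment-level model then searches only among them) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the tagging scheme, the features, or the decoder. The sketch assumes per-character tags of the form `B-POS` / `I-POS`, takes the N-best tag sequences as given input, uses a hypothetical `score` callback in place of learned segment features, and decodes with a zeroth-order semi-Markov dynamic program that has no latent variables.

```python
from typing import Callable, List, Sequence, Set, Tuple

# (start, end_exclusive, pos_tag); an assumed representation, not from the paper.
Segment = Tuple[int, int, str]


def segments_from_tags(tags: Sequence[str]) -> List[Segment]:
    """Convert per-character tags such as 'B-NN', 'I-NN' into word segments."""
    segments: List[Segment] = []
    start, pos = None, None
    for i, tag in enumerate(tags):
        boundary, _, label = tag.partition("-")
        # A new segment starts on 'B', at the first character, or on a label change.
        if boundary == "B" or start is None or label != pos:
            if start is not None:
                segments.append((start, i, pos))
            start, pos = i, label
    if start is not None:
        segments.append((start, len(tags), pos))
    return segments


def candidate_segments(nbest: Sequence[Sequence[str]]) -> Set[Segment]:
    """Union of the segments found in each of the N-best tag sequences."""
    candidates: Set[Segment] = set()
    for tags in nbest:
        candidates.update(segments_from_tags(tags))
    return candidates


def best_segmentation(
    length: int,
    candidates: Set[Segment],
    score: Callable[[Segment], float],
) -> List[Segment]:
    """Highest-scoring cover of [0, length) using only candidate segments."""
    neg_inf = float("-inf")
    best = [neg_inf] * (length + 1)
    back: List[Segment | None] = [None] * (length + 1)
    best[0] = 0.0
    # Processing by end position guarantees best[start] is final when it is read.
    for seg in sorted(candidates, key=lambda s: (s[1], s[0], s[2])):
        start, end, _ = seg
        if best[start] == neg_inf:
            continue
        total = best[start] + score(seg)
        if total > best[end]:
            best[end], back[end] = total, seg
    if back[length] is None and length > 0:
        raise ValueError("candidates do not cover the whole sentence")
    path: List[Segment] = []
    position = length
    while position > 0:
        seg = back[position]
        path.append(seg)
        position = seg[0]
    return path[::-1]
```

Because the search only visits segments that appear in some N-best sequence, its cost grows with the number of candidates rather than with every possible span, which is the kind of saving the abstract attributes to using LDCRF as a candidate generator. Raising N enlarges the candidate set and the search cost together.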