Mongolian-Chinese Unsupervised Neural Machine Translation with Lexical Feature

Source: The 18th China National Conference on Computational Linguistics and the 2019 Annual Conference of the Chinese Information Processing Society of China
  Machine translation has achieved impressive performance with advances in deep learning, but it relies on large-scale parallel corpora. There have been many attempts to extend these successes to low-resource languages, yet they still require large numbers of parallel sentences. In this study, we build a Mongolian-Chinese neural machine translation model based on unsupervised methods. Cross-lingual word embedding training plays a crucial role in unsupervised machine translation. Training methods based on generative adversarial networks (GANs) perform well only between two closely related languages, whereas the self-learning method can learn high-quality bilingual embedding mappings without any parallel corpora for low-resource languages. In this work, applying the self-learning method instead of GANs improves the BLEU score by 1.0. On this basis, we analyze the lexical features of Mongolian words and use stem-affix segmentation in Mongolian to replace the Byte-Pair-Encoding (BPE) operation, so that cross-lingual word embedding training is more accurate and yields higher-quality bilingual word embeddings that enhance translation performance. We report a BLEU score of 15.2 on the CWMT2017 Mongolian-Chinese dataset, without using any parallel corpora during training.
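
The self-learning idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a common formulation in which an orthogonal mapping is fitted on the current dictionary (orthogonal Procrustes) and a new dictionary is then induced by nearest-neighbour search, repeated for a few rounds. The function names (`procrustes`, `induce_dictionary`, `self_learning`), the synthetic embeddings, and the small seed dictionary are all hypothetical and chosen only for illustration.

```python
import numpy as np

def procrustes(X_src, Z_tgt):
    """Orthogonal W minimizing ||X_src @ W - Z_tgt||_F (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(X_src.T @ Z_tgt)
    return U @ Vt

def induce_dictionary(X, Z, W):
    """For each source word, pick the target word with the highest dot product."""
    sims = (X @ W) @ Z.T
    return sims.argmax(axis=1)

def self_learning(X, Z, seed_src, seed_tgt, n_iter=5):
    """Alternate between fitting the mapping and re-inducing the dictionary."""
    src, tgt = np.asarray(seed_src), np.asarray(seed_tgt)
    for _ in range(n_iter):
        W = procrustes(X[src], Z[tgt])    # fit mapping on current dictionary
        src = np.arange(X.shape[0])       # next round uses every source word
        tgt = induce_dictionary(X, Z, W)  # induced translations
    return W, tgt

# Toy data (assumption): the "target" space is an exact rotation of the
# "source" space, so word i in the source corresponds to word i in the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
R, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # random orthogonal matrix
Z = X @ R

# Start from a seed dictionary of 10 pairs and let self-learning extend it.
W, induced = self_learning(X, Z, list(range(10)), list(range(10)))
```

On this noise-free toy data the loop recovers the rotation and aligns every word with its counterpart. Real monolingual embeddings are only approximately isomorphic, so practical systems typically add normalization, a more robust retrieval criterion, and a way to obtain the initial dictionary without supervision.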