This paper proposes a trainable unit-selection speech synthesis method based on statistical modeling. In the model training stage, several kinds of acoustic parameters are extracted from the training corpus and a statistical model is trained for each of them. In the synthesis stage, the optimal sequence of candidate units is selected from the corpus under the maximum-likelihood criterion of the statistical models, and the synthesized speech is then produced by waveform concatenation. Experimental results show that the method effectively alleviates several problems of traditional unit-selection and waveform-concatenation synthesis: a low degree of automation in system construction, heavy dependence on expert knowledge, and insufficiently stable synthesis quality. In addition, in view of the characteristics of unit-selection synthesis, a new minimum unit selection error criterion is proposed, and the model parameters are updated by discriminative training. This makes system construction fully automatic and further improves the naturalness of the synthesized speech.
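As a rough illustration of the synthesis stage described above, the following Python sketch selects a unit sequence by maximizing a total log-likelihood with a Viterbi search. It is not the paper's implementation: the diagonal Gaussian target models, the zero-mean Gaussian on feature differences at unit joins, and the names `gauss_loglik`, `select_units`, and `join_var` are all assumptions made for this example.

```python
import math


def gauss_loglik(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def select_units(candidates, target_models, join_var):
    """Viterbi search for the candidate sequence with the highest log-likelihood.

    candidates[t]    : list of feature vectors, one per candidate unit at position t
    target_models[t] : (mean, var) of a hypothetical target model at position t
    join_var         : per-dimension variance of a zero-mean Gaussian on the
                       feature difference between adjacent units (an assumption)
    """
    zero = [0.0] * len(join_var)
    # best[t][j] is the best score of any path ending in candidate j at position t.
    best = [[gauss_loglik(c, *target_models[0]) for c in candidates[0]]]
    back = []
    for t in range(1, len(candidates)):
        scores, pointers = [], []
        for c in candidates[t]:
            target = gauss_loglik(c, *target_models[t])
            # Score each predecessor: its path score plus the join log-likelihood.
            options = [
                best[-1][i]
                + gauss_loglik([a - b for a, b in zip(c, p)], zero, join_var)
                for i, p in enumerate(candidates[t - 1])
            ]
            i_best = max(range(len(options)), key=options.__getitem__)
            scores.append(options[i_best] + target)
            pointers.append(i_best)
        best.append(scores)
        back.append(pointers)
    # Trace back from the best-scoring final candidate.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return path[::-1]


if __name__ == "__main__":
    # Toy data: two positions, two one-dimensional candidates each.
    cands = [[[0.0], [5.0]], [[0.1], [5.1]]]
    models = [([0.0], [1.0]), ([0.0], [1.0])]
    print(select_units(cands, models, join_var=[1.0]))
```

The function returns one candidate index per position. In a real system the feature vectors would come from the acoustic parameters extracted during training, and the selected units would then be concatenated as waveforms. The discriminative training under the minimum unit selection error criterion is not covered by this sketch.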