To improve the robustness of whispered-speech speaker recognition, this paper proposes a feature parameter for whispered speech based on the amplitude modulation-frequency modulation (AM-FM) model: the instantaneous frequency estimate (IFE). Following the formant modulation theory of speech production, multiband demodulation analysis (MDA) is used to obtain the instantaneous envelope and instantaneous frequency of the speech signal. A weighted estimate based on the envelope amplitude and the frequency then yields the IFE feature, which describes the frequency structure of the speech. The feature is applied to whispered-speech speaker recognition and compared with conventional Mel-frequency cepstral coefficients (MFCC). Experimental results show that, as the number of test speakers increases, IFE performs slightly better than MFCC, and that when the test channel changes, the robustness of IFE is effectively improved relative to MFCC.
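The abstract does not give the filterbank, the demodulation algorithm, or the exact weighting, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a complex Gabor band-pass filter per band (in place of whatever demodulator the paper uses inside MDA), takes the envelope as the magnitude of the complex band signal, takes the instantaneous frequency from the sample-to-sample phase increment, and weights the frequency by the squared envelope. The function names `gabor_band`, `weighted_frequency`, and `ife_features`, and the bandwidth parameter `alpha`, are hypothetical and chosen for illustration.

```python
import cmath
import math


def gabor_band(x, fs, fc, alpha):
    """Band-pass x around fc (Hz) with a complex Gabor kernel.

    Assumed kernel: exp(-(alpha*t)^2) * exp(j*2*pi*fc*t). The output is a
    complex, approximately analytic band signal. Only samples where the
    kernel fully overlaps the input are returned, to avoid edge effects.
    """
    # Truncate the Gaussian where it has decayed to about 1e-3.
    half = int(math.ceil(2.63 * fs / alpha))
    kernel = [
        math.exp(-((alpha * k / fs) ** 2)) * cmath.exp(2j * math.pi * fc * k / fs)
        for k in range(-half, half + 1)
    ]
    y = []
    for n in range(half, len(x) - half):
        acc = 0j
        for k in range(-half, half + 1):
            acc += kernel[k + half] * x[n - k]
        y.append(acc)
    return y


def weighted_frequency(z, fs):
    """Squared-envelope-weighted mean instantaneous frequency (Hz) of z."""
    num = 0.0
    den = 0.0
    for n in range(1, len(z)):
        # Instantaneous frequency from the phase increment between samples.
        f_inst = cmath.phase(z[n] * z[n - 1].conjugate()) * fs / (2 * math.pi)
        # Assumed weighting: squared instantaneous envelope.
        w = abs(z[n]) ** 2
        num += w * f_inst
        den += w
    return num / den if den > 0.0 else 0.0


def ife_features(x, fs, centers, alpha):
    """One weighted frequency estimate per band centre (Hz) for one frame."""
    return [weighted_frequency(gabor_band(x, fs, fc, alpha), fs) for fc in centers]
```

Applied to a frame, `ife_features(frame, fs, centers, alpha)` returns one frequency value per band; for a band dominated by a single resonance, the value should sit near that resonance rather than at the nominal band centre, which is the sense in which such a feature describes the frequency structure of the signal. The band centres and bandwidth are free parameters here and would need to be set to match the paper's filterbank.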