The performance of automatic speech recognition systems usually degrades significantly in noisy environments, which is a major obstacle to the widespread use of speech recognition technology. Building on earlier work by others on Gammatone-based auditory features (GFCC), this paper analyzes how GFCC and Mel-frequency cepstral coefficients (MFCC) perform in different noise environments. Five artificial and natural noise types were selected for the comparison experiments: white noise, pink noise, brown noise, background speaker noise, and car noise. By mixing noise of different types and intensities with speech, the characteristics and noise robustness of the auditory-based GFCC features were studied systematically. In particular, sinusoidal noise in different frequency bands was mixed with clean speech to analyze the noise robustness of GFCC and MFCC in each band. The study found that, compared with the traditional MFCC, GFCC is more robust to low-frequency noise but relatively sensitive to mid- and high-frequency noise. Because human speech typically lies at lower frequencies (300-700 Hz), this property gives GFCC good noise robustness in speech recognition tasks. The experimental results show that GFCC achieves better recognition performance than MFCC in a variety of common noise environments, with a larger advantage at low signal-to-noise ratios.
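The noise-mixing step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`power`, `sine_noise`, `white_noise`, `mix_at_snr`, `measured_snr_db`), the 16 kHz sample rate, and the synthetic tone used in place of real clean speech are all assumptions made for illustration. The sketch scales a noise signal so that the mixture reaches a requested signal-to-noise ratio, using SNR(dB) = 10 log10(P_speech / P_noise).

```python
import math
import random

SAMPLE_RATE = 16000  # assumed sample rate in Hz; the abstract does not state one


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def sine_noise(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Single-frequency sinusoidal noise, used to probe one frequency band."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def white_noise(n_samples, seed=0):
    """Gaussian white noise with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + noise has the requested SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech, p_noise = power(speech), power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Solve 10*log10(p_speech / (gain**2 * p_noise)) = snr_db for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of a mixture, given the clean signal it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


# Stand-in for one second of clean speech: a 500 Hz tone (hypothetical input).
clean = [0.5 * v for v in sine_noise(500, SAMPLE_RATE)]

# Low-frequency sinusoidal noise at 5 dB SNR, and white noise at 0 dB SNR.
noisy_sine = mix_at_snr(clean, sine_noise(200, SAMPLE_RATE), snr_db=5.0)
noisy_white = mix_at_snr(clean, white_noise(SAMPLE_RATE), snr_db=0.0)
```

In an experiment of the kind the abstract describes, `clean` would be replaced by recorded utterances, and the noise frequency and `snr_db` would be swept before extracting GFCC and MFCC features from each mixture.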