Performance Improvement of Speaker Identification in Noisy Conditions

Mohammad Mehdi Homayounpour, Ebrahim Sharifnavabi


Speaker identification systems are used not only in ordinary environments but also in adverse conditions with interfering factors. A mismatch between training and testing conditions can degrade recognition performance. In this paper, our objective is to make speaker identification systems robust to noisy and adverse conditions over telephone or Internet channels. Utterances of 50 speakers from the telephony FarsDat speech database were used to evaluate our speaker identification system in noisy conditions. After silence was removed from the speech, the signal-to-noise ratio (SNR) of the speech files was set to 5, 10, 15 and 20 dB. Linear prediction cepstral coefficients (LPCC), linear frequency cepstral coefficients (LFCC), mel-frequency cepstral coefficients (MFCC), and MFCCs obtained from the Relative Autocorrelation Sequence (RAS) were used as speech features. Gaussian mixture models (GMMs) were used to model the speakers. MFCCs were found to be the best of these cepstral features. Removing the first cepstral coefficient, which represents the frame energy, improved identification performance by 10.4%. Linear weighting of cepstral coefficients, band-pass liftering, cepstral mean subtraction, post-filtering, and dynamic (delta) cepstral coefficients were also studied for additional robustness. Almost all of these methods improved identification performance, and linear weighting was the best among them. Combinations of these methods were also evaluated, and most of them led to better performance. The best result was obtained when MFCCs with linear weighting and delta MFCCs were used together in a single feature vector.
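The abstract names several processing steps without giving their formulas. The pure-Python sketch below shows common textbook forms of them under stated assumptions: scaling noise to reach a target SNR, dropping the energy coefficient c0, linear weighting with w(k) = k, a raised-sine band-pass lifter w(k) = 1 + (L/2) sin(pi k / L), per-utterance cepstral mean subtraction, and regression-based delta coefficients over a window of ±2 frames. These weights, the window size, whether deltas are taken before or after weighting, and all function names are illustrative assumptions, not the authors' implementation. MFCC extraction, the post filter, and GMM training are out of scope.

```python
import math
from typing import List

Frame = List[float]  # one cepstral vector (c0, c1, ..., cN) per analysis frame


def add_noise_at_snr(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    n = len(speech)
    p_speech = sum(x * x for x in speech) / n
    p_noise = sum(x * x for x in noise[:n]) / n
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, noise)]


def drop_c0(frames: List[Frame]) -> List[Frame]:
    """Remove the first (energy-related) cepstral coefficient from every frame."""
    return [f[1:] for f in frames]


def linear_weight(frames: List[Frame]) -> List[Frame]:
    """Assumed linear weighting w(k) = k; expects c0 already dropped, so index j holds c_{j+1}."""
    return [[(j + 1) * c for j, c in enumerate(f)] for f in frames]


def bandpass_lifter(frames: List[Frame], L: int) -> List[Frame]:
    """Raised-sine lifter w(k) = 1 + (L/2) sin(pi k / L), k = 1..N (c0 already dropped)."""
    return [[(1 + (L / 2) * math.sin(math.pi * (j + 1) / L)) * c for j, c in enumerate(f)]
            for f in frames]


def cepstral_mean_subtraction(frames: List[Frame]) -> List[Frame]:
    """Subtract the per-dimension mean over the whole utterance."""
    dim, count = len(frames[0]), len(frames)
    means = [sum(f[d] for f in frames) / count for d in range(dim)]
    return [[c - m for c, m in zip(f, means)] for f in frames]


def deltas(frames: List[Frame], n_win: int = 2) -> List[Frame]:
    """Regression deltas d_t = sum_n n*(c_{t+n} - c_{t-n}) / (2 * sum_n n^2), edges replicated."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, n_win + 1))
    out = []
    for t in range(T):
        d = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, n_win + 1):
                nxt = frames[min(t + n, T - 1)][k]  # clamp at the last frame
                prv = frames[max(t - n, 0)][k]      # clamp at the first frame
                acc += n * (nxt - prv)
            d.append(acc / denom)
        out.append(d)
    return out


def weighted_mfcc_plus_delta(mfcc_frames: List[Frame]) -> List[Frame]:
    """Concatenate linearly weighted MFCCs (c0 removed) with delta MFCCs.

    Assumption: deltas are computed from the unweighted, c0-free MFCCs.
    """
    base = drop_c0(mfcc_frames)
    static = linear_weight(base)
    dynamic = deltas(base)
    return [s + d for s, d in zip(static, dynamic)]
```

With 13 MFCCs per frame, `weighted_mfcc_plus_delta` yields 24-dimensional vectors (12 weighted static coefficients followed by 12 deltas), which would then be passed to a per-speaker GMM.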


Keywords: speaker identification, noisy conditions, telephone, Internet, FarsDat