Fractional Eigen Based MVDR Beamformer for Speech Enhancement

Authors

Department of Electrical Engineering, Yazd University, Yazd, Iran.

Abstract

The Minimum Variance Distortionless Response (MVDR) beamformer is one of the most widely used beamforming algorithms for speech enhancement. Its optimal coefficients are derived under the assumption that the environmental interferences and the desired signal are incoherent. Because of the nature of noise and speech signals, this assumption does not hold in many practical situations, which makes the derived MVDR coefficients inaccurate. In this paper, as a first modification to the MVDR beamformer, we apply eigenvalue analysis to the covariance matrix of the desired signal and remove its small eigenvalues, which improves the accuracy of the beamformer coefficients. As a second contribution, we use a generalized version of the Short-Time Fourier Transform (STFT), namely the Short-Time Fractional Fourier Transform (STFrFT), to calculate the MVDR beamformer weights. After the optimal value of the STFrFT parameter is obtained experimentally, the effect of each of the two modifications on performance is investigated and compared with the baseline methods. The results show that the proposed methods, while remaining stable under changes in parameters and environmental conditions, achieve signal-to-noise ratio (SNR) values between  and , while the performance of the baseline method is in the range of . Although each modification alone improves performance, the best performance is obtained when both are applied to the beamformer together.
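
The eigenvalue-truncation idea can be illustrated with a minimal NumPy sketch for a single time-frequency bin. It is not the authors' implementation. It assumes the common MVDR form w = Φ_v⁻¹ Φ_x u / tr(Φ_v⁻¹ Φ_x), where Φ_x and Φ_v are the desired-signal and noise covariance matrices and u selects a reference microphone. The function names, the `keep` parameter, and the synthetic data are hypothetical, and the covariance matrices could come from either an STFT or an STFrFT analysis.

```python
import numpy as np


def truncate_small_eigenvalues(phi_x, keep=1):
    """Rebuild phi_x from its `keep` largest eigenvalues only (keep >= 1)."""
    phi_x = 0.5 * (phi_x + phi_x.conj().T)      # enforce Hermitian symmetry
    eigvals, eigvecs = np.linalg.eigh(phi_x)    # eigenvalues in ascending order
    eigvals[:-keep] = 0.0                       # discard the small eigenvalues
    return (eigvecs * eigvals) @ eigvecs.conj().T


def mvdr_weights(phi_x, phi_v, ref_mic=0):
    """Assumed MVDR form: w = Phi_v^{-1} Phi_x u / tr(Phi_v^{-1} Phi_x)."""
    ratio = np.linalg.solve(phi_v, phi_x)       # Phi_v^{-1} Phi_x, no explicit inverse
    return ratio[:, ref_mic] / np.trace(ratio)


# Synthetic single-bin example (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n_mics = 4

# Steering vector relative to microphone 0, so d[0] == 1.
d = np.exp(-1j * np.pi * 0.3 * np.arange(n_mics))
phi_x_true = 2.0 * np.outer(d, d.conj())        # ideal rank-one speech covariance

# Random Hermitian positive-definite noise covariance.
a = rng.standard_normal((n_mics, n_mics)) + 1j * rng.standard_normal((n_mics, n_mics))
phi_v = a @ a.conj().T + n_mics * np.eye(n_mics)

# Estimated speech covariance: the ideal one plus a small full-rank error term.
e = 0.05 * (rng.standard_normal((n_mics, n_mics))
            + 1j * rng.standard_normal((n_mics, n_mics)))
phi_x_est = phi_x_true + e @ e.conj().T

phi_x_clean = truncate_small_eigenvalues(phi_x_est, keep=1)
w = mvdr_weights(phi_x_clean, phi_v)

# Response of the beamformer to the desired source: w^H d.
response = np.vdot(w, d)
print("response to desired source:", response)
```

For an exactly rank-one Φ_x with d[0] = 1, the response wᴴd equals 1, which is the distortionless constraint. With an estimated Φ_x, the truncation removes the small-eigenvalue components before the weights are formed. How many eigenvalues to keep is a design choice that this sketch fixes at one only for illustration.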

Keywords

