Particle Swarm Optimization-Based Speech Quality Enhancement Using Masking Properties of the Human Auditory System

Authors

University of Tabriz

Abstract

In this paper, new dual-channel signal-subspace methods that use perceptual noise-suppression techniques are proposed for improving the quality of speech signals. The proposed methods exploit the masking properties of the human auditory system to reduce audible residual noise. The perceptual quotient singular value decomposition method uses the particle swarm optimization technique to estimate the additive noise. Simulation results, evaluated with both subjective and objective measures, show that signals processed by the proposed methods have better quality than those of previous algorithms for both stationary and nonstationary noise, and especially for nonwhite noise.

Article title [English]

PSO-Based Speech Enhancement Using Masking Properties of Human Auditory System

Abstract [English]

New dual-channel, perceptually motivated subspace-based approaches are proposed for enhancing speech corrupted by noise. The proposed methods take the frequency masking properties of the human auditory system into account and reduce the perceptual effects of the residual noise. The perceptually constrained quotient singular value decomposition (PCQSVD) algorithm uses the particle swarm optimization (PSO) technique to estimate the additive noise. Objective evaluations and subjective tests show that the proposed approaches offer improved speech quality compared to previous methods for both stationary and nonstationary noise, especially when the additive noise is nonwhite.
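
As a rough illustration of the PSO technique named above, the following is a minimal sketch of a generic textbook particle swarm optimizer, not the PCQSVD noise estimator itself. The function name `pso_minimize`, its parameters, the coefficient values, and the toy quadratic cost are all assumptions introduced for illustration; the actual cost function used for noise estimation is not specified in this abstract.

```python
import random


def pso_minimize(cost, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Generic PSO sketch (hypothetical helper, not the paper's estimator)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the search box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares one global best.
    best_pos = [p[:] for p in pos]
    best_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = cost(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Toy stand-in cost: squared distance to an assumed "true" parameter vector.
target = [0.3, -1.2]
est, err = pso_minimize(lambda x: sum((a - b) ** 2 for a, b in zip(x, target)),
                        dim=2, bounds=(-5.0, 5.0))
```

In a speech enhancement setting, the toy cost would be replaced by whatever error measure the estimator is meant to minimize; the swarm update itself stays the same.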

Keywords [English]

  • Auditory masking threshold
  • least-squares estimation
  • minimum-variance estimation
  • particle swarm optimization
  • quotient singular value decomposition
  • speech enhancement