An Adaptive Parametric Model for Detecting and Classifying Audio Events in Environmental Signals

Article type: Research Paper

Authors

Faculty of Computer Engineering and Information Technology, Shahrood University of Technology

Abstract

Detecting audio events in work and living environments is a modern need for gathering information. So far, most research has focused on one specific audio event or on a limited number of prominent audio events. Here, a new model is presented for detecting all audio events that occur in a recording and determining the time span of each. The novelty lies in the new modeling together with adaptive parameters in the model. After the features are extracted and the values of the two parameters, alpha and beta, are set, two separate segmentations are used and their outputs are combined to determine the audio events and their time spans. These events are then sent to the KNN algorithm for classification. The parameters make it possible to favor either higher precision or the maximum detection rate. The tested audio events comprise 16 types of office-room sounds, some of which resemble each other and some of which resemble the ambient noise. In the event-based evaluation, precision was 70.1%, recall was 75.8%, and F1 was 72.8%. The frame-based F1 was 80.6%. The event-based F1 improved by 10.8% over previous results, which supports the effectiveness of the proposed model.
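The abstract does not state how the two segmentation outputs are combined or how alpha and beta enter the model. As a rough illustration only, the sketch below uses a stand-in rule (frame-wise union versus intersection of two binary masks) to show why a combination rule can trade precision against recall. The function names `fuse` and `precision_recall` and the toy masks are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def fuse(mask_a: List[int], mask_b: List[int], mode: str) -> List[int]:
    """Combine two per-frame event masks (1 = event, 0 = background).

    Assumption: union/intersection is a stand-in for the paper's
    unspecified combination step, not the authors' actual rule.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of frames")
    if mode == "union":  # keep a frame if either segmentation marked it
        return [int(a or b) for a, b in zip(mask_a, mask_b)]
    if mode == "intersection":  # keep a frame only if both marked it
        return [int(a and b) for a, b in zip(mask_a, mask_b)]
    raise ValueError(f"unknown mode: {mode}")


def precision_recall(pred: List[int], truth: List[int]) -> Tuple[float, float]:
    """Frame-level precision and recall of a predicted mask."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    predicted, actual = sum(pred), sum(truth)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Toy data (made up for illustration): ten frames, two imperfect segmentations.
truth = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
seg_a = [0, 1, 1, 0, 0, 1, 1, 0, 0, 0]
seg_b = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0]

for mode in ("union", "intersection"):
    p, r = precision_recall(fuse(seg_a, seg_b, mode), truth)
    print(f"{mode}: precision={p:.2f} recall={r:.2f}")
```

On this toy input the union finds every true frame at the cost of false alarms, while the intersection has no false alarms but misses frames. A tunable parameter that moves between such extremes would act as the kind of precision/recall control the abstract describes.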

Keywords


Article Title [English]

Providing an Adaptive Model with two Adjustable Parameters for Audio Event Detection and Classification in Environmental Signals

Authors [English]

  • M. Derakhshan
  • H. Marvi
  • H. Hassan poor
Computer and IT Engineering Department, Shahrood University of Technology, Shahrood, Iran
Abstract [English]

Audio event detection (AED) is a modern way to collect data about human activities in the workplace and in other living environments. We propose a novel adaptive model, based on two parameters, α and β, that detects all audio events present in a given recording together with the time span in which each occurs. After feature extraction and setting the values of α and β, the audio sequence is sent to two distinct sub-systems for event detection. The outputs of the two sub-systems are then combined, and the event time limits are refined as needed. The final detected events are sent to a KNN classifier. The parameters serve as a trade-off tool between precision and recall in the detection process. In the tests, 16 different audio events from an office room were detected, some similar to each other and some with characteristics very similar to those of the background noise. At the event-based level, precision was 70.1%, recall was 75.8%, and the F1-measure was 72.8%; at the frame-based (FB) level, the F1-measure was 80.6%. The event-based F1-measure improved by 10.8% over previous results, suggesting promising applications of the model.
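The reported F1-measure can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the helper name `f1_score` is ours and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 70.1
reported_recall = 75.8

print(round(f1_score(reported_precision, reported_recall), 1))  # prints 72.8
```

The result agrees with the 72.8% F1-measure stated in the abstract.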

Keywords [English]

  • Audio event detection (AED)
  • environmental sounds
  • unsupervised learning
  • adaptable modeling systems
  • audio monitoring systems
  • audio-based acquisition systems