Audio Event Detection Using the Mapping Segment on the Dictionary in Sparse Representation

Authors

1 Computer and IT Engineering Department, Shahrood University of Technology, Shahrood, Iran

2 Faculty of Computer and IT Engineering, Shahrood University of Technology, Shahrood, Iran

Abstract

This paper addresses audio event detection (AED) by mapping input segments onto a non-negative matrix factorization (NMF) dictionary in a sparse representation. One problem with dictionary-based methods is the lack of control over the decomposition of the input signal, so the process can yield unstructured sound pieces that are not valid audio events. We propose an algorithm that instead uses a sparsity constraint and the beta-divergence to decompose the input segments into predefined dictionary atoms. The sparsity control decomposes each segment into a linear combination of a few basis vectors, so that the segment is approximated by a hypothetical audio event. The method is applied to the recognition of a variety of live office sound events and shows promising results.
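The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given here. It assumes a non-negative spectrogram V (frequency bins by segments), a fixed, predefined dictionary W whose columns are atoms, and the widely used heuristic multiplicative update for the beta-divergence with an L1 sparsity penalty added to the denominator. The function name `sparse_activations`, the `sparsity` weight, and the toy dictionary are all hypothetical choices made for illustration.

```python
import numpy as np

def sparse_activations(V, W, beta=1.0, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate non-negative activations H so that V is approximated by W @ H.

    W is kept fixed (predefined dictionary). The update is the standard
    heuristic multiplicative rule for the beta-divergence; the L1 penalty
    weight `sparsity` is an assumed, illustrative parameter.
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # strictly positive start
    for _ in range(n_iter):
        WH = W @ H + eps                            # current approximation
        numer = W.T @ (V * WH ** (beta - 2.0))
        denom = W.T @ (WH ** (beta - 1.0)) + sparsity + eps
        H *= numer / denom                          # keeps H non-negative
    return H

# Toy dictionary (assumption): 4 atoms, each concentrated in its own 5-bin band.
W = np.full((20, 4), 0.01)
for k in range(4):
    W[5 * k:5 * k + 5, k] = 1.0
W /= W.sum(axis=0, keepdims=True)

# Six synthetic segments: the first three come from atom 1, the rest from atom 3.
H_true = np.zeros((4, 6))
H_true[1, :3] = 2.0
H_true[3, 3:] = 1.5
V = W @ H_true

H = sparse_activations(V, W)
dominant_atom = H.argmax(axis=0)  # one hypothesized event label per segment
print(dominant_atom)
```

Here beta = 1 corresponds to the Kullback-Leibler divergence. Taking the dominant atom per segment is only one simple way to turn activations into an event hypothesis; a thresholding or aggregation rule could be substituted.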
