[1] A. Graves and N. Jaitly, “Towards End-To-End Speech Recognition with Recurrent Neural Networks,” Proceedings of the 31st International Conference on Machine Learning, 2014.
[2] Y. Miao, M. Gowayyed and F. Metze, “EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding,” ASRU, 2015.
[3] D. Amodei et al., “Deep Speech 2: End-to-End Speech Recognition in English and Mandarin,” International Conference on Machine Learning, New York, NY, USA, 2016.
[4] Y. Bengio, P. Lamblin, D. Popovici and H. Larochelle, “Greedy Layer-Wise Training of Deep Networks,” NIPS, 2006.
[5] H. Larochelle, Y. Bengio, J. Louradour and P. Lamblin, “Exploring Strategies for Training Deep Neural Networks,” JMLR, vol. 10, pp. 1-40, 2009.
[6] A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, Springer, 2012.
[7] A. Zeyer, P. Doetsch, P. Voigtlaender, R. Schlüter and H. Ney, “A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition,” Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, 2017.
[8] M. Hajiabadi, A. Ebrahimi Moghaddam and H. Khoshbin, “Acoustic Noise Cancellation Based on a Novel Adaptive Algorithm” (in Persian), Tabriz Journal of Electrical Engineering, vol. 46, no. 3, pp. 139-147, Fall 2016.
[9] M. Geravanchizadeh and S. Ghaemi Sardroodi, “Speech Enhancement Based on Particle Swarm Optimization Using Masking Properties of the Human Auditory System” (in Persian), Tabriz Journal of Electrical Engineering, vol. 46, no. 3, pp. 287-297, Fall 2016.
[10] M. Seltzer, D. Yu and Y. Wang, “An investigation of deep neural networks for noise robust speech recognition,” Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.
[11] D. Yu, L. Deng, J. Droppo, J. Wu, Y. Gong and A. Acero, “Robust speech recognition using cepstral minimum-mean-square-error noise suppressor,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 5, 2008.
[12] S. Sun, B. Zhang, L. Xie and Y. Zhang, “An unsupervised deep domain adaptation approach for robust speech recognition,” Neurocomputing, vol. 257, pp. 79-87, 2017.
[13] V. Mitra, H. Franco, R. M. Stern, J. van Hout, L. Ferrer, M. Graciarena, W. Wang, D. Vergyri, A. Alwan and J. H. L. Hansen, “Robust Features in Deep Learning Based Speech Recognition,” in New Era for Robust Speech Recognition: Exploiting Deep Learning, Springer, 2017, pp. 187-217.
[14] A. M. C. Martinez, S. H. Mallidi and B. T. Meyer, “On the relevance of auditory-based Gabor features for deep learning in robust speech recognition,” Computer Speech and Language, vol. 45, no. C, pp. 21-38, 2017.
[15] D. Yu and M. Seltzer, “Improved Bottleneck Features Using Pretrained Deep Neural Networks,” INTERSPEECH, 2011.
[16] T. N. Sainath, B. Kingsbury and B. Ramabhadran, “Auto-encoder bottleneck features using deep belief networks,” Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012.
[17] J. Gehring et al., “Extracting deep bottleneck features using stacked auto-encoders,” Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013.
[18] A. Senior, H. Sak, F. de Chaumont Quitry, T. N. Sainath and K. Rao, “Acoustic Modelling with CD-CTC-SMBR LSTM RNNS,” ASRU, 2015.
[19] H. Sak, A. W. Senior and F. Beaufays, “Long short-term memory recurrent neural network architectures for large scale acoustic modeling,” INTERSPEECH, 2014.
[20] A. L. Maas, Z. Xie, D. Jurafsky and A. Y. Ng, “Lexicon-Free Conversational Speech Recognition with Neural Networks,” NAACL, 2015.
[21] D. Yu, K. Yao and Y. Zhang, “The Computational Network Toolkit,” IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 123-126, 2015.
[22] Đ. T. Grozdić, S. T. Jovičić and M. Subotić, “Whispered speech recognition using deep denoising autoencoder,” Engineering Applications of Artificial Intelligence, vol. 59, pp. 15-22, 2017.
[23] R. Fér, P. Matějka, F. Grézl, O. Plchot, K. Veselý and J. H. Černocký, “Multilingually trained bottleneck features in spoken language recognition,” Computer Speech and Language, vol. 46, no. C, pp. 252-267, 2017.
[24] R. Martin, “Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 504-512, 2001.