Extracting Order-Sensitive Word-to-Word Relations Using a Hierarchical Bayes Model

Document Type: Original Article

Authors

1 Department of Computer and IT Engineering, Shahrood University of Technology, Shahrood, Iran

2 Department of Computer and IT Engineering, Shahrood University of Technology, Shahrood, Iran

3 Department of Computer and IT Engineering, Shahrood University of Technology, Shahrood, Iran

Abstract

In this paper, a hierarchical Bayes model is introduced that models local word relationships in a language. The model can be considered a language model. The proposed model does not suffer severely from sparseness because it does not rely on exact word order; at the same time, it does not ignore word order entirely. The proposed generative model assumes that each word is a distribution over words and that the current word is generated from the distribution of one of its previous words located in a fixed-size window. In contrast to an n-gram model, which is a distribution over word sequences and therefore takes the exact sequence of words into account, the proposed model considers ordered pairs of words that can occur at different distances in the text, which is why the sparseness problem is not severe for the proposed model. The model is compared with an n-gram model and outperforms it in its ability to model text data, as measured by perplexity.
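
To make the generative assumption concrete, the following minimal sketch (in Python) simulates that process on a toy vocabulary: every word type has its own distribution over words, and each new token is drawn from the distribution of one previous word chosen from a fixed-size window. The names phi, pi, window_size, and the symmetric Dirichlet hyperparameter are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 5     # toy vocabulary {0, ..., 4}
window_size = 3    # fixed-size window of previous words
alpha = 0.1        # symmetric Dirichlet hyperparameter (assumed)

# Word-specific distributions over words, drawn from a Dirichlet prior.
phi = rng.dirichlet(np.full(vocab_size, alpha), size=vocab_size)

# Distribution over which window position generates the next word
# (uniform here; a prior could be placed on this choice as well).
pi = np.full(window_size, 1.0 / window_size)

def generate(length, start_word=0):
    """Generate a toy word sequence under the sketched generative model."""
    words = [start_word]
    for _ in range(length - 1):
        window = words[-window_size:]
        probs = pi[:len(window)] / pi[:len(window)].sum()
        # Pick one previous word from the window ...
        parent = window[rng.choice(len(window), p=probs)]
        # ... and draw the next word from that word's distribution.
        words.append(int(rng.choice(vocab_size, p=phi[parent])))
    return words

print(generate(10))
```

Because the pair (parent word, generated word) can occur at any distance within the window, counts are pooled across positions rather than tied to exact n-gram contexts, which is the intuition behind the reduced sparseness noted above.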

Keywords

