Persian Opinion Mining Based on Transfer Learning

Article type: Research paper

Authors

1 Faculty of Computer Engineering, Yazd University

2 Faculty of Computer Engineering, Shahrekord University

Abstract

In the past decade, the study of people's opinions, sentiments, and attitudes has played an effective role in the decision-making of managers and individuals. Machine learning algorithms play an important role in opinion mining, but they suffer from a major problem: most of them assume that the data they are trained on and the data they later receive share the same feature dimensions and distribution, whereas many real-world applications do not satisfy these assumptions. In fact, the data an algorithm receives in the future may have different dimensions and features or come from a different distribution. In this paper, using transfer learning with an emphasis on feature transfer, we present a new method for improving the extraction of the sentiments and concepts contained in opinions. In the proposed method, the feature or topic of the opinions is first identified in the source-language domain. Then, by collecting the adjectives, adverbs, and, more generally, a bundle of the probabilities that may occur for that feature, and translating it into the target language, the learning is transferred from the source language to the target language. Evaluating the proposed method with data from the Amazon online store as the source domain shows that, by building a feature-transfer pattern on English opinions, the polarity of 77 percent of the opinions recorded in Persian (on the Digikala online store) can be detected, an improvement of 9, 5, and 5 percent over the SCL, SFA, and TCA methods, respectively.
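The feature-transfer idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy reviews, the `EN_TO_FA` dictionary, the Laplace smoothing, and the log-odds decision rule are all assumptions chosen for illustration. For each (feature, modifier) pair seen in labeled English reviews, the sketch estimates the probability that the pair signals a positive opinion, re-keys the pair in Persian through a bilingual dictionary, and scores Persian reviews with the transferred probabilities.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (feature, modifiers, label) triples from labeled
# English reviews in the source domain; label is +1 (positive) or -1.
SOURCE_REVIEWS = [
    ("battery", ["long", "great"], +1),
    ("battery", ["great"], +1),
    ("battery", ["short", "weak"], -1),
    ("screen", ["bright"], +1),
    ("screen", ["dark"], -1),
]

# Hypothetical English-to-Persian word dictionary standing in for translation.
EN_TO_FA = {
    "battery": "باتری", "screen": "صفحه", "long": "طولانی", "great": "عالی",
    "short": "کوتاه", "weak": "ضعیف", "bright": "روشن", "dark": "تاریک",
}


def build_feature_bundles(reviews, smoothing=1.0):
    """Estimate P(positive | feature, modifier) with Laplace smoothing."""
    counts = defaultdict(lambda: [0, 0])  # (feature, modifier) -> [neg, pos]
    for feature, modifiers, label in reviews:
        for modifier in modifiers:
            counts[(feature, modifier)][1 if label > 0 else 0] += 1
    return {
        key: (pos + smoothing) / (pos + neg + 2 * smoothing)
        for key, (neg, pos) in counts.items()
    }


def translate_bundles(bundles, dictionary):
    """Re-key source-language bundles in the target language.

    Pairs whose feature or modifier has no dictionary entry are dropped.
    """
    return {
        (dictionary[f], dictionary[m]): p
        for (f, m), p in bundles.items()
        if f in dictionary and m in dictionary
    }


def predict_polarity(feature, modifiers, bundles):
    """Sum the log-odds of all transferred pairs found in a target review.

    Returns +1 or -1, or 0 when no pair was transferred or the evidence cancels.
    """
    score = 0.0
    hits = 0
    for modifier in modifiers:
        p = bundles.get((feature, modifier))
        if p is not None:
            score += math.log(p / (1.0 - p))
            hits += 1
    if hits == 0 or score == 0.0:
        return 0
    return 1 if score > 0 else -1


fa_bundles = translate_bundles(build_feature_bundles(SOURCE_REVIEWS), EN_TO_FA)
print(predict_polarity("باتری", ["عالی"], fa_bundles))  # → 1 (battery + great)
print(predict_polarity("صفحه", ["تاریک"], fa_bundles))  # → -1 (screen + dark)
print(predict_polarity("باتری", ["خوب"], fa_bundles))   # → 0 (never transferred)
```

Smoothing keeps every probability strictly between 0 and 1, so the log-odds are always finite. In this sketch a Persian review whose modifiers never appeared in the translated bundles is left unclassified (returns 0) rather than guessed.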


Article title [English]

Persian Opinion Mining Based on Transfer Learning

Authors [English]

  • S. Dehghani 1
  • V. Derhami 1
  • A. M. Zare Bidoki 1
  • M. E. Basiri 2
1 Faculty of Computer Engineering, Yazd University, Yazd, Iran
2 Faculty of Computer Engineering, Shahrekord University, Shahrekord, Iran
Abstract [English]

In the past decade, the study of human opinions, feelings, and tendencies has strongly influenced the decision-making of managers and individuals. Machine learning algorithms play an important role in opinion mining, but they suffer from a major problem: most of them assume that training data and future data share the same feature dimensions and distribution, whereas many real-world applications do not satisfy these assumptions. In fact, the data that an algorithm receives in the future may have different dimensions or distributions. This article proposes a new method, based on feature-based transfer learning, for improving the sentiment analysis of opinions. In the proposed method, the feature or topic of an opinion is first identified in the source-language domain. Then, by collecting the adjectives, adverbs, and, more generally, a bundle of probabilities associated with that feature and translating it into the target language, the learning is transferred from the source language to the target language. An evaluation of the proposed method with data from the Amazon store as the source domain shows that, by creating a feature-transfer pattern on English opinions, the polarity of 77% of the Persian opinions recorded on the Digikala store can be detected, outperforming the SCL, SFA, and TCA methods by 9, 5, and 5 percent, respectively.
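The premise that future (target) data may not share the source feature space can be made concrete with a small Python sketch under stated assumptions: the two-document corpora and the `dictionary` mapping are hypothetical, and the coverage measure (the share of target tokens that exist as source features) is an illustrative choice, not a metric from the paper. It shows only the dimension mismatch, not a distribution shift. Without any mapping, an English bag-of-words model has no dimension for any Persian token; mapping the source vocabulary through even an incomplete dictionary restores most of the shared dimensions.

```python
def vocabulary(docs):
    """Set of tokens that defines a bag-of-words feature space."""
    return {token for doc in docs for token in doc}


def coverage(feature_space, docs):
    """Fraction of tokens in docs that are dimensions of feature_space."""
    tokens = [token for doc in docs for token in doc]
    if not tokens:
        return 0.0
    return sum(token in feature_space for token in tokens) / len(tokens)


# Hypothetical tokenized reviews: English source domain, Persian target domain.
source_docs = [["battery", "great"], ["screen", "dark"]]
target_docs = [["باتری", "عالی"], ["صفحه", "تاریک"]]

# Hypothetical, deliberately incomplete dictionary ("dark" has no entry).
dictionary = {"battery": "باتری", "great": "عالی", "screen": "صفحه"}

source_space = vocabulary(source_docs)
# Untranslatable source tokens are kept as-is, so they still match nothing.
mapped_space = {dictionary.get(token, token) for token in source_space}

print(coverage(source_space, target_docs))  # → 0.0
print(coverage(mapped_space, target_docs))  # → 0.75
```

The one uncovered token comes from the missing dictionary entry, which illustrates how translation quality bounds how much of the target feature space a cross-lingual transfer can reach.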

Keywords [English]

  • Opinion mining
  • Transfer learning
  • Feature transfer
  • Polarity
[1]      "Analysis of document pre-processing effects in text and opinion mining," Information, vol. 9, no. 4, pp. 1-13, 2018.
[2]      S. Sharifatzadeh and M. A. Zare Chahooki, "Transfer learning with a hybrid of instance transfer and feature representation for cross-project software defect prediction," Tabriz Journal of Electrical Engineering, vol. 48, no. 1, pp. 101-112, Spring 2018 (in Persian).
[3]      S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, 2010.
[4]      K. Dashtipour, M. Gogate, A. Adeel, C. Ieracitano, H. Larijani and A. Hussain, "Exploiting deep learning for Persian sentiment analysis," in Proceedings of the 9th International Conference on Advances in Brain Inspired Cognitive Systems (BICS 2018), pp. 597-605, Xi'an, China, 2018.
[5]      B. Pang and L. Lee, "Opinion mining and sentiment analysis," Foundations and Trends in Information Retrieval, vol. 2, no. 1-2, pp. 1-153, 2008.
[6]      J. Barnes, R. Klinger and S. Schulte im Walde, "Bilingual sentiment embeddings: Joint projection of sentiment across languages," in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pp. 2483-2493, Melbourne, Australia, 2018.
[7]      H. Guo, H. Zhu, Z. Guo, X. Zhang, and Z. Su, "OpinionIt: A text mining system for cross-lingual opinion analysis," in Proceedings of the ACM Conference on Information and Knowledge Management (CIKM-2010), pp. 1199-1208, Toronto, Canada, 2010.
[8]      X. Wan, "Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis," in Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP-2008), pp. 553-561, Hawaii, USA, 2008.
[9]      J. Brooke, M. Tofiloski, and M. Taboada, "Cross-linguistic sentiment analysis: From English to Spanish," in Proceedings of International Conference RANLP, pp. 50-54, Borovets, Bulgaria, 2009.
[10]      X. Wan, "Co-training for cross-lingual sentiment classification," in Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP (ACL-IJCNLP-2009), pp. 235-243, Suntec, Singapore, 2009.
[11]      B. Wei and C. Pal, "Cross-lingual adaptation: An experiment on sentiment classifications," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 258–262, Uppsala, Sweden, 2010.
[12]      K. Duh, A. Fujino, and M. Nagata, "Is machine translation ripe for cross-lingual sentiment classification?," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short Papers (ACL-2011), pp. 429-433, Oregon, USA, 2011.
[13]      C. Banea, R. Mihalcea, and J. Wiebe, "Multilingual subjectivity: Are more languages better?," in Proceedings of the International Conference on Computational Linguistics (COLING-2010), pp. 28-36, Beijing, China, 2010.
[14]      M. Bautin, L. Vijayarenu, and S. Skiena, "International sentiment analysis for news and blogs," in Proceedings of the International AAAI Conference on Weblogs and Social Media (ICWSM-2008), pp. 19-26, Washington, USA, 2008.
[15]      B. Lu, C. Tan, C. Cardie, and B. K. Tsou, "Joint bilingual sentiment classification with unlabeled parallel corpora," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL-2011), pp. 320-330, Oregon, USA, 2011.
[16]      M. S. Rasooli, N. Farra, A. Radeva, T. Yu, and K. McKeown, "Cross-lingual sentiment transfer with limited resources," Machine Translation, vol. 32, no. 1-2, pp. 143-165, 2018.
[17]      H. K. Aldayel and A. M. Azmi, "Arabic tweets sentiment analysis - a hybrid scheme," Journal of Information Science, vol. 42, no. 6, pp. 782-797, 2015.
[18]      G. Vinodhini and R. Chandrasekaran, "Sentiment analysis and opinion mining: A survey," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 6, pp. 282–292, 2012.
[19]      I. Habernal, T. Ptáček and J. Steinberger, "Supervised sentiment analysis in Czech social media," Information Processing and Management, vol. 50, no. 5, pp. 693–707, 2014.
[20]      H. Ghorbel and D. Jacot, Advances in distributed agent-based retrieval tools, Springer Berlin Heidelberg, 2011.
[21]      T. Scholz and S. Conrad, "Linguistic sentiment features for newspaper opinion mining," in Proceedings of the 18th International Conference on Applications of Natural Language to Information Systems (NLDB 2013), pp. 272–277, Salford, UK, 2013.
[22]      N. Medagoda, S. Shanmuganathan and J. Whalley, "A comparative analysis of opinion mining and sentiment classification in non-English languages," in Proceedings of International Conference on Advances in ICT for Emerging Regions (ICTER 2013), pp. 144-148, Colombo, Sri Lanka, 2013.
[23]      F. Neri, C. Aliprandi, F. Capeci, M. Cuadros and T. By, "Sentiment analysis on social media," in Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 951–958, Istanbul, Turkey, 2012.
[24]      Y. Arakawa, A. Kameda, A. Aizawa and T. Suzuki, "Adding Twitter-specific features to stylistic features for classifying tweets by user type and number of retweets," Journal of the Association for Information Science and Technology, vol. 65, no. 7, pp. 1416–1423, 2014.
[25]      N. Yussupova, D. Bogdanova, and M. Boyko, "Applying of sentiment analysis for texts in Russian based on machine learning approach," in Proceedings of 2nd International Conference on Advances in Information Mining and Management, pp. 8-14, Venice, Italy, 2012.
[26]      D. Vilares, M. A. Alonso and C. Gómez-Rodríguez, "A syntactic approach for opinion mining on Spanish reviews," Natural Language Engineering, vol. 21, no. 1, pp. 139-163, 2015.
[27]      P. Inrak and S. Sinthupinyo, "Applying latent semantic analysis to classify emotions in Thai text," in Proceedings of the 2nd International Conference on Computer Engineering and Technology (ICCET 2010), pp. 450–454, Chengdu, China, 2010.
[28]      M. Dorigo and C. Blum, "Ant colony optimization theory: A survey," Theoretical Computer Science, vol. 344, no. 2-3, pp. 243-278, 2005.
[29]      S. Nejatian, R. Omidvar, H. Parvin, V. Rezaie, and M. Yasrebi, "A new algorithm: The wild mice colony algorithm," Tabriz Journal of Electrical Engineering, vol. 49, no. 1, pp. 425-437, Spring 2019 (in Persian).
[30]      S. R. Ahmad, A. Abu Bakr, and M. R. Yaakub, "Ant colony optimization for text feature selection in sentiment analysis," Intelligent Data Analysis, vol. 23, no. 1, pp. 133-158, 2019.
[31]      M. Shams, A. Shakery and H. Faili, "A non-parametric LDA-based induction method for sentiment analysis," in Proceedings of the 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012), pp. 216-221, 2012.
[32]      A. Bagheri and M. Saraee, "Persian sentiment analyzer: A framework based on a novel feature selection method," International Journal of Artificial Intelligence, vol. 12, no. 2, pp. 115-129, 2014.
[33]      M. Basiri, A. Naghsh-Nilchi and N. Ghasem-Aghaee, "A framework for sentiment analysis in Persian," Open Transactions on Information Processing, vol. 1, no. 3, pp. 1-14, 2014.
[34]      S. Alimardani and A. Aghaei, "Opinion mining in Persian language using supervised algorithms," Journal of Information Systems and Telecommunication, vol. 3, no. 3, pp. 135-141, 2015.
[35]      F. Amiri, S. Scerri and M. Khodashahi, "Lexicon-based sentiment analysis for Persian text," in Proceedings of the International Conference on Recent Advances in Natural Language Processing, pp. 9-16, Hissar, Bulgaria, 2015.
[36]      E. Golpar-Rabooki, S. Zarghamifar, and J. Rezaeenour, "Feature extraction in opinion mining through Persian reviews," Journal of AI and Data Mining, vol. 3, no. 2, pp. 169-179, 2015.
[37]      E. Asgarian, M. Kahani and S. Sharifi, "The Impact of Sentiment Features on the Sentiment Polarity Classification in Persian Reviews," Cognitive Computation, vol. 10, no. 1, pp. 117-135, 2018.
[38]      S. Nemati, M. Basiri, N. Ghasem-Aghaee and M. Hosseinzadeh Aghdam, "Ant colony optimization for text feature selection in sentiment analysis," Expert Systems with Applications, vol. 36, no. 10, pp. 12086-12094, 2009.
[39]      K. Denecke, "Are SentiWordNet scores suited for multi-domain sentiment classification?," Fourth International Conference on Digital Information Management (ICDIM 2009), pp. 1–6, Michigan, USA, 2009.
[40]      V. A. Kharde and S. S. Sonawane, "Sentiment analysis of twitter data: A survey of techniques," International Journal of Computer Applications, vol. 139, no. 11, pp. 5-15, 2016.
[41]      F. Debole and F. Sebastiani, "Supervised term weighting for automated text categorization," in Proceedings of the 2003 ACM Symposium on Applied Computing, pp. 784-788, New York, USA, 2003.
[42]      M. Hosseinzadeh Aghdam, N. Ghasem-Aghaee and M. Basiri, "Text feature selection using ant colony optimization," Expert Systems with Applications, vol. 36, no. 3, pp. 6843-6853, 2009.
[43]      S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Transactions on Neural Networks, vol. 22, no. 2, pp. 199–210, 2011.
[44]      S. J. Pan, X. Ni, J. T. Sun, Q. Yang, and Z. Chen, "Cross-domain sentiment classification via spectral feature alignment," in Proceedings of the 19th International Conference on World Wide Web (WWW 2010), pp. 751–760, North Carolina, USA, 2010.
[45]      J. Blitzer, M. Dredze, and F. Pereira, "Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification," in Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 440–447, Prague, Czech Republic, 2007.