Faculty of Electrical & Computer Engineering
TABRIZ JOURNAL OF ELECTRICAL ENGINEERING
2008-7799
47 3 2017 11 22
A Method for Automatic Key phrase Extraction from Persian Web News
857 866
6219
FA
M. Basereh, Faculty of Electrical and Computer Engineering, University of Yazd, Yazd, Iran
V. Derhami, Faculty of Electrical and Computer Engineering, University of Yazd, Yazd, Iran
S. Zarifzadeh, Faculty of Electrical and Computer Engineering, University of Yazd, Yazd, Iran 0000-0001-5627-0542
Journal Article
2016 06 28

Text documents, especially news articles, are an important domain for information retrieval, and extracting information from them is a necessary task. This is done by extracting key phrases that capture the main content of the news. This research proposes a three-level approach for key phrase extraction from Persian news web pages that combines linguistic, supervised-learning, and heuristic techniques with a relatively comprehensive set of statistical techniques. A news dataset and a stop-word list are built as part of the work. Based on the characteristics of the data, a Random Forest classifier is used, and its good performance is demonstrated experimentally. Furthermore, instead of using the classifier's output directly, the scores that the classifier assigns to phrases are used to build an ordered list of phrases, and classification is performed on that list. The results show an acceptable F-measure.
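The difference between using the classifier output and using an ordered list of classifier scores can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the candidate phrases, the scores, the 0.5 threshold, and the helper names `label_by_threshold` and `top_k_by_score` are all hypothetical. In practice the score could be, for example, the probability a Random Forest assigns to the key phrase class.

```python
from typing import List, Tuple

# Hypothetical (phrase, score) pairs. The scores are made up for illustration;
# they stand in for whatever score a trained classifier gives each candidate.
ScoredPhrase = Tuple[str, float]


def label_by_threshold(scored: List[ScoredPhrase], threshold: float = 0.5) -> List[str]:
    """Baseline: keep only phrases whose score clears a fixed threshold,
    which mimics taking a binary classifier's hard output label."""
    return [phrase for phrase, score in scored if score >= threshold]


def top_k_by_score(scored: List[ScoredPhrase], k: int) -> List[str]:
    """Alternative: order all phrases by score (highest first) and keep the top k."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [phrase for phrase, _ in ranked[:k]]


candidates: List[ScoredPhrase] = [
    ("central bank", 0.46),
    ("interest rate", 0.41),
    ("yesterday", 0.05),
    ("press conference", 0.22),
]

hard = label_by_threshold(candidates)   # no phrase reaches 0.5
ranked = top_k_by_score(candidates, k=2)

print("hard labels:", hard)
print("top-2 by score:", ranked)
```

In this toy case no candidate clears the threshold, so the hard labels yield no key phrases at all, while the ordered list still returns the two highest-scoring candidates. How the cut-off on the ordered list is chosen in the paper is not stated in the abstract, so the fixed `k` here is only a placeholder.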

https://tjee.tabrizu.ac.ir/article_6219_4f33a58972c07a79ba33e22812c65f5f.pdf