A Method for Automatic Key Phrase Extraction from Persian Web News

Authors

Faculty of Electrical and Computer Engineering, University of Yazd, Yazd, Iran

Abstract

Text documents, and news in particular, are an important domain of information retrieval from which information needs to be extracted. This is done by extracting key phrases that capture the main content of the news. In this research, a three-level approach is suggested for key phrase extraction from Persian news web pages; it combines linguistic, supervised learning, and heuristic approaches with a relatively comprehensive set of statistical approaches. A news dataset and a stop word list are also generated. Given the characteristics of the data, a Random Forest classifier is used, and experiments confirm its good performance. Furthermore, rather than using the classifier's direct output for classification, it is suggested that the scores the classifier assigns to phrases be used to build an ordered list of phrases. The results show an acceptable F-measure.
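
The difference between using the classifier's direct output and using its scores to build an ordered list can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the candidate phrases, the score values, the 0.5 threshold, the list length `top_k`, and the function names `hard_label_selection` and `ranked_selection` do not come from the experiments in this paper. The scores stand in for whatever per-phrase score the classifier provides.

```python
from typing import List, Tuple

ScoredPhrase = Tuple[str, float]  # (candidate phrase, classifier score)


def hard_label_selection(scored: List[ScoredPhrase],
                         threshold: float = 0.5) -> List[str]:
    """Keep only phrases the classifier would label as key phrases.

    The threshold of 0.5 is an illustrative assumption.
    """
    return [phrase for phrase, score in scored if score >= threshold]


def ranked_selection(scored: List[ScoredPhrase], top_k: int) -> List[str]:
    """Order phrases by descending score and keep the first top_k."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [phrase for phrase, _ in ranked[:top_k]]


# Hypothetical candidates and scores for a single news item.
candidates = [
    ("central bank", 0.46),
    ("interest rate", 0.81),
    ("last week", 0.07),
    ("inflation report", 0.38),
    ("monetary policy", 0.44),
]

by_label = hard_label_selection(candidates)
by_rank = ranked_selection(candidates, top_k=3)

print(by_label)  # only phrases at or above the threshold
print(by_rank)   # the three highest-scoring phrases, best first
```

In this toy input, only one candidate passes the threshold, while the ordered list still returns plausible phrases that scored just below it. This is one possible reason an ordered list may be preferable when a fixed number of key phrases per news item is wanted.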

Keywords