A Hybrid Method for Finding an Effective Feature Subset in Multi-label Data

Authors

Faculty of Engineering, Shahid Bahonar University of Kerman

Abstract

Multi-label data are data in which, unlike single-label data, each instance can belong to several classes. In recent years, owing to the rapid growth of applications for such data, multi-label classification has attracted the attention of many researchers. As in single-label classification, removing superfluous and redundant features from multi-label data can substantially improve classifier performance. This paper presents a hybrid method for feature selection in multi-label data. The proposed method combines a filter method with a wrapper method, where the wrapper uses meta-heuristic algorithms. Because multi-label data usually have a large number of features, applying search methods directly to find the optimal feature subset is computationally expensive and may fail. Therefore, a filter method is first used to remove features that are irrelevant to the classes. Evolutionary algorithms are then used to select the most salient features. In the experiments, a considerable number of well-known meta-heuristic algorithms are used, each substituted as the wrapper method in the proposed system. The results show that the proposed method achieves higher accuracy than the other methods compared, and that it is the more suitable choice when achieving higher accuracy matters more than running time.

Keywords


Article Title [English]

A Hybrid Method to Find Effective Subset of Features in Multi-label Datasets

Authors [English]

  • S. Kashef
  • H. Nezamabadi-pour
Department of Electrical Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
Abstract [English]

In multi-label data, each instance is associated with a set of labels instead of a single label. Due to the increasing number of modern applications that involve multi-label data, multi-label classification has gained significant attention in recent years. As in single-label data, eliminating redundant and/or irrelevant features plays an important role in improving classification performance. In this paper, a hybrid method for the multi-label feature selection problem is proposed, based on combining filter and wrapper methods, where meta-heuristic algorithms are employed as the wrapper method. Since the number of features in multi-label data is usually high, employing search algorithms alone to find the optimal feature subset carries a high computational burden and is likely to fail. Hence, irrelevant features are first detected and removed by a filter method. Then, salient features are found among the remaining features with the help of meta-heuristic algorithms. A significant number of well-known meta-heuristic algorithms are employed as the wrapper method in the proposed system. Experiments show that the proposed method obtains better classification results than the other algorithms compared.
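
The two-stage idea described in the abstract (filter first, then a meta-heuristic wrapper over the surviving features) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the filter score (maximum absolute Pearson correlation between a feature and any label), the wrapper's fitness (Hamming loss of a 1-nearest-neighbour predictor on a hold-out split), the search procedure (a small binary genetic algorithm), and all function and parameter names such as `filter_scores`, `hybrid_select`, `keep`, `pop`, and `gens`.

```python
import numpy as np

def filter_scores(X, Y):
    """Assumed filter criterion: max |Pearson correlation| of each feature with any label."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    num = Xc.T @ Yc                                   # shape (n_features, n_labels)
    den = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Yc, axis=0))
    corr = np.abs(num / np.where(den == 0, 1.0, den))  # constant columns score 0
    return corr.max(axis=1)

def hamming_loss_1nn(X_tr, Y_tr, X_te, Y_te):
    """Hamming loss of a 1-nearest-neighbour multi-label predictor (assumed fitness)."""
    d = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    pred = Y_tr[d.argmin(axis=1)]
    return float((pred != Y_te).mean())

def hybrid_select(X, Y, keep=10, pop=12, gens=15, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1 (filter): discard all but the `keep` highest-scoring features.
    kept = np.argsort(filter_scores(X, Y))[::-1][:keep]
    # Hold-out split used by the wrapper to evaluate candidate subsets.
    idx = rng.permutation(len(X))
    cut = len(X) * 2 // 3
    tr, te = idx[:cut], idx[cut:]

    def fitness(mask):
        if not mask.any():
            return 1.0                                # empty subset gets the worst loss
        cols = kept[mask]
        return hamming_loss_1nn(X[tr][:, cols], Y[tr], X[te][:, cols], Y[te])

    # Stage 2 (wrapper): binary genetic algorithm over the filtered features.
    population = rng.random((pop, keep)) < 0.5
    best_mask, best_fit = None, np.inf
    for _ in range(gens):
        fits = np.array([fitness(m) for m in population])
        order = np.argsort(fits)
        if fits[order[0]] < best_fit:
            best_fit, best_mask = float(fits[order[0]]), population[order[0]].copy()
        parents = population[order[: pop // 2]]       # truncation selection
        a = parents[rng.integers(len(parents), size=pop)]
        b = parents[rng.integers(len(parents), size=pop)]
        children = np.where(rng.random((pop, keep)) < 0.5, a, b)  # uniform crossover
        children ^= rng.random((pop, keep)) < 0.05                # bit-flip mutation
        children[0] = best_mask                                   # elitism
        population = children
    return np.sort(kept[best_mask]), best_fit

# Synthetic data: two labels that depend only on features 0-3; the rest are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 40))
Y = np.stack([X[:, 0] + X[:, 1] > 0, X[:, 2] - X[:, 3] > 0], axis=1).astype(int)
selected, loss = hybrid_select(X, Y)
print("selected features:", selected, "hold-out Hamming loss:", loss)
```

The filter stage shrinks the search space from 40 features to 10, so the wrapper evaluates bit strings of length 10 rather than 40. Any other binary meta-heuristic could replace the genetic loop without changing the filter stage, which mirrors how the abstract describes swapping algorithms in as the wrapper.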

Keywords [English]

  • Multi-label dataset
  • feature selection
  • hybrid methods
  • filter methods
  • wrapper methods
  • meta-heuristic algorithms