A Multi-Objective Feature Selection Method Based on Conditional Mutual Information and Pareto Set Theory

Document Type: Original Article

Authors

1 Department of Computer Engineering, Azad University, Sanandaj Branch, Sanandaj, Iran

2 Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran

3 School of Engineering, RMIT University, Melbourne, Australia

Abstract

Feature selection is the process of selecting a subset of the primary feature set so that, by removing redundant and irrelevant features, classification accuracy increases. Owing to their low computational complexity, scalability with respect to data dimensionality, and independence from any particular classifier, filter-based selection methods are especially important. However, one weakness of these methods is that they lack information about the interactions among features, which leads them to select redundant and irrelevant features. This selection of redundant and irrelevant features stems from an inappropriate choice of objective function for estimating the relevance and redundancy of the features. In this paper, a nonlinear filter feature selection method based on conditional mutual information and the Pareto set is presented, and its effectiveness is demonstrated through a series of experiments on twelve widely used datasets. According to the results, the proposed method is more accurate than a number of recently proposed feature selection methods.
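Although the paper's exact algorithm is not reproduced here, the following minimal Python sketch illustrates the two ingredients the abstract names: conditional mutual information as a feature-scoring criterion and Pareto dominance for trading off multiple objectives. Everything below, including the choice of the two objectives (class relevance and a CMIM-style conditional relevance given already selected features) and the tie-breaking rule on the Pareto front, is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np

def mutual_info(x, y):
    """Empirical mutual information I(X;Y) for discrete arrays (in nats)."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            px, py = np.mean(x == xv), np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def cond_mutual_info(x, y, z):
    """Empirical conditional MI: I(X;Y|Z) = sum_z p(z) * I(X;Y | Z=z)."""
    cmi = 0.0
    for zv in np.unique(z):
        mask = (z == zv)
        cmi += np.mean(mask) * mutual_info(x[mask], y[mask])
    return cmi

def pareto_front(scores):
    """Indices of non-dominated score tuples when all objectives are maximized."""
    front = []
    for i, si in enumerate(scores):
        dominated = any(
            all(a >= b for a, b in zip(scores[j], si)) and
            any(a > b for a, b in zip(scores[j], si))
            for j in range(len(scores)) if j != i
        )
        if not dominated:
            front.append(i)
    return front

def select_features(X, y, k):
    """Greedy multi-objective filter selection (illustrative assumption).

    Each candidate feature is scored on two objectives: relevance I(f;C)
    and a CMIM-style conditional relevance min_s I(f;C|s) over the already
    selected features s. Only Pareto non-dominated candidates are considered;
    the objective sum is an arbitrary tie-break, not the paper's rule.
    """
    selected = []
    while len(selected) < k:
        candidates = [f for f in range(X.shape[1]) if f not in selected]
        scores = []
        for f in candidates:
            relevance = mutual_info(X[:, f], y)
            cond_rel = min(
                (cond_mutual_info(X[:, f], y, X[:, s]) for s in selected),
                default=relevance,
            )
            scores.append((relevance, cond_rel))
        front = pareto_front(scores)
        best = max(front, key=lambda i: sum(scores[i]))
        selected.append(candidates[best])
    return selected

# Usage on synthetic discrete data: y depends only on features 0 and 2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 8))
y = (X[:, 0] + X[:, 2]) % 2
print(select_features(X, y, 3))
```

In this reading, the Pareto front filters out candidates that are dominated on both objectives simultaneously, so the method avoids fixing a single weighted combination of relevance and redundancy in advance.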

Keywords


[53]      فاطمه علیقارداشی و محمد علی زارع چاهوکی، «تأثیر ترکیب روش‌های انتخاب ویژگی فیلتر و بسته بندی در بهبود پیش بینی اشکال نرم افزار»، مجله مهندسی برق دانشگاه تبریز، دوره 47، شماره 1، صفحات  183 تا 195، 1396.