[1] H. Liu and H. Motoda, Computational Methods of Feature Selection (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series), Chapman & Hall/CRC, 2007.
[2] I. T. Jolliffe, Principal Component Analysis, Springer-Verlag New York, 2002.
[3] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, “Fisher discriminant analysis with kernels,” in Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, IEEE, Madison, WI, USA, 1999.
[4] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor, “Canonical correlation analysis: an overview with application to learning methods,” Neural Computation, vol. 16, no. 12, pp. 2639-2664, 2004.
[5] G. H. Golub and C. F. Van Loan, Matrix Computations (Johns Hopkins Studies in Mathematical Sciences), Johns Hopkins University Press, 3rd ed., 1996.
[6] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
[7] A. J. Ferreira and M. A. T. Figueiredo, “An unsupervised approach to feature discretization and selection,” Pattern Recognition, vol. 45, no. 9, pp. 3048-3060, 2012.
[8] C. Lai, M. J. T. Reinders, and L. Wessels, “Random subspace method for multivariate feature selection,” Pattern Recognition Letters, vol. 27, no. 10, pp. 1067-1076, 2006.
[9] A. E. Akadi, A. E. Ouardighi, and D. Aboutajdine, “A powerful feature selection approach based on mutual information,” International Journal of Computer Science and Network Security, vol. 8, no. 4, pp. 116-121, 2008.
[10] L. Yu and H. Liu, “Feature selection for high-dimensional data: a fast correlation-based filter solution,” in Proceedings of the 20th International Conference on Machine Learning, vol. 2, pp. 856-863, 2003.
[11] H. Uğuz, “A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm,” Knowledge-Based Systems, vol. 24, no. 7, pp. 1024-1033, 2011.
[12] J. Yang, Y. Liu, Z. Liu, X. Zhu, and X. Zhang, “A new feature selection algorithm based on binomial hypothesis testing for spam filtering,” Knowledge-Based Systems, vol. 24, no. 6, pp. 904-914, 2011.
[13] C.-M. Chen, H.-M. Lee, and C.-C. Tan, “An intelligent web-page classifier with fair feature-subset selection,” Engineering Applications of Artificial Intelligence, vol. 19, pp. 967-978, 2006.
[14] H. R. Kanan and K. Faez, “An improved feature selection method based on ant colony optimization (ACO) evaluated on face recognition system,” Applied Mathematics and Computation, vol. 205, no. 2, pp. 716-725, 2008.
[15] Z. Yan and C. Yuan, “Ant colony optimization for feature selection in face recognition,” in Biometric Authentication, vol. 3072, pp. 221-226, Springer Berlin Heidelberg, 2004.
[16] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification using support vector machines,” Machine Learning, vol. 46, no. 1-3, pp. 389-422, 2002.
[17] H. Yu, G. Gu, H. Liu, J. Shen, and J. Zhao, “A modified ant colony optimization algorithm for tumor marker gene selection,” Genomics, Proteomics & Bioinformatics, vol. 7, no. 4, pp. 200-208, 2009.
[18] A. Zibakhsh and M. S. Abadeh, “Gene selection for cancer tumor detection using a novel memetic algorithm with a multi-view fitness function,” Engineering Applications of Artificial Intelligence, vol. 26, no. 4, pp. 1274-1281, 2013.
[19] C.-L. Huang and C.-Y. Tsai, “A hybrid SOFM-SVR with a filter-based feature selection for stock market forecasting,” Expert Systems with Applications, vol. 36, no. 2, pp. 1529-1539, 2009.
[20] Y. Marinakis, M. Marinaki, M. Doumpos, and C. Zopounidis, “Ant colony and particle swarm optimization for financial classification problems,” Expert Systems with Applications, vol. 36, no. 7, pp. 10604-10611, 2009.
[21] A. Kuri-Morales and F. Rodríguez-Erazo, “A search space reduction methodology for data mining in large databases,” Engineering Applications of Artificial Intelligence, vol. 22, no. 1, pp. 57-65, 2009.
[22] P. M. Narendra and K. Fukunaga, “A branch and bound algorithm for feature subset selection,” IEEE Transactions on Computers, vol. 26, no. 9, pp. 917-922, 1977.
[23] H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226-1238, 2005.
[24] R. Meiri and J. Zahavi, “Using simulated annealing to optimize the feature selection problem in marketing applications,” European Journal of Operational Research, vol. 171, no. 3, pp. 842-858, 2006.
[25] H. Liu and H. Motoda, Computational Methods of Feature Selection, Chapman & Hall/CRC, 2007.
[26] I. A. Gheyas and L. S. Smith, “Feature subset selection in large dimensionality domains,” Pattern Recognition, vol. 43, no. 1, pp. 5-13, 2010.
[27] Y. Saeys, I. Inza, and P. Larrañaga, “A review of feature selection techniques in bioinformatics,” Bioinformatics, vol. 23, no. 19, pp. 2507-2517, 2007.
[28] T. H. Lin, H. T. Li, and K. C. Tsai, “Implementing the Fisher's discriminant ratio in a k-means clustering algorithm for feature selection and data set trimming,” Journal of Chemical Information and Computer Sciences, vol. 44, no. 1, pp. 76-87, 2004.
[29] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, 2001.
[30] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[31] R. Battiti, “Using mutual information for selecting features in supervised neural net learning,” IEEE Transactions on Neural Networks, vol. 5, no. 4, pp. 537-550, 1994.
[32] K. Kira and L. A. Rendell, “A practical approach to feature selection,” in Proceedings of the Ninth International Workshop on Machine Learning, pp. 249-256, 1992.
[33] J. Liang, F. Wang, C. Dang, and Y. Qian, “A group incremental approach to feature selection applying rough set technique,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 2, pp. 294-308, 2014.
[34] Y. Zhang, A. Yang, C. Xiong, T. Wang, and Z. Zhang, “Feature selection using data envelopment analysis,” Knowledge-Based Systems, vol. 64, pp. 70-80, 2014.
[35] N. Kwak and C.-H. Choi, “Input feature selection for classification problems,” IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 143-159, 2002.
[36] P. A. Estévez, M. Tesmer, C. A. Perez, and J. M. Zurada, “Normalized mutual information feature selection,” IEEE Transactions on Neural Networks, vol. 20, no. 2, pp. 189-201, 2009.
[37] N. Hoque, D. K. Bhattacharyya, and J. K. Kalita, “MIFS-ND: A mutual information-based feature selection method,” Expert Systems with Applications, vol. 41, no. 14, pp. 6371-6385, 2014.
[38] P. E. Meyer, C. Schretter, and G. Bontempi, “Information-theoretic feature selection in microarray data using variable complementarity,” IEEE Journal of Selected Topics in Signal Processing, vol. 2, no. 2, pp. 261-274, 2008.
[39] H. H. Yang and J. Moody, “Feature selection based on joint mutual information,” in Proceedings of the International ICSC Symposium on Advances in Intelligent Data Analysis, pp. 22-25, 1999.
[40] M. Bennasar, Y. Hicks, and R. Setchi, “Feature selection using joint mutual information maximisation,” Expert Systems with Applications, vol. 42, no. 22, pp. 8520-8532, 2015.
[41] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing), Wiley-Interscience, 2006.
[42] J. Biesiada and W. Duch, “Feature selection for high-dimensional data: a Pearson redundancy based filter,” in Computer Recognition Systems 2, Advances in Soft Computing, vol. 45, pp. 242-249, Springer, 2007.
[43] M. Haindl, P. Somol, D. Ververidis, and C. Kotropoulos, “Feature selection based on mutual correlation,” in Progress in Pattern Recognition, Image Analysis and Applications, vol. 4225, pp. 569-577, Springer, 2006.
[44] G. Brown, A. Pocock, M.-J. Zhao, and M. Luján, “Conditional likelihood maximisation: A unifying framework for information theoretic feature selection,” The Journal of Machine Learning Research, vol. 13, no. 1, pp. 27-66, 2012.
[45] J. Lee and D. Kim, “Fast multi-label feature selection based on information-theoretic feature ranking,” Pattern Recognition, vol. 48, no. 9, pp. 2761-2771, 2015.
[46] G. Roffo and S. Melzi, “Features selection via eigenvector centrality,” in Proceedings of New Frontiers in Mining Complex Patterns (NFMCP 2016), 2016.
[47] G. Roffo, S. Melzi, and M. Cristani, “Infinite feature selection,” in IEEE International Conference on Computer Vision (ICCV), pp. 4202-4210, Santiago, Chile, 2015.
[48] K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, “A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II,” in Parallel Problem Solving from Nature PPSN VI, vol. 1917, pp. 849-858, Springer, Berlin, Heidelberg, 2000.
[49] S. Kashef and H. Nezamabadi-pour, “A hybrid method for finding an effective feature subset in multi-label data,” vol. 48, no. 3 (serial no. 85), pp. 1327-1338, 1397 (2018), in Persian.
[50] X. Zhang, X. Liu, and Y. Yang, “A fast feature selection algorithm by accelerating computation of fuzzy rough set-based information entropy,” Entropy, vol. 20, no. 10, p. 788, 2018.
[51] F. Li, D. Miao, and W. Pedrycz, “Granular multi-label feature selection based on mutual information,” Pattern Recognition, vol. 67, pp. 410-423, 2017.
[52] Q. Zhang and H. Li, “MOEA/D: a multiobjective evolutionary algorithm based on decomposition,” IEEE Transactions on Evolutionary Computation, vol. 11, no. 6, pp. 712-731, 2007.
[53] F. Alighardashi and M. A. Zare Chahooki, “The effect of combining filter and wrapper feature selection methods on improving software fault prediction,” Tabriz Journal of Electrical Engineering, vol. 47, no. 1, pp. 183-195, 1396 (2017), in Persian.