Proposing a New Framework for Automation of Thresholding in Wisdom of Crowds Cluster Ensemble Selection

Document Type: Original Article

Authors

1 Faculty of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China

2 Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran

3 School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran

Abstract

Recently, researchers have proposed heuristic frameworks based on the Wisdom of Crowds for evaluating and selecting base clustering results. In these methods, the base results are first evaluated with diversity, independence, and decentralization metrics; the results that pass a threshold are then selected and combined by a consensus function. This paper proposes a method for automatically determining optimized threshold values from basic features of the input data in the Wisdom of Crowds Cluster Ensemble (WOCCE) framework. In addition, Uniformity, a metric based on APMM, is introduced for measuring the diversity between two base clustering results. Furthermore, Weighted Evidence Accumulation Clustering (WEAC), a new method that incorporates independence as a weight when combining the base results, is introduced. The experimental results indicate that the proposed method is more efficient than other cluster ensemble methods.
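As a rough illustration of the selection-and-combination pipeline described in the abstract, the Python sketch below keeps only the base partitions whose evaluation score passes a given threshold and then combines the survivors through a co-association matrix weighted by an independence-like score, in the spirit of WEAC. The threshold, the weight values, and the average-linkage consensus step are illustrative assumptions, not the paper's exact formulation.

# A minimal, illustrative sketch (not the authors' exact algorithm): base
# partitions are filtered by a threshold on their evaluation scores, and the
# survivors are combined through a co-association matrix weighted by an
# independence-like score, as in Weighted Evidence Accumulation Clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def select_by_threshold(partitions, scores, threshold):
    # Keep only base partitions whose evaluation score reaches the threshold.
    return [p for p, s in zip(partitions, scores) if s >= threshold]

def weighted_coassociation(partitions, weights):
    # Co-association: weighted fraction of partitions placing samples i, j together.
    n = len(partitions[0])
    coassoc = np.zeros((n, n))
    for labels, w in zip(partitions, weights):
        coassoc += w * (labels[:, None] == labels[None, :])
    return coassoc / np.sum(weights)

def weac(partitions, weights, n_clusters):
    # Consensus step: average-linkage clustering on 1 - co-association.
    dist = 1.0 - weighted_coassociation(partitions, weights)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Toy example: three base partitions of six samples with hypothetical scores.
parts = [np.array([0, 0, 0, 1, 1, 1]),
         np.array([0, 0, 1, 1, 2, 2]),
         np.array([1, 1, 1, 0, 0, 0])]
scores = [0.9, 0.3, 0.8]                   # e.g., diversity/independence scores
chosen = select_by_threshold(parts, scores, threshold=0.5)
weights = [s for s in scores if s >= 0.5]  # reuse the passing scores as weights
print(weac(chosen, weights, n_clusters=2))

In this toy run the second partition is filtered out by the threshold, and the two remaining partitions, which agree up to a relabeling, yield the consensus grouping {0, 1, 2} versus {3, 4, 5}.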

Keywords

