Image Tag Completion by Applying SPFCM Clustering on the Features Learned by Deep Convolutional Neural Networks

Authors

Computer Department, Yazd University, Yazd, Iran

Abstract

Image tag completion is a process that aims to simultaneously add missing tags and remove noisy ones. Many images have vague, incomplete, or irrelevant tags, and these unreliable tags reduce the accuracy of image retrieval. Hence, in recent years, many tag completion algorithms have been proposed to obtain tags that match the content of images. Owing to the effectiveness of deep learning in many research fields, this paper uses a deep convolutional neural network to extract suitable visual and semantic features from images. In addition, given the difficulty of loading a large-scale image database into memory, a Single Pass Fuzzy C-Means (SPFCM) clustering algorithm is used to find visually similar images and to refine each image's tags according to similar samples. The results show the effectiveness of the proposed approach for image tag completion.
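The clustering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the CNN features have already been extracted into NumPy arrays, and the function names (`memberships`, `weighted_fcm`, `spfcm`), the fuzzifier `m = 2`, and the random initialization are all illustrative choices. Each chunk is clustered together with the weighted centroids carried over from the previous chunks, so only one chunk needs to be in memory at a time.

```python
import numpy as np

def memberships(X, V, m=2.0, eps=1e-12):
    """Fuzzy memberships (n_points, n_clusters); each row sums to 1."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def weighted_fcm(X, w, V, m=2.0, iters=300, tol=1e-6):
    """Weighted fuzzy c-means starting from centroids V.

    Returns the final centroids and, for each centroid, the total
    weight it absorbed (sum of membership times point weight).
    """
    for _ in range(iters):
        U = memberships(X, V, m)
        G = (U ** m) * w[:, None]                  # weighted fuzzy counts
        V_new = (G.T @ X) / G.sum(axis=0)[:, None]
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    U = memberships(X, V, m)
    return V, (U * w[:, None]).sum(axis=0)

def spfcm(chunks, c, m=2.0, seed=0):
    """Single-pass FCM: visit each chunk of feature vectors exactly once."""
    rng = np.random.default_rng(seed)
    V = cw = None
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        if V is None:
            # Assumed initialization: c random points of the first chunk.
            V = chunk[rng.choice(len(chunk), size=c, replace=False)]
            X, w = chunk, np.ones(len(chunk))
        else:
            # Previous centroids stand in for all data seen so far.
            X = np.vstack([V, chunk])
            w = np.concatenate([cw, np.ones(len(chunk))])
        V, cw = weighted_fcm(X, w, V, m)
    return V, cw
```

In this sketch, `chunks` can be any iterable that yields feature arrays, for example a generator that reads one file of precomputed features at a time. How the resulting clusters are then used to refine tags is not shown here.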

Keywords

