Long-term Visual Object Tracking of Arbitrary Objects Based on Switching Between Traditional Method and Deep Learning Technique

Document Type: Original Article

Authors

Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran

Abstract

Visual tracking of arbitrary objects is a fundamental and challenging topic in machine vision. It has traditionally been done by building a model of the target and training it on data from the same video. Most trackers struggle to surpass the most popular methods once real-time, online operation is required. This article presents STD-Siam, a tracking framework based on the Siamese network that learns online and tracks in real time. Because the Siamese network has limited online training and cannot handle the challenges of long-term tracking on its own, STD-Siam switches between a traditional tracker and a deep learning tracker, and trains both to resolve the ambiguity between target and background in each scenario. The traditional tracker first generates the training data, which are then expanded through data augmentation so that the deep network can be trained well. The method runs at 66 FPS and, despite its simplicity, achieves good results compared with similar current algorithms and tracks the target over the long term. This faster-than-real-time speed comes from a spike detector in the frequency domain, which accurately locates the selected target candidates and avoids blindly scanning the entire image, reducing the computational burden.
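The abstract does not specify how the frequency-domain spike detector is built. As a rough illustration only, the sketch below uses the well-known spectral residual saliency idea of Hou and Zhang to produce a saliency map from the Fourier spectrum and then picks the strongest responses as candidate locations. The function names (`box_filter3`, `spectral_residual_saliency`, `top_candidates`), the 3x3 averaging window, and the number of candidates `k` are assumptions made for this example, not details of STD-Siam.

```python
import numpy as np


def box_filter3(x):
    """3x3 mean filter with edge replication (NumPy only)."""
    h, w = x.shape
    padded = np.pad(x, 1, mode="edge")
    acc = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def spectral_residual_saliency(gray):
    """Saliency map of a 2-D grayscale image via the spectral residual.

    Assumed stand-in for the paper's frequency-domain detector.
    """
    gray = np.asarray(gray, dtype=float)
    spectrum = np.fft.fft2(gray)
    log_amp = np.log(np.abs(spectrum) + 1e-8)   # log amplitude spectrum
    phase = np.angle(spectrum)                  # phase spectrum
    # Residual = log amplitude minus its local average.
    residual = log_amp - box_filter3(log_amp)
    # Back to the spatial domain, keeping the original phase.
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    # Light smoothing; a box filter is used here in place of a Gaussian.
    saliency = box_filter3(saliency)
    return saliency / (saliency.max() + 1e-12)


def top_candidates(saliency, k=3):
    """Return the (row, col) positions of the k largest saliency values.

    No non-maximum suppression is applied, so candidates may be neighbours.
    """
    order = np.argsort(saliency, axis=None)[::-1][:k]
    return [
        tuple(int(v) for v in np.unravel_index(i, saliency.shape))
        for i in order
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.normal(0.0, 0.05, size=(64, 64))
    frame[30:36, 40:46] += 1.0  # synthetic bright target on a noisy background
    sal = spectral_residual_saliency(frame)
    print(top_candidates(sal, k=3))
```

In a tracker of this kind, only the image patches around such candidates would be passed to the matching network, which is one plausible way to avoid scanning the whole frame.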

