Improving Detection of Capsule Endoscopy Images Using the YOLO Neural Network

Article Type: Scientific-Research

Authors

1 Faculty of Computer Engineering, Sadjad University, Mashhad, Iran

2 Faculty of Computer Engineering, Sadjad University, Mashhad, Iran

3 Faculty Member, Sadjad University, Mashhad, Iran

Abstract

Capsule endoscopy (CE) technology is developing rapidly. This progress is driven by its ease of use, long battery life, and good image quality. Although the resolution of the image sequences produced by this technique keeps growing, identifying the content of interest in them still requires considerable time and effort. To address this, a new method is presented in this paper, based on the popular YOLO v5 neural network architecture, which detects the location and label of lesions and is evaluated on two publicly available datasets. Another neural network, named GPD and based on the AlexNet architecture, is chosen as the rival. The main goal of this research is to use YOLO to reduce detection time while preserving the existing accuracy; the results also show a 6% improvement in detection accuracy over the rival. In addition, YOLO delivers 58% better performance in average prediction time, analyzing each frame in 5.39 milliseconds. The scalability of YOLO is also examined, and the results indicate graceful degradation of quality by a factor of 6.95 on the Kvasir dataset, which demonstrates YOLO's practicality in this domain. Increasing the input quality leads to better results with YOLO. All implementations and supplementary materials are available on GitHub.

Keywords

  • Gastroenterology
  • Capsule endoscopy
  • YOLO
  • GPD

Article Title [English]

Improving Detection of Capsule Endoscopy Images Using YOLO

Authors [English]

  • Shokoufeh Hatami 1
  • Sina Behnam 2
  • Reza Shamsaee 3
1 Faculty of Computer Engineering and Information Technology, Sadjad University, Mashhad, Iran
2 Faculty of Computer Engineering and Information Technology, Sadjad University, Mashhad, Iran
3 Faculty Member, Faculty of Computer Engineering, Sadjad University, Mashhad, Iran
Abstract [English]

Capsule endoscopy (CE) technology is rapidly advancing due to its ease of use, long battery life, and high image quality. However, the increasing clarity of image sequences captured by CE requires more time and effort to detect the desired content. To address this issue, a new approach is presented in this paper using the popular YOLO v5 neural network architecture to detect the location and label of lesions in two public CE datasets. A GPD neural network based on AlexNet is used as a rival classifier. The primary goal of this research is to reduce diagnostic time while maintaining accuracy using YOLO, and the results show a 6% increase in detection accuracy over the rival. Additionally, YOLO is 58% more time-efficient, with an average prediction time of 5.39 milliseconds per frame. The scalability of YOLO is also analyzed, and the results indicate graceful degradation by a factor of 6.95 on the Kvasir dataset, demonstrating YOLO's real-time applicability. Higher-resolution inputs lead to better results with YOLO. Implementations and supplementary data are available on GitHub.
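To make the detection pipeline concrete, below is a minimal Python sketch of single-frame YOLO v5 inference with per-frame timing, in the spirit of the 5.39 ms measurement reported above. It is an illustration, not the authors' released code: the checkpoint path 'lesion_best.pt' and the input 'frame.jpg' are hypothetical placeholders, while the 'ultralytics/yolov5' torch.hub entry point and its 'custom' and 'size' options are the library's documented interface.

    # Minimal sketch (not the authors' code): YOLO v5 lesion detection on one CE frame.
    # 'lesion_best.pt' (a fine-tuned checkpoint) and 'frame.jpg' are hypothetical names.
    import time
    import torch

    # Load custom weights through the official ultralytics/yolov5 torch.hub entry point.
    model = torch.hub.load('ultralytics/yolov5', 'custom', path='lesion_best.pt')

    model('frame.jpg')  # warm-up call so the timing below excludes one-time setup

    start = time.perf_counter()
    results = model('frame.jpg', size=640)  # 'size' sets the inference resolution
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print(f'prediction time: {elapsed_ms:.2f} ms')

    # One row per detected lesion: bounding box, confidence score, and class label.
    print(results.pandas().xyxy[0])

Lowering the size argument is one simple way to probe the accuracy/latency trade-off behind the scalability experiment; absolute timings of course depend on the hardware used.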

Keywords [English]

  • Gastroenterology
  • Capsule endoscopy
  • YOLO
  • GPD