Optical Character Recognition (OCR) in Cursive Scripts Using Object Detection Networks

Article type: Research article

Authors

1 Jundi-Shapur University of Technology, Dezful, Iran.

2 Jundi-Shapur University of Technology, Dezful, Iran.

Abstract

Optical character recognition (OCR) in cursive scripts, where the letters of a word are joined together and overlap in both the horizontal and vertical directions, faces many challenges in segmenting characters that have not been recognized and in recognizing characters that have not been separated. In this paper, we propose using object detection models to detect characters in cursive scripts. The ease of implementation and the efficiency of this method in recognizing handwriting-style fonts are examined. In this study, the YOLO network is used to separate and classify the characters of arbitrary three-letter words in Persian script as a case study. First, we generated a dataset suitable for the YOLO network from handwriting-style Persian fonts such as Maneli and IranNastaliq. Using the YOLO network, we achieved a high precision of 98.5% in detecting characters in the Maneli font and 97.6% for a mixture of words in the Maneli and IranNastaliq fonts. We then tested the accuracy limits of the proposed model by adding noise, blur, and skew to the samples. In addition, we used a multi-layer perceptron (MLP) model to predict words from the characters detected and localized by YOLO, with an accuracy above 97.7%. This approach enables us to accurately recognize complete words in complex handwriting-style fonts without using a Persian dictionary.
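As an illustration of the dataset-generation step, the sketch below converts a character's pixel bounding box into a label line in the commonly used YOLO text format (class index followed by normalized center coordinates, width, and height). The function name `to_yolo_label` and the example box are hypothetical; the abstract does not describe the authors' actual labeling code.

```python
from typing import Tuple


def to_yolo_label(class_id: int,
                  box: Tuple[int, int, int, int],
                  img_w: int,
                  img_h: int) -> str:
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line.

    Assumption: labels use the common "class x_center y_center width height"
    layout, with all four geometry values normalized by the image size.
    """
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: character class 3 rendered in a 200x100 image.
label = to_yolo_label(3, (50, 20, 150, 60), img_w=200, img_h=100)
```

Because the training images are rendered from fonts, each character's box is known at render time, so a label like this could be written without manual annotation.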

Article Title [English]

Optical Character Recognition (OCR) in Cursive Scripts Using Object Detection Networks

Authors [English]

  • Mojtaba Gandomkar 1
  • Sahar Khoramipour 2
1 Jundi-Shapur University of Technology, Dezful, Iran.
2 Jundi-Shapur University of Technology, Dezful, Iran.
Abstract [English]

Optical character recognition (OCR) in cursive scripts, where the letters of a word are joined in a flowing manner and overlap both horizontally and vertically, faces the twin difficulties of segmenting characters that have not yet been recognized and recognizing characters that have not yet been separated. In this paper, we propose using object detection models for character detection in cursive scripts. We investigate and discuss the simplicity of implementation and the efficiency of this method in recognizing handwriting-style fonts. As a case study, the YOLO model is used to separate and classify the characters of arbitrary three-letter words in Persian script. First, we generated synthetic datasets suitable for the YOLO network from handwriting-style Persian fonts such as Maneli and IranNastaliq. Using the YOLO model, we achieved a precision of 98.5% in character detection for the Maneli font and 97.6% for a mixture of words in the Maneli and IranNastaliq fonts, while the accuracy for the regular font Arial was almost 100%. We then challenged the proposed model by adding noise, blur, and skew to the samples. Furthermore, we used a multi-layer perceptron (MLP) model to predict words from the characters detected and localized by YOLO, with an accuracy of 99.8% for the Maneli font and 97.7% for a mixture of words in the Maneli and IranNastaliq fonts, while the word detection accuracy for Arial was almost 100%. This approach enables us to recognize complete words accurately in complex handwriting-style fonts without using a Persian vocabulary dictionary.
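The abstract does not say how the YOLO detections are encoded before they reach the MLP. The sketch below shows one hypothetical encoding, assuming each detection carries a class index, a normalized box center, and a confidence score: the most confident detections are kept, ordered right to left (the Persian reading direction), and flattened into a fixed-length vector. The names `Detection` and `to_feature_vector` are illustrative only.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    class_id: int      # index of the detected character class
    x_center: float    # normalized horizontal center in [0, 1]
    y_center: float    # normalized vertical center in [0, 1]
    confidence: float  # detector confidence score


def to_feature_vector(dets: List[Detection], num_chars: int = 3) -> List[float]:
    """Flatten detections into a fixed-length vector of (class, x, y) triples.

    Assumption: words have at most `num_chars` characters, matching the
    three-letter case study. Missing slots are zero-padded; a real system
    would need a padding value that cannot be confused with class 0.
    """
    # Keep the most confident detections, then order them right to left.
    kept = sorted(dets, key=lambda d: d.confidence, reverse=True)[:num_chars]
    ordered = sorted(kept, key=lambda d: d.x_center, reverse=True)
    features: List[float] = []
    for d in ordered:
        features.extend([float(d.class_id), d.x_center, d.y_center])
    features.extend([0.0] * (3 * (num_chars - len(ordered))))
    return features


# Hypothetical detections for one three-letter word image.
dets = [
    Detection(5, 0.2, 0.5, 0.90),
    Detection(1, 0.8, 0.4, 0.95),
    Detection(7, 0.5, 0.6, 0.80),
]
vector = to_feature_vector(dets)
```

Sorting by horizontal position alone can misorder characters that overlap horizontally, which is one plausible reason to let a learned model such as an MLP map positions and classes to the final word instead of relying on a fixed rule.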

Keywords [English]

  • Optical character recognition (OCR)
  • object detection
  • YOLO model
  • multi-layer perceptron (MLP)
  • Persian script
  • handwriting-style fonts