Optical character recognition (OCR) is a technique that aims to automatically convert a machine-printed or handwritten text image into an editable text format (Alghamdi et al., 2016). This technique is highly desirable in various real-world applications, such as digitising learning resources to assist visually impaired people, bank cheque processing and mail sorting (Alginahi, 2013; Al-Badr and Mahmoud, 1995). Generally, the process for developing OCR systems involves five stages: pre-processing, segmentation, feature extraction, classification and post-processing. In each stage, specific techniques are applied; for more details, see Khorsheed (2002).

Previous research on text recognition has focused primarily on Latin script, as used for English, and on Chinese, and only in the last two decades has recognition of other scripts, such as Arabic, been researched (Alginahi, 2013). Although handwritten script is significantly more challenging for OCR than printed Arabic text, printed Arabic text still poses significant challenges (Alghamdi et al., 2016). Therefore, this study will deal only with Arabic printed text. Figure 1 illustrates the characteristics of Arabic printed text that contribute to the inadequate development of Arabic script recognition. Arabic script is written cursively along a baseline and contains loop-shaped characters, zigzag-shaped characters, dot characters and diacritics. Moreover, a character might have up to four different shapes depending on its position in a word. Therefore, research is being undertaken seeking more solutions for Arabic OCR systems (Parvez and Mahmoud, 2013; Slimane et al., 2013).

To produce an efficient Arabic OCR system, effective performance evaluation of current OCR systems is essential. Furthermore, evaluating OCR performance contributes to monitoring progress in OCR system development, analysing the effectiveness of OCR systems, identifying open areas and providing a scientific explanation for the performance of OCR systems (Kanungo et al., 1999a; Mihov et al., 2005).

Despite the significance of Arabic OCR system performance evaluation, relatively little work has been published on empirical analysis of the effectiveness of Arabic OCR systems. For instance, two studies provide an evaluation of two Arabic OCR systems (Kanungo et al., 1999a, 1999b). However, as these studies were conducted in 1999, over 17 years ago, they do not reflect current progress in the field of Arabic OCR development (Alginahi, 2013). A more recent empirical study provides a comparative evaluation of the most common Arabic OCR systems (Saber et al., 2016).
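The observation that an Arabic character might take up to four shapes depending on its position in a word can be illustrated with Unicode's Arabic Presentation Forms-B block. The following is a minimal sketch, not part of the study itself: it takes the letter BEH as an arbitrary example, and the names `BEH_FORMS`, `describe_forms` and `base_letter` are introduced here purely for illustration.

```python
import unicodedata

# Presentation-form code points for the Arabic letter BEH (base letter U+0628).
# These four code points are taken from the Unicode "Arabic Presentation
# Forms-B" block; the dictionary itself is an illustrative assumption.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def describe_forms(forms):
    """Map each word position to the official Unicode name of its glyph form."""
    return {position: unicodedata.name(ch) for position, ch in forms.items()}


def base_letter(ch):
    """Fold a presentation form back to its base letter via NFKC normalisation."""
    return unicodedata.normalize("NFKC", ch)


if __name__ == "__main__":
    for position, name in describe_forms(BEH_FORMS).items():
        print(f"{position:>8}: {name}")
    # All four shapes fold back to the same underlying letter.
    print({base_letter(ch) for ch in BEH_FORMS.values()} == {"\u0628"})
```

Four visually distinct glyphs map to a single underlying letter, which is one reason a recogniser for printed Arabic may have to distinguish considerably more shape classes than the alphabet has letters.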
Copyright © 2017, Mansoor Alghamdi and William Teahan

License

Published in PSU Research Review: An International Journal. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence are set out in the CC BY 4.0 legal code.