Publication: Realization of a high performance bilingual OCR system for Thai-English printed documents
Issued Date
2010-11-29
Resource Type
Other identifier(s)
2-s2.0-78649304598
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2010. (2010)
Suggested Citation
Supachai Tangwongsan, Buntida Suvacharakulton. Realization of a high performance bilingual OCR system for Thai-English printed documents. Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2010. (2010). doi:10.1109/NLPKE.2010.5587781 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/28996
Title
Realization of a high performance bilingual OCR system for Thai-English printed documents
Author(s)
Supachai Tangwongsan, Buntida Suvacharakulton
Other Contributor(s)
Abstract
This paper presents a high performance bilingual OCR system for printed Thai and English text. Given the complex nature of both the Thai and English languages, the first stage identifies the language of each zone using geometric properties. The second stage is character recognition, for which the technique developed includes a feature extractor and a classifier. In feature extraction, the thinned character image is analyzed and categorized into groups. Next, the classifier performs recognition in two steps: a coarse level, followed by a fine level guided by decision trees. To obtain an even better result, the final stage uses dictionary look-up to improve overall accuracy. For verification, the system was tested in a series of experiments on printed documents comprising 141 pages and over 280,000 characters. The results show that the system achieved an average accuracy of 100% on Thai monolingual, 98.18% on English monolingual, and 99.85% on bilingual documents. With dictionary look-up in the final stage, accuracy on bilingual documents improved to as much as 99.98%, as expected. ©2010 IEEE.
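The abstract does not describe how the dictionary look-up stage works, so the following is only a minimal sketch of one generic way such a post-correction step could be done, not the authors' method. It assumes that recognized text has already been split into words and that a word list is available; the names `correct_word`, `correct_text`, and `LEXICON`, the sample entries, and the `cutoff` value are all hypothetical, and the similarity matching comes from Python's standard `difflib` module.

```python
from difflib import get_close_matches

# Hypothetical toy lexicon mixing English and Thai entries (assumption,
# not taken from the paper).
LEXICON = ["system", "document", "printed", "ระบบ", "เอกสาร"]


def correct_word(word, lexicon, cutoff=0.75):
    """Return the word unchanged if it is in the lexicon; otherwise return
    the most similar lexicon entry, or the original word if none is close
    enough. The cutoff of 0.75 is an illustrative assumption."""
    if word in lexicon:
        return word
    matches = get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(words, lexicon):
    """Apply the look-up to every recognized word in order."""
    return [correct_word(w, lexicon) for w in words]


if __name__ == "__main__":
    # "docurnent" imitates a common OCR confusion of "m" with "rn".
    print(correct_text(["printed", "docurnent", "ระบบ"], LEXICON))
```

A word that has no sufficiently similar lexicon entry is left as recognized, so the look-up can only replace a word when a close match exists. A practical system would need a much larger lexicon and, for Thai, a word segmentation step before the look-up, since Thai text is written without spaces between words.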