Publication:
Realization of a high performance bilingual OCR system for Thai-English printed documents

dc.contributor.author: Supachai Tangwongsan [en_US]
dc.contributor.author: Buntida Suvacharakulton [en_US]
dc.contributor.other: Mahidol University [en_US]
dc.date.accessioned: 2018-09-24T08:56:57Z
dc.date.available: 2018-09-24T08:56:57Z
dc.date.issued: 2010-11-29 [en_US]
dc.description.abstract: This paper presents a high performance bilingual OCR system for printed Thai and English text. Given the complex nature of both the Thai and English languages, the first stage identifies the language of each zone by using geometric properties for differentiation. The second stage is character recognition, for which the technique developed comprises a feature extractor and a classifier. In feature extraction, the thinned character image is analyzed and categorized into groups. The classifier then performs recognition in two steps: a coarse level, followed by a fine level guided by decision trees. To obtain an even better result, the final stage uses dictionary look-up to improve overall accuracy. For verification, the system was tested in a series of experiments on printed documents totaling 141 pages and over 280,000 characters. The results show that the system achieves, on average, an accuracy of 100% on Thai monolingual, 98.18% on English monolingual, and 99.85% on bilingual documents. With the final dictionary look-up stage, accuracy on bilingual documents improves to as much as 99.98%, as expected. ©2010 IEEE. [en_US]
dc.identifier.citation: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2010. (2010) [en_US]
dc.identifier.doi: 10.1109/NLPKE.2010.5587781 [en_US]
dc.identifier.other: 2-s2.0-78649304598 [en_US]
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/20.500.14594/28996
dc.rights: Mahidol University [en_US]
dc.rights.holder: SCOPUS [en_US]
dc.source.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=78649304598&origin=inward [en_US]
dc.subject: Computer Science [en_US]
dc.title: Realization of a high performance bilingual OCR system for Thai-English printed documents [en_US]
dc.type: Conference Paper [en_US]
dspace.entity.type: Publication
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=78649304598&origin=inward [en_US]