Publication: An optimal approach towards recognizing broken Thai characters in OCR systems
Issued Date
2012-12-01
Other identifier(s)
2-s2.0-84874352445
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
2012 International Conference on Digital Image Computing Techniques and Applications, DICTA 2012. (2012)
Suggested Citation
Chaivatna Sumetphong, Supachai Tangwongsan. An optimal approach towards recognizing broken Thai characters in OCR systems. 2012 International Conference on Digital Image Computing Techniques and Applications, DICTA 2012. (2012). doi:10.1109/DICTA.2012.6411736 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/14005
Abstract
This paper presents a novel technique for recognizing broken Thai characters found in degraded Thai text documents by modeling the task as a set-partitioning problem (SPP). The technique searches for the optimal set-partition of the connected components, in which each subset yields a reconstructed Thai character. Given the non-linear nature of the objective function needed for optimal set-partitioning, we design an algorithm, which we call Heuristic Incremental Integer Programming (HIIP), that employs integer programming (IP) with an incremental approach, using heuristics to hasten convergence. To generate corrected Thai words, we adopt a probabilistic generative approach based on a Thai dictionary corpus. The proposed technique is applied successfully to a Thai historical document and a poor-quality Thai fax document, with promising accuracy rates of over 93%. © 2012 IEEE.
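The set-partitioning formulation in the abstract can be illustrated with a small sketch. This is not the HIIP algorithm from the paper: it is a brute-force enumeration that only shows what "searching for the best partition of connected components" means. The component ids, the `TOY_SCORES` table, the fallback score, and the product objective are all hypothetical stand-ins for a real character recognizer's output.

```python
from itertools import combinations
from math import prod


def set_partitions(items):
    """Yield every partition of `items` as a list of tuples (blocks)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Choose which of the remaining items share a block with `first`.
    for k in range(len(rest) + 1):
        for mates in combinations(rest, k):
            block = (first,) + mates
            remaining = [x for x in rest if x not in mates]
            for tail in set_partitions(remaining):
                yield [block] + tail


# Hypothetical recognizer confidences: how well the merged components
# in each block resemble a single character. Invented for illustration.
TOY_SCORES = {
    (0,): 0.2, (1,): 0.3, (0, 1): 0.9,
    (2,): 0.8, (3,): 0.1, (2, 3): 0.6,
}


def toy_score(block):
    """Return the assumed confidence for a block; unknown blocks score low."""
    return TOY_SCORES.get(block, 0.01)


def best_partition(components, score):
    """Exhaustively pick the partition with the highest product of scores."""
    return max(
        set_partitions(list(components)),
        key=lambda partition: prod(score(block) for block in partition),
    )


if __name__ == "__main__":
    # Four fragments; the toy scores favor merging 0 with 1 and 2 with 3.
    print(best_partition([0, 1, 2, 3], toy_score))
```

Exhaustive enumeration grows with the Bell numbers (15 partitions for 4 components, 52 for 5), so it is only practical for very small inputs. That growth suggests why an integer-programming formulation with heuristics, as the abstract describes, is attractive at document scale.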