Publication: Modeling broken characters recognition as a set-partitioning problem
Issued Date
2012-12-01
ISSN
0167-8655
Other identifier(s)
2-s2.0-84866866137
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
Pattern Recognition Letters. Vol.33, No.16 (2012), 2270-2279
Suggested Citation
Chaivatna Sumetphong, Supachai Tangwongsan. Modeling broken characters recognition as a set-partitioning problem. Pattern Recognition Letters. Vol.33, No.16 (2012), 2270-2279. doi:10.1016/j.patrec.2012.08.021 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/14001
Title
Modeling broken characters recognition as a set-partitioning problem
Author(s)
Chaivatna Sumetphong
Supachai Tangwongsan
Abstract
This paper presents a novel technique for recognizing broken characters in degraded text documents by modeling the task as a set-partitioning problem (SPP). The technique searches for the optimal set-partition of the connected components, such that each subset yields one reconstructed character. Because the objective function for optimal set-partitioning is non-linear, we designed an algorithm called Heuristic Incremental Integer Programming (HIIP). The algorithm applies integer programming (IP) incrementally and uses heuristics to speed convergence. The objective function is expressed in terms of probability functions that reflect three common OCR measurements: pattern resemblance, sizing conformity, and distance between connected components. We applied HIIP to degraded Thai and English text documents and achieved accuracy rates above 90%. We also compared HIIP against three competing algorithms, and HIIP achieved higher accuracy than each of them. © 2012 Elsevier B.V. All rights reserved.
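To make the set-partitioning formulation concrete, here is a minimal Python sketch under stated assumptions. It is not HIIP and not the authors' code. It exhaustively enumerates every partition of a handful of connected components and keeps the one with the best total score. The number of partitions grows as the Bell numbers, which is one reason exhaustive search does not scale to real pages. Each component is assumed to be a hypothetical `(left, right)` horizontal span, `char_width` is an assumed expected character width, and `subset_log_score` is a toy log-probability built from two of the cues the abstract names (sizing conformity and distance between components). Pattern resemblance would require a character classifier and is left out. The names `set_partitions`, `subset_log_score`, and `best_partition` are illustrative only.

```python
import math
from typing import Iterator, List, Sequence, Tuple

# Hypothetical representation: (left, right) x-extent of one connected component.
Span = Tuple[float, float]


def set_partitions(items: Sequence[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of `items` into non-empty, disjoint subsets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing subset in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a subset of its own.
        yield [[first]] + partition


def subset_log_score(spans: Sequence[Span], char_width: float) -> float:
    """Toy log-probability (hypothetical) that `spans` form one character."""
    ordered = sorted(spans)
    width = ordered[-1][1] - ordered[0][0]
    # Sizing conformity: quadratic log-penalty around the expected width.
    size_term = -((width - char_width) / char_width) ** 2
    # Distance: penalize the largest horizontal gap between pieces in the subset.
    gaps = [max(0.0, b[0] - a[1]) for a, b in zip(ordered, ordered[1:])]
    dist_term = -max(gaps, default=0.0) / char_width
    return size_term + dist_term


def best_partition(spans: Sequence[Span], char_width: float):
    """Exhaustive search: maximize the sum of per-subset log-scores,
    i.e. the product of the toy per-subset probabilities."""
    best, best_score = None, -math.inf
    for partition in set_partitions(list(range(len(spans)))):
        score = sum(
            subset_log_score([spans[i] for i in block], char_width)
            for block in partition
        )
        if score > best_score:
            best, best_score = partition, score
    return best, best_score


if __name__ == "__main__":
    # Components 0 and 1 are fragments of one broken character; 2 is intact.
    spans = [(0, 4), (5, 9), (20, 30)]
    partition, score = best_partition(spans, char_width=10)
    print(partition)  # → [[0, 1], [2]]
```

With these toy spans, merging the two nearby fragments gives a width close to `char_width` and a small gap, so that partition outscores leaving them apart or joining them to the distant intact component. A practical system would replace the exhaustive loop with a solver-based search such as the integer-programming approach the abstract describes.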
