Publication:
Effectively recognizing broken characters in Historical documents

dc.contributor.author: Chaivatna Sumetphong [en_US]
dc.contributor.author: Supachai Tangwongsan [en_US]
dc.contributor.other: Mahidol University [en_US]
dc.date.accessioned: 2018-06-11T04:45:12Z
dc.date.available: 2018-06-11T04:45:12Z
dc.date.issued: 2012-10-09 [en_US]
dc.description.abstract: Historical documents, after being binarized, produce images that contain abundant broken pieces. The presence of these broken pieces naturally complicates the process of OCR and drastically drops the overall recognition accuracy. We propose a highly effective approach to recognize the broken characters using a heuristic enumerative method to find the optimal set partition of the broken pieces. Each subset of the optimal partition is mapped to the best character pattern and the overall image is recognized. Results obtained after performing experiments on a Thai Historical document and an American Historical document are quite promising. Given the generality of the method, it may be applicable to different language scripts given that a properly trained classifier has been developed for that script and font. © 2012 IEEE. [en_US]
dc.identifier.citation: CSAE 2012 - Proceedings, 2012 IEEE International Conference on Computer Science and Automation Engineering. Vol.3, (2012), 104-108 [en_US]
dc.identifier.doi: 10.1109/CSAE.2012.6272918 [en_US]
dc.identifier.other: 2-s2.0-84867080115 [en_US]
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/123456789/14031
dc.rights: Mahidol University [en_US]
dc.rights.holder: SCOPUS [en_US]
dc.source.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84867080115&origin=inward [en_US]
dc.subject: Computer Science [en_US]
dc.title: Effectively recognizing broken characters in Historical documents [en_US]
dc.type: Conference Paper [en_US]
dspace.entity.type: Publication
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84867080115&origin=inward [en_US]
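
The abstract describes searching for the set partition of broken pieces whose subsets best match character patterns. The record does not give the authors' heuristic, so the following is only a minimal sketch of the underlying idea using plain exhaustive enumeration. Every name in it (`set_partitions`, `best_partition`, `toy_score`, the piece names and the score values) is a hypothetical stand-in, and summing per-group scores is an assumed objective, not the paper's.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Piece = str  # hypothetical stand-in for one connected fragment of a binarized image
Scorer = Callable[[List[Piece]], Tuple[str, float]]  # group -> (label, match score)


def set_partitions(items: Sequence[Piece]) -> Iterator[List[List[Piece]]]:
    """Yield every partition of `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing group in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or let it start a new group of its own.
        yield [[first]] + partition


def best_partition(
    pieces: Sequence[Piece], score: Scorer
) -> Tuple[List[Tuple[List[Piece], str]], float]:
    """Return the partition with the highest summed score (assumed objective)."""
    best: List[Tuple[List[Piece], str]] = []
    best_total = float("-inf")
    for partition in set_partitions(pieces):
        labelled = [score(group) for group in partition]
        total = sum(s for _, s in labelled)
        if total > best_total:
            best_total = total
            best = [(g, label) for g, (label, _) in zip(partition, labelled)]
    return best, best_total


# Toy "classifier": a lookup table in place of a trained character recognizer.
TOY_TEMPLATES = {
    frozenset({"dot", "stem"}): ("i", 1.0),
    frozenset({"bar"}): ("-", 0.75),
    frozenset({"dot"}): (".", 0.25),
    frozenset({"stem"}): ("l", 0.25),
}


def toy_score(group: List[Piece]) -> Tuple[str, float]:
    # Unknown combinations of pieces are penalized.
    return TOY_TEMPLATES.get(frozenset(group), ("?", -1.0))


groups, total = best_partition(["dot", "stem", "bar"], toy_score)
print(groups, total)
```

With these toy scores the search groups `dot` and `stem` into one character and leaves `bar` on its own. The number of set partitions grows as the Bell numbers (15 for four pieces, 52 for five), which is why exhaustive enumeration is only practical for very small groups of pieces and a heuristic such as the one the abstract mentions becomes necessary.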
