Effectively recognizing broken characters in Historical documents

Chaivatna Sumetphong; Supachai Tangwongsan

dc.contributor.author	Chaivatna Sumetphong	en_US
dc.contributor.author	Supachai Tangwongsan	en_US
dc.contributor.other	Mahidol University	en_US
dc.date.accessioned	2018-06-11T04:45:12Z
dc.date.available	2018-06-11T04:45:12Z
dc.date.issued	2012-10-09	en_US
dc.description.abstract	Historical documents, after being binarized, produce images that contain abundant broken pieces. The presence of these broken pieces naturally complicates the process of OCR and drastically drops the overall recognition accuracy. We propose a highly effective approach to recognize the broken characters using a heuristic enumerative method to find the optimal set partition of the broken pieces. Each subset of the optimal partition is mapped to the best character pattern and the overall image is recognized. Results obtained after performing experiments on a Thai Historical document and an American Historical document are quite promising. Given the generality of the method, it may be applicable to different language scripts given that a properly trained classifier has been developed for that script and font. © 2012 IEEE.	en_US
dc.identifier.citation	CSAE 2012 - Proceedings, 2012 IEEE International Conference on Computer Science and Automation Engineering. Vol.3, (2012), 104-108	en_US
dc.identifier.doi	10.1109/CSAE.2012.6272918	en_US
dc.identifier.other	2-s2.0-84867080115	en_US
dc.identifier.uri	https://repository.li.mahidol.ac.th/handle/123456789/14031
dc.rights	Mahidol University	en_US
dc.rights.holder	SCOPUS	en_US
dc.source.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84867080115&origin=inward	en_US
dc.subject	Computer Science	en_US
dc.title	Effectively recognizing broken characters in Historical documents	en_US
dc.type	Conference Paper	en_US
dspace.entity.type	Publication
mu.datasource.scopus	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84867080115&origin=inward	en_US
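
The abstract describes recognition as a search for the optimal set partition of the broken pieces, with each subset mapped to its best-matching character pattern. The record does not include the paper's heuristic or its classifier, so the following is only a minimal brute-force sketch of that idea in Python. The names `set_partitions`, `recognize`, `toy_classify` and `TOY_PATTERNS`, the piece identifiers, the confidence values and the sum-of-confidences objective are all assumptions made for illustration, not the authors' method.

```python
from typing import Callable, FrozenSet, Iterator, List, Sequence, Tuple

Piece = str                                   # hypothetical id of one broken piece
Classifier = Callable[[FrozenSet[Piece]], Tuple[str, float]]


def set_partitions(items: Sequence[Piece]) -> Iterator[List[List[Piece]]]:
    """Yield every way to split `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing group in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or let it start a group of its own.
        yield [[first]] + partition


def recognize(pieces: Sequence[Piece],
              classify: Classifier) -> List[Tuple[Tuple[Piece, ...], str]]:
    """Return the partition with the highest total classifier confidence.

    Exhaustive search: the number of partitions grows as the Bell numbers,
    so this is only usable for a handful of pieces. A practical system
    would need a heuristic to prune the enumeration.
    """
    best_score = float("-inf")
    best: List[Tuple[Tuple[Piece, ...], str]] = []
    for partition in set_partitions(pieces):
        labelled = [(tuple(sorted(g)), *classify(frozenset(g))) for g in partition]
        score = sum(conf for _, _, conf in labelled)
        if score > best_score:
            best_score = score
            best = [(group, label) for group, label, _ in labelled]
    # Order groups by their first piece so the result is deterministic.
    return sorted(best, key=lambda item: item[0])


# Toy stand-in for a trained classifier (assumed values, not real data).
TOY_PATTERNS = {
    frozenset({"p1", "p2"}): ("A", 0.9),
    frozenset({"p3"}): ("B", 0.8),
    frozenset({"p1"}): ("I", 0.3),
    frozenset({"p2"}): ("-", 0.2),
}


def toy_classify(group: FrozenSet[Piece]) -> Tuple[str, float]:
    # Unknown groupings are penalized so that implausible merges lose.
    return TOY_PATTERNS.get(group, ("?", -1.0))
```

With the toy classifier, `recognize(["p1", "p2", "p3"], toy_classify)` merges `p1` and `p2` into one character and leaves `p3` on its own. A real classifier would score groups from image features, and the objective would likely need normalizing so that it does not favor partitions with more groups.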

Collections

Scopus 2011-2015
