A highly effective approach for document page layout extraction system

Supachai Tangwongsan; Cholticha Boondireke

dc.contributor.author	Supachai Tangwongsan	en_US
dc.contributor.author	Cholticha Boondireke	en_US
dc.contributor.other	Mahidol University	en_US
dc.date.accessioned	2018-10-19T04:50:25Z
dc.date.available	2018-10-19T04:50:25Z
dc.date.issued	2013-12-01	en_US
dc.description.abstract	In this paper, we propose a highly effective scheme for a document page layout extraction system as part of the character recognition process. The working model has three stages: document segmentation, document layout classification, and document reading order determination. In the first stage, a hybrid document segmentation method decomposes a document page image into blocks by combining diagonal white-run and vertical-edge segmentation with a modified histogram projection. In the second stage, features of the geometric layout of the page are extracted through feature analysis and combined with a rule-based approach to classify the block types and attributes. In the third stage, a highly efficient block order sequencing search (BOSS) algorithm is introduced to determine the correct reading order of the blocks on the page. The model was tested on a large number of bilingual Thai and English document samples with different geometric patterns, multiple columns and rows, and various fonts and font sizes. The results are promising, with an accuracy rate of 99.47% and an average processing time of 2.887 seconds per page. © 2013 IEEE.	en_US
dc.identifier.citation	2013 10th International Computer Conference on Wavelet Active Media Technology and Information Processing, ICCWAMTIP 2013. (2013), 85-90	en_US
dc.identifier.doi	10.1109/ICCWAMTIP.2013.6716605	en_US
dc.identifier.other	2-s2.0-84894207989	en_US
dc.identifier.uri	https://repository.li.mahidol.ac.th/handle/123456789/31586
dc.rights	Mahidol University	en_US
dc.rights.holder	SCOPUS	en_US
dc.source.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84894207989&origin=inward	en_US
dc.subject	Computer Science	en_US
dc.title	A highly effective approach for document page layout extraction system	en_US
dc.type	Conference Paper	en_US
dspace.entity.type	Publication
mu.datasource.scopus	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84894207989&origin=inward	en_US
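
The abstract names histogram projection as one ingredient of the segmentation stage. This record does not describe the paper's modified projection, diagonal white-run, or vertical-edge steps, so the following is only a minimal sketch of classic projection-profile segmentation (recursive X-Y cut) on a binary page image. It assumes 1 = ink pixels, and the helper `parse_page` and the `min_gap` threshold (in pixels) are hypothetical choices for illustration. The order in which blocks are returned is a naive traversal order, not the BOSS reading-order algorithm.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # binary page image: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def parse_page(rows: List[str]) -> Image:
    """Turn '#'/'.' strings into a binary image (hypothetical helper)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def find_runs(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Return [start, end) runs of non-zero projection values. Ink runs
    separated by fewer than `min_gap` empty positions are merged."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    end = 0
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
            end = i + 1
        elif start is not None:
            gap += 1
            if gap >= min_gap:  # gap wide enough: close the current run
                runs.append((start, end))
                start = None
    if start is not None:
        runs.append((start, end))
    return runs


def xy_cut(img: Image, box: Box, min_gap: int = 2,
           horizontal: bool = True, tried_other: bool = False) -> List[Box]:
    """Recursive X-Y cut: split `box` at white gaps of the row projection
    (horizontal=True) or column projection (horizontal=False), alternating
    directions until neither direction yields a split."""
    top, left, bottom, right = box
    if horizontal:
        profile = [sum(img[r][left:right]) for r in range(top, bottom)]
    else:
        profile = [sum(img[r][c] for r in range(top, bottom))
                   for c in range(left, right)]
    runs = find_runs(profile, min_gap)
    if not runs:
        return []  # no ink in this region

    def sub_box(s: int, e: int) -> Box:
        if horizontal:
            return (top + s, left, top + e, right)
        return (top, left + s, bottom, left + e)

    if len(runs) == 1:
        tight = sub_box(*runs[0])  # trim white margins
        if tried_other:
            return [tight]  # no split in either direction: a leaf block
        return xy_cut(img, tight, min_gap, not horizontal, True)

    blocks: List[Box] = []
    for s, e in runs:  # sub-regions visited top-to-bottom or left-to-right
        blocks.extend(xy_cut(img, sub_box(s, e), min_gap, not horizontal, False))
    return blocks


if __name__ == "__main__":
    page = parse_page([
        ".#########.",  # header line
        "...........",
        "...........",
        "####...####",  # two text columns
        "####...####",
        "...........",
        "...........",
        "####.......",  # last line of the left column
    ])
    print(xy_cut(page, (0, 0, len(page), len(page[0]))))
    # → [(0, 1, 1, 10), (3, 0, 5, 4), (3, 7, 5, 11), (7, 0, 8, 4)]
```

In this toy page the full-width blank rows are cut first, so the last line of the left column is emitted after the right column. The cut order alone does not give a correct reading sequence, which is the kind of problem a dedicated ordering stage such as the paper's BOSS step addresses.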

Collections

Scopus 2011-2015
