Publication:
A highly effective approach for document page layout extraction system

dc.contributor.author | Supachai Tangwongsan | en_US
dc.contributor.author | Cholticha Boondireke | en_US
dc.contributor.other | Mahidol University | en_US
dc.date.accessioned | 2018-10-19T04:50:25Z
dc.date.available | 2018-10-19T04:50:25Z
dc.date.issued | 2013-12-01 | en_US
dc.description.abstract | In this paper, we propose a highly effective scheme for a document page layout extraction system as part of the character recognition process. The working model has three stages: document segmentation, document layout classification, and document reading order determination. In the first stage, a hybrid document segmentation method decomposes a document page image into blocks by combining diagonal white runs and vertical edge segmentation with a modified histogram projection. Next, features related to the geometric layout of the page are extracted by feature analysis and combined with a rule-based approach to classify block types and attributes. In the third stage, a highly efficient algorithm, block order sequencing search (BOSS), determines the correct reading sequence of the blocks in the page. The model is tested on a large number of bilingual Thai and English documents with different geometric patterns and multiple columns, rows, fonts, and sizes. The results are promising, with an accuracy rate of 99.47% and an average speed of 2.887 seconds per page in the experiment. © 2013 IEEE. | en_US
dc.identifier.citation | 2013 10th International Computer Conference on Wavelet Active Media Technology and Information Processing, ICCWAMTIP 2013. (2013), 85-90 | en_US
dc.identifier.doi | 10.1109/ICCWAMTIP.2013.6716605 | en_US
dc.identifier.other | 2-s2.0-84894207989 | en_US
dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/123456789/31586
dc.rights | Mahidol University | en_US
dc.rights.holder | SCOPUS | en_US
dc.source.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84894207989&origin=inward | en_US
dc.subject | Computer Science | en_US
dc.title | A highly effective approach for document page layout extraction system | en_US
dc.type | Conference Paper | en_US
dspace.entity.type | Publication
mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84894207989&origin=inward | en_US
