Publication:
Co-occurrence-based error correction approach to word segmentation

dc.contributor.author: Ekawat Chaowicharat [en_US]
dc.contributor.author: Kanlaya Naruedomkul [en_US]
dc.contributor.other: Mahidol University [en_US]
dc.date.accessioned: 2018-05-03T08:08:29Z
dc.date.available: 2018-05-03T08:08:29Z
dc.date.issued: 2011-12-01 [en_US]
dc.description.abstract: A number of word segmentation algorithms have been offered in the past; however, there is still room for improvement. Co-occurrence-Based Error Correction (CBEC), the proposed approach in this chapter, is a novel Thai word segmentation approach that was designed to provide accurate segmentation results based on context and purpose. CBEC quickly segments the input string using any available algorithm; maximal matching was used in the experiment. Next, CBEC checks its segmentation output against an error risk data bank to determine if there is any error risk. The error risk data bank is developed based on a training corpus. The current version of the error risk bank was based on the training corpus available at BEST 2009. Then, CBEC re-segments the input string using the co-occurrence score of the word sequence to ensure the accuracy of the segmentation result. © 2012, IGI Global. [en_US]
dc.identifier.citation: Cross-Disciplinary Advances in Applied Natural Language Processing: Issues and Approaches. (2011), 354-364 [en_US]
dc.identifier.doi: 10.4018/978-1-61350-447-5.ch023 [en_US]
dc.identifier.other: 2-s2.0-84898589146 [en_US]
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/123456789/11746
dc.rights: Mahidol University [en_US]
dc.rights.holder: SCOPUS [en_US]
dc.source.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84898589146&origin=inward [en_US]
dc.subject: Computer Science [en_US]
dc.title: Co-occurrence-based error correction approach to word segmentation [en_US]
dc.type: Chapter [en_US]
dspace.entity.type: Publication
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84898589146&origin=inward [en_US]
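
The three steps named in dc.description.abstract (a fast first-pass segmentation, a check against an error risk bank, and a re-segmentation driven by co-occurrence scores) can be illustrated with a small sketch. This is not the authors' implementation. The record does not say how the error risk bank or the co-occurrence score is represented, so the following are all assumptions made for illustration: the bank is modeled as a set of risky words, the score as a sum of adjacent word-pair counts, and a greedy longest-match pass stands in for the maximal matching used in the experiment. The lexicon, the counts, and every function name are hypothetical, and unspaced English text stands in for Thai (which is written without spaces between words) for readability.

```python
# Toy data: all names and values below are hypothetical, not taken from the chapter.
LEXICON = {"the", "theme", "men", "dine", "here"}
ERROR_RISK_BANK = {"theme"}  # assumed form: words that are often segmented wrongly
COOCCURRENCE = {("the", "men"): 5, ("men", "dine"): 3, ("dine", "here"): 4}


def longest_match(text, lexicon):
    """Greedy first pass: take the longest lexicon word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                break
        else:
            j = i + 1  # no lexicon word starts here: emit a single character
        tokens.append(text[i:j])
        i = j
    return tokens


def all_segmentations(text, lexicon):
    """Enumerate every split of text into lexicon words (exponential; toy use only)."""
    if not text:
        return [[]]
    results = []
    for j in range(1, len(text) + 1):
        head = text[:j]
        if head in lexicon:
            results += [[head] + rest for rest in all_segmentations(text[j:], lexicon)]
    return results


def cooccurrence_score(tokens, table):
    """Assumed score: sum of counts over adjacent word pairs."""
    return sum(table.get(pair, 0) for pair in zip(tokens, tokens[1:]))


def segment(text):
    """First pass, risk check, then re-segmentation only when a risk is flagged."""
    first_pass = longest_match(text, LEXICON)
    if not any(token in ERROR_RISK_BANK for token in first_pass):
        return first_pass
    candidates = all_segmentations(text, LEXICON) or [first_pass]
    return max(candidates, key=lambda c: cooccurrence_score(c, COOCCURRENCE))


print(longest_match("themendinehere", LEXICON))
print(segment("themendinehere"))
```

With this toy data the first pass yields `['theme', 'n', 'dine', 'here']`. Because `theme` is in the assumed risk bank, the string is re-segmented and `segment` returns `['the', 'men', 'dine', 'here']`. Input that raises no risk flag keeps its first-pass result, which is what keeps the common case fast.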
