Publication:
Khmer POS Tagging Using Conditional Random Fields

dc.contributor.author: Sokunsatya Sangvat [en_US]
dc.contributor.author: Charnyote Pluempitiwiriyawej [en_US]
dc.contributor.other: Mahidol University [en_US]
dc.date.accessioned: 2019-08-23T10:58:35Z
dc.date.available: 2019-08-23T10:58:35Z
dc.date.issued: 2018-01-01 [en_US]
dc.description.abstract: © 2018, Springer Nature Singapore Pte Ltd. A transformation-based approach using a hybrid of rule-based and tri-gram methods has already been introduced for Khmer part-of-speech (POS) tagging. To explore this topic further, we present an alternative approach to Khmer POS tagging using Conditional Random Fields (CRFs). Since the features greatly affect tagging accuracy, we investigate five groups of features and use them with the CRF model. First, we study different kinds of contextual information and use them as our baseline model. We then analyze the characteristics of Khmer and derive three additional groups of language-related features: morphemes, word shapes and named entities. We also explore the use of a lexicon as a source of features to further improve the accuracy of our tagger. Our proposed approach has been evaluated on a corpus of 41,058 words and 27 POS tags. The comparative study shows that our proposed approach achieves accuracy competitive with other Khmer POS tagging approaches. [en_US]
dc.identifier.citation: Communications in Computer and Information Science. Vol.781, (2018), 169-178 [en_US]
dc.identifier.doi: 10.1007/978-981-10-8438-6_14 [en_US]
dc.identifier.issn: 18650929 [en_US]
dc.identifier.other: 2-s2.0-85044073164 [en_US]
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/20.500.14594/45669
dc.rights: Mahidol University [en_US]
dc.rights.holder: SCOPUS [en_US]
dc.source.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85044073164&origin=inward [en_US]
dc.subject: Computer Science [en_US]
dc.subject: Mathematics [en_US]
dc.title: Khmer POS Tagging Using Conditional Random Fields [en_US]
dc.type: Conference Paper [en_US]
dspace.entity.type: Publication
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85044073164&origin=inward [en_US]
