Publication:
Classification of Tweets Related to Illegal Activities in Thai Language

dc.contributor.author: Sumeth Yuenyong (en_US)
dc.contributor.author: Narit Hnoohom (en_US)
dc.contributor.author: Konlakorn Wongpatikaseree (en_US)
dc.contributor.author: Teerapong Pheungbun Na Ayutthaya (en_US)
dc.contributor.other: Mahidol University (en_US)
dc.date.accessioned: 2019-08-23T10:56:15Z
dc.date.available: 2019-08-23T10:56:15Z
dc.date.issued: 2018-07-02 (en_US)
dc.description.abstract: © 2018 IEEE. This paper presents a classification of tweets related to illegal activities in the Thai language. The unfiltered nature of Twitter allows it to be used as a platform for communication about illegal activities. The sheer number of tweets makes automatic tweet classification necessary to detect these illegal tweets. Very little has been done about this issue, especially in the Thai language. Tweet classification is more difficult than standard text classification because tweets are short and colloquial. Furthermore, the training data is imbalanced because legal tweets are very easy to find while illegal tweets of specific types are quite hard to come by. We propose a tree-like hierarchical model where each node is a full deep neural network based on a convolutional LSTM architecture. To deal with the highly imbalanced training data, tweets were classified in two stages: first as legal or illegal, then among the illegal classes. Furthermore, ensemble classifiers were used to detect difficult illegal classes that were misclassified as legal by the first stage. Experimental results show that this approach performs significantly better than the baseline of using a single network to classify among all classes in a single stage. (en_US)
dc.identifier.citation: 2018 International Joint Symposium on Artificial Intelligence and Natural Language Processing, iSAI-NLP 2018 - Proceedings. (2018) (en_US)
dc.identifier.doi: 10.1109/iSAI-NLP.2018.8692858 (en_US)
dc.identifier.other: 2-s2.0-85065081604 (en_US)
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/20.500.14594/45615
dc.rights: Mahidol University (en_US)
dc.rights.holder: SCOPUS (en_US)
dc.source.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85065081604&origin=inward (en_US)
dc.subject: Computer Science (en_US)
dc.subject: Medicine (en_US)
dc.title: Classification of Tweets Related to Illegal Activities in Thai Language (en_US)
dc.type: Conference Paper (en_US)
dspace.entity.type: Publication
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85065081604&origin=inward (en_US)
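
The two-stage routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `classify_tweet`, the labels `class_a` and `class_b`, the `min_votes` threshold, and the majority-vote rule for the ensemble are all assumptions made for illustration, and simple keyword rules stand in for the convolutional LSTM networks. The record does not say how the ensemble members are combined.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier here is any callable that maps tweet text to a label string.
Classifier = Callable[[str], str]


def classify_tweet(
    text: str,
    stage1: Classifier,
    stage2: Classifier,
    rescue_ensemble: Sequence[Classifier],
    min_votes: int = 2,
) -> str:
    """Route a tweet through a two-stage hierarchy (hypothetical sketch)."""
    # Stage 1: binary legal/illegal decision.
    if stage1(text) == "illegal":
        # Stage 2: choose among the illegal classes only.
        return stage2(text)

    # Stage 1 said "legal": let the ensemble look for difficult illegal
    # classes that stage 1 may have missed. Majority voting with a
    # minimum vote count is an assumption, not taken from the paper.
    votes = Counter(member(text) for member in rescue_ensemble)
    votes.pop("legal", None)  # only illegal-class votes can override
    if votes:
        label, count = votes.most_common(1)[0]
        if count >= min_votes:
            return label
    return "legal"


# Toy keyword rules standing in for trained networks (hypothetical).
def toy_stage1(text: str) -> str:
    return "illegal" if "contraband" in text else "legal"


def toy_stage2(text: str) -> str:
    return "class_a" if "alpha" in text else "class_b"


toy_ensemble = [
    lambda t: "class_b" if "codeword" in t else "legal",
    lambda t: "class_b" if "code" in t else "legal",
    lambda t: "legal",
]

if __name__ == "__main__":
    for tweet in ["contraband alpha", "nice weather", "codeword here"]:
        print(tweet, "->", classify_tweet(tweet, toy_stage1, toy_stage2, toy_ensemble))
```

Splitting the decision this way means the second-stage classifier never sees the abundant legal tweets, which is one plausible reading of how the hierarchy eases the class imbalance mentioned in the abstract.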
