Publication: Classification of Tweets Related to Illegal Activities in Thai Language
Issued Date
2018-07-02
Other identifier(s)
2-s2.0-85065081604
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
2018 International Joint Symposium on Artificial Intelligence and Natural Language Processing, iSAI-NLP 2018 - Proceedings. (2018)
Suggested Citation
Sumeth Yuenyong, Narit Hnoohom, Konlakorn Wongpatikaseree, Teerapong Pheungbun Na Ayutthaya. Classification of Tweets Related to Illegal Activities in Thai Language. 2018 International Joint Symposium on Artificial Intelligence and Natural Language Processing, iSAI-NLP 2018 - Proceedings. (2018). doi:10.1109/iSAI-NLP.2018.8692858 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/45615
Abstract
© 2018 IEEE. This paper presents the classification of tweets related to illegal activities in the Thai language. The unfiltered nature of Twitter allows it to be used as a platform for communication about illegal activities. The sheer number of tweets makes automatic tweet classification necessary to detect these illegal tweets. Very little has been done about this issue, especially in the Thai language. Tweet classification is more difficult than standard text classification because tweets are short and colloquial. Furthermore, the training data is imbalanced because legal tweets are very easy to find while illegal tweets of specific types are quite hard to come by. We propose a tree-like hierarchical model where each node is a full deep neural network based on a convolutional LSTM architecture. To deal with highly imbalanced training data, tweets were classified in two stages: first as legal or illegal, then among the illegal classes. Furthermore, ensemble classifiers were used to detect difficult illegal classes that were misclassified as legal by the first stage. Experimental results show that this approach performs significantly better than the baseline of using only a single network to classify among all classes in a single stage.
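
The two-stage routing described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name classify_tweet, the 0.5 threshold, the averaging rule for the ensembles, the class names, and the keyword-based toy scorers are all assumptions made for illustration. In the paper each node is a deep neural network; here each node is a plain callable so that only the control flow is shown.

```python
from statistics import mean
from typing import Callable, Dict, List

# A scorer maps a tweet to a probability in [0, 1] (hypothetical interface).
Scorer = Callable[[str], float]


def classify_tweet(
    text: str,
    is_illegal: Scorer,                      # stage 1: legal vs. illegal
    illegal_class: Callable[[str], str],     # stage 2: which illegal class
    rescue_ensembles: Dict[str, List[Scorer]],  # per-class ensembles (assumed wiring)
    threshold: float = 0.5,                  # assumed decision threshold
) -> str:
    """Route a tweet through a two-stage hierarchy with an ensemble fallback."""
    # Stage 1: binary decision, which sidesteps the legal/illegal imbalance.
    if is_illegal(text) >= threshold:
        # Stage 2: choose among the illegal classes only.
        return illegal_class(text)

    # Stage 1 said "legal". Give ensembles for difficult classes a chance to
    # catch tweets that stage 1 may have missed (averaging is an assumption).
    best_label, best_score = "legal", threshold
    for label, members in rescue_ensembles.items():
        score = mean(member(text) for member in members)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Toy stand-ins for trained networks; keywords and labels are hypothetical.
def toy_stage1(text: str) -> float:
    return 0.9 if "alpha" in text else 0.1


def toy_stage2(text: str) -> str:
    return "class_a"


toy_rescue = {
    "class_b": [
        lambda t: 0.8 if "beta" in t else 0.0,
        lambda t: 0.6 if "beta" in t else 0.2,
    ]
}

for tweet in ["alpha offer", "beta offer", "good morning"]:
    print(tweet, "->", classify_tweet(tweet, toy_stage1, toy_stage2, toy_rescue))
```

Separating the binary decision from the fine-grained one means the second stage never sees the abundant legal tweets, which is one plausible reading of how the hierarchy helps with imbalance.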