Classification of Tweets Related to Illegal Activities in Thai Language

Authors: Sumeth Yuenyong; Narit Hnoohom; Konlakorn Wongpatikaseree; Teerapong Pheungbun Na Ayutthaya
Affiliation: Mahidol University
Dates: 2019-08-23; 2018-07-02
Source: 2018 International Joint Symposium on Artificial Intelligence and Natural Language Processing, iSAI-NLP 2018 - Proceedings. (2018)
Type: Conference Paper
Subjects: Computer Science; Medicine
Database: SCOPUS
Scopus ID: 2-s2.0-85065081604
DOI: 10.1109/iSAI-NLP.2018.8692858
URI: https://repository.li.mahidol.ac.th/handle/20.500.14594/45615
Rights: © 2018 IEEE.

Abstract: This paper presents the classification of tweets related to illegal activities in the Thai language. The unfiltered nature of Twitter allows it to be used as a platform for communication about illegal activities. The sheer number of tweets makes automatic classification necessary to detect these illegal tweets. Very little has been done about this issue, especially in the Thai language. Tweet classification is more difficult than standard text classification because tweets are short and colloquial. Furthermore, the training data is imbalanced because legal tweets are very easy to find, while illegal tweets of specific types are quite hard to come by. We propose a tree-like hierarchical model in which each node is a full deep neural network based on a convolutional LSTM architecture. To deal with the highly imbalanced training data, tweets are classified in two stages: first as legal or illegal, and then among the illegal classes. Furthermore, ensemble classifiers are used to detect difficult illegal classes that were misclassified as legal by the first stage. Experimental results show that this approach performs significantly better than the baseline of using a single network to classify among all classes in a single stage.
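
The two-stage routing described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record gives no details on how the ensemble is combined or how the networks are built, so majority voting is an assumption, the keyword-based stub classifiers stand in for the convolutional LSTM networks, and all names (`classify_tweet`, `majority_vote`, `class_a`, `class_b`, `kw_a`, `kw_b`) are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier is any callable mapping tweet text to a label.
# In the paper each node is a deep network; here they are stubs.
Classifier = Callable[[str], str]


def majority_vote(classifiers: Sequence[Classifier], tweet: str) -> str:
    """Return the label predicted by most classifiers (assumed combination rule)."""
    votes = Counter(clf(tweet) for clf in classifiers)
    return votes.most_common(1)[0][0]


def classify_tweet(
    tweet: str,
    stage1: Classifier,
    stage2: Classifier,
    rescue_ensemble: Sequence[Classifier],
) -> str:
    """Route a tweet through the two stages, then the rescue ensemble."""
    # Stage 1: binary legal/illegal decision.
    if stage1(tweet) == "illegal":
        # Stage 2: choose among the illegal classes only.
        return stage2(tweet)
    # Stage 1 said "legal": let the ensemble look for difficult
    # illegal classes that stage 1 tends to miss.
    if rescue_ensemble:
        verdict = majority_vote(rescue_ensemble, tweet)
        if verdict != "legal":
            return verdict
    return "legal"


# Hypothetical keyword stubs, for illustration only.
def stage1_stub(tweet: str) -> str:
    return "illegal" if "kw_a" in tweet else "legal"


def stage2_stub(tweet: str) -> str:
    return "class_a" if "kw_a" in tweet else "class_b"


def detector_b(tweet: str) -> str:
    return "class_b" if "kw_b" in tweet else "legal"


def always_legal(tweet: str) -> str:
    return "legal"


# Three voters, so a two-to-one majority decides.
rescue_stub = [detector_b, detector_b, always_legal]

if __name__ == "__main__":
    for text in ["buy kw_a now", "hidden kw_b offer", "good morning"]:
        print(text, "->", classify_tweet(text, stage1_stub, stage2_stub, rescue_stub))
```

Separating the binary decision from the multi-class decision means the second stage is trained and evaluated only on illegal tweets, which is one way to keep the large legal class from dominating the rarer illegal classes.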