Publication: Similarity measurement for sentiment classification on textual reviews
Issued Date
2018-03-24
Other identifier(s)
2-s2.0-85057605424
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
ACM International Conference Proceeding Series. (2018), 24-28
Suggested Citation
Tan Thongtan, Tanasanee Phienthrakul. Similarity measurement for sentiment classification on textual reviews. ACM International Conference Proceeding Series. (2018), 24-28. doi:10.1145/3206185.3206204. Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/45644
Title
Similarity measurement for sentiment classification on textual reviews
Author(s)
Tan Thongtan, Tanasanee Phienthrakul
Abstract
© 2018 Association for Computing Machinery. Sentiment classification on textual reviews is the task of classifying reviews as positive or negative. This research focuses on classifying movie reviews and is benchmarked on the IMDB dataset, which consists of long movie reviews, using accuracy as the evaluation metric. In sentiment classification, each document must be mapped to a fixed-length vector. Document embedding models map each document to a dense, low-dimensional vector in a continuous vector space. This research proposes training document embeddings using cosine similarity instead of the dot product. Experiments on the IMDB dataset show that accuracy improves when cosine similarity is used in place of the dot product, and that combining the resulting features with Naïve-Bayes weighted bag of n-grams achieves a new state-of-the-art accuracy of 97.4%.
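The difference between the two similarity measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' training code; the vectors and the function names `dot` and `cosine_similarity` are hypothetical and chosen only for illustration. It shows that the dot product grows with the magnitude of the vectors, while cosine similarity depends only on their direction and stays within [-1, 1].

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product divided by the product of the two Euclidean norms."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(u, v) / (norm_u * norm_v)


# Hypothetical embeddings (illustrative values, not taken from the paper).
doc_vec = [0.5, -1.0, 2.0]
word_vec = [1.0, 0.0, 1.0]

# The same document vector scaled by 10: same direction, larger magnitude.
scaled_doc_vec = [10 * x for x in doc_vec]

print("dot:", dot(doc_vec, word_vec), dot(scaled_doc_vec, word_vec))
print("cos:", cosine_similarity(doc_vec, word_vec),
      cosine_similarity(scaled_doc_vec, word_vec))
```

In this sketch, scaling the document vector multiplies the dot product by the same factor but leaves the cosine similarity unchanged, which is one way to see how the two training objectives differ in their treatment of embedding magnitude.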