A Comparative Study of TF-IDF and Count Vectorizer under Random State Changes in a Random Forest Classifier for Emotion Detection

Kooptiwoot S.

Issued Date

2026-01-01

Resource Type

Article

ISSN

22414487

eISSN

17928036

DOI

10.48084/etasr.16158

Scopus ID

2-s2.0-105037642873

Journal Title

Engineering Technology and Applied Science Research

Volume

16

Issue

2

Start Page

33247

End Page

33252

Rights Holder(s)

SCOPUS

Bibliographic Citation

Engineering Technology and Applied Science Research Vol. 16 No. 2 (2026), 33247-33252

Suggested Citation

Kooptiwoot S., Kooptiwoot S. A Comparative Study of TF-IDF and Count Vectorizer under Random State Changes in a Random Forest Classifier for Emotion Detection. Engineering Technology and Applied Science Research Vol. 16 No. 2 (2026), 33247-33252. doi:10.48084/etasr.16158 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/116645

Title

A Comparative Study of TF-IDF and Count Vectorizer under Random State Changes in a Random Forest Classifier for Emotion Detection

Author(s)

Kooptiwoot S.
Kooptiwoot S.

Author's Affiliation

Siriraj Hospital
Suan Sunandha Rajabhat University

Corresponding Author(s)

Kooptiwoot S.

Other Contributor(s)

Mahidol University

Abstract

In machine learning, parameter settings affect model accuracy. Text-based emotion detection requires stable and accurate models, which makes parameter choices such as the random state increasingly important. Previous studies usually set the random state to 42, claiming that this value gives the best accuracy. This study examined random state settings from 1 to 720 and observed the resulting accuracy. An emotion detection dataset was used with a Random Forest (RF) classifier and two vectorizers, TF-IDF and Count. The results show that different random state settings affect model accuracy. On the training subset, the TF-IDF vectorizer offered higher and more stable accuracy than the Count vectorizer. However, the Count vectorizer achieved higher accuracy on both the validation and test sets.
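
The experiment described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes scikit-learn, a small made-up corpus (`TEXTS`, `LABELS`) in place of the study's dataset, and a hypothetical helper `sweep`. The abstract does not say whether the random state was applied to the data split, the classifier, or both, so this sketch applies the same seed to both. It also uses seeds 1 to 10 rather than 1 to 720 to keep the run short.

```python
from statistics import mean, pstdev

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus; the study's dataset is not named in the abstract.
TEXTS = [
    "i am so happy today", "what a wonderful joyful day",
    "this makes me smile and laugh", "i feel great and cheerful",
    "i am furious about this", "this makes me so angry",
    "i hate how rude they were", "that insult made me mad",
    "i feel so sad and alone", "this loss makes me cry",
    "i am heartbroken and gloomy", "such a sad and tearful night",
]
LABELS = ["joy"] * 4 + ["anger"] * 4 + ["sadness"] * 4


def sweep(vectorizer_cls, seeds):
    """Return {seed: test accuracy} for one vectorizer class."""
    scores = {}
    for seed in seeds:
        # Assumption: the seed controls both the split and the forest.
        x_train, x_test, y_train, y_test = train_test_split(
            TEXTS, LABELS, test_size=0.25, random_state=seed, stratify=LABELS
        )
        vectorizer = vectorizer_cls()
        clf = RandomForestClassifier(n_estimators=25, random_state=seed)
        # Fit the vocabulary on the training texts only, then reuse it.
        clf.fit(vectorizer.fit_transform(x_train), y_train)
        scores[seed] = clf.score(vectorizer.transform(x_test), y_test)
    return scores


SEEDS = range(1, 11)  # the study reports 1 to 720
results = {
    "tfidf": sweep(TfidfVectorizer, SEEDS),
    "count": sweep(CountVectorizer, SEEDS),
}

for name, scores in results.items():
    values = list(scores.values())
    print(f"{name}: mean={mean(values):.3f} std={pstdev(values):.3f} "
          f"min={min(values):.3f} max={max(values):.3f}")
```

Reporting the mean and spread across seeds, rather than the score for a single seed such as 42, is one way to compare both the accuracy and the stability of the two vectorizers. On a corpus this small the numbers themselves carry no meaning.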

Keyword(s)

Materials Science
Computer Science
Engineering

URI

https://repository.li.mahidol.ac.th/handle/123456789/116645

Collections

Scopus 2026
