A Novel Term Weighting Scheme for Imbalanced Text Classification
Issued Date
2022-06-01
ISSN
03505596
eISSN
18543871
Scopus ID
2-s2.0-85135636469
Journal Title
Informatica (Slovenia)
Volume
46
Issue
2
Start Page
259
End Page
268
Rights Holder(s)
SCOPUS
Bibliographic Citation
Informatica (Slovenia) Vol.46 No.2 (2022), 259-268
Suggested Citation
Tantisripreecha T., Soonthornphisaj N. A Novel Term Weighting Scheme for Imbalanced Text Classification. Informatica (Slovenia) Vol.46 No.2 (2022), 259-268. doi:10.31449/inf.v46i2.3523 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/84271
Abstract
High-dimensional features are the main problem in the text domain. When class imbalance is also present, classifier performance worsens further. Moreover, under these conditions it is very difficult to improve performance by addressing the imbalance with oversampling. In this paper, a new term weighting scheme is proposed that combines term frequency with an average of the inverse document frequency factor. We denote our scheme TFmeanIDF. The proposed method has high potential for imbalanced, high-dimensional text domains. No feature selection or oversampling method is required. Extensive comparison results on 7 datasets validate the advantages of TFmeanIDF in terms of the F1 score obtained from widely used base classifiers, such as logistic regression and support vector machines. We found that the F1 score of the minority class is higher than that obtained with baseline term weighting schemes. Using TFmeanIDF for term weighting shows promising results for logistic regression and support vector machines.
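The abstract reports results as the F1 score of the minority class. As a minimal sketch of why that metric is informative under class imbalance, the Python below computes per-class F1 on a toy label set. The labels, the predictions, and the helper name `f1_for_class` are hypothetical illustrations, not data or code from the paper, and the sketch does not implement TFmeanIDF itself, whose formula the abstract does not give.

```python
def f1_for_class(y_true, y_pred, target):
    """Return (precision, recall, F1) treating `target` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 18 majority-class documents and 2 minority-class ones.
y_true = ["maj"] * 18 + ["min"] * 2
# Hypothetical predictions: one of the two minority documents is missed.
y_pred = ["maj"] * 18 + ["min", "maj"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
_, _, f1_minority = f1_for_class(y_true, y_pred, "min")
_, _, f1_majority = f1_for_class(y_true, y_pred, "maj")

print(f"accuracy    = {accuracy:.3f}")
print(f"F1 majority = {f1_majority:.3f}")
print(f"F1 minority = {f1_minority:.3f}")
```

In this toy case accuracy and majority-class F1 stay high even though half of the minority documents are missed, whereas the minority-class F1 drops noticeably. That gap is why a minority-class F1 is a reasonable basis for comparing term weighting schemes on imbalanced data.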