Publication: Combining over-sampling and under-sampling techniques for imbalance dataset
Issued Date
2017-02-24
Other identifier(s)
2-s2.0-85024401679
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
ACM International Conference Proceeding Series. Vol.Part F128357, (2017), 243-247
Suggested Citation
Nutthaporn Junsomboon, Tanasanee Phienthrakul. Combining over-sampling and under-sampling techniques for imbalance dataset. ACM International Conference Proceeding Series. Vol.Part F128357, (2017), 243-247. doi:10.1145/3055635.3056643 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/42358
Title
Combining over-sampling and under-sampling techniques for imbalance dataset
Author(s)
Nutthaporn Junsomboon, Tanasanee Phienthrakul
Abstract
© 2017 ACM. Imbalanced datasets are an important problem in medical data analysis. Class imbalance is a cause of diagnostic mistakes, and diagnostic results affect patients' lives. If a doctor fails to diagnose a patient who has a disease, that patient cannot be treated in time. However, the problem can be easily addressed by adding or removing data so that the classes are close to balanced, which improves diagnostic performance. This paper proposes a solution for adjusting an imbalanced dataset by combining the Neighbor Cleaning Rule (NCL) and the Synthetic Minority Over-Sampling Technique (SMOTE). NCL is used to remove outlier samples from the majority class, and SMOTE is used to increase the number of samples in the minority class, so that the dataset is close to balanced. The balanced medical dataset is then classified with the Naïve Bayes, SMO, and KNN algorithms. The experimental results show that models built from the balanced dataset achieve an improved recall rate.
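The two-step resampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, pure-Python approximation that assumes a binary-labelled dataset, Euclidean distance, and k = 3 neighbours. The function names (`clean_majority`, `smote`, `resample`) and the toy data are hypothetical and chosen only for illustration. The cleaning step removes majority samples that a k-nearest-neighbour vote contradicts, as well as majority neighbours of misclassified minority samples. The over-sampling step interpolates between a minority sample and one of its minority neighbours.

```python
import math
import random

def neighbors(i, data, k):
    """Indices of the k samples nearest to data[i], excluding i itself."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: math.dist(data[i][0], data[j][0]))
    return others[:k]

def clean_majority(data, majority, k=3):
    """Simplified neighbour-based cleaning (assumption: binary labels)."""
    drop = set()
    for i, (_, label) in enumerate(data):
        near = neighbors(i, data, k)
        disagree = sum(1 for j in near if data[j][1] != label)
        if disagree * 2 > k:  # the k-NN vote contradicts the sample's label
            if label == majority:
                drop.add(i)  # majority sample that looks like noise
            else:
                # minority sample crowded by majority: drop those neighbours
                drop.update(j for j in near if data[j][1] == majority)
    return [s for i, s in enumerate(data) if i not in drop]

def smote(data, minority, n_new, k=3, rng=None):
    """Add n_new synthetic minority samples by interpolating toward neighbours."""
    rng = rng or random.Random(0)
    pts = [x for x, y in data if y == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pts)
        near = sorted((p for p in pts if p is not base),
                      key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(near)
        gap = rng.random()  # position along the segment base -> other
        point = tuple(b + gap * (o - b) for b, o in zip(base, other))
        synthetic.append((point, minority))
    return data + synthetic

def resample(data, majority, minority, k=3, seed=0):
    """Clean the majority class first, then over-sample the minority class."""
    cleaned = clean_majority(data, majority, k)
    n_maj = sum(1 for _, y in cleaned if y == majority)
    n_min = sum(1 for _, y in cleaned if y == minority)
    return smote(cleaned, minority, max(0, n_maj - n_min), k, random.Random(seed))

# Toy data (hypothetical): label 0 is the majority class, label 1 the minority.
# The point (5.5, 5.5) is a majority sample sitting inside the minority cluster.
toy = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 0),
       ((0, 2), 0), ((2, 0), 0), ((2, 1), 0), ((1, 2), 0),
       ((5.5, 5.5), 0),
       ((5, 5), 1), ((5, 6), 1), ((6, 5), 1), ((6, 6), 1)]

balanced = resample(toy, majority=0, minority=1)
```

In this sketch the cleaning step runs before over-sampling, so the synthetic samples are interpolated in a neighbourhood that no longer contains the removed majority outlier. A production workflow would more likely rely on a maintained resampling library than on hand-written routines like these.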