Comparative Analysis of Data Imputation Methods on F1 Performance Across Multiple Classification Algorithms

Tangworakitthaworn P.; Fujita K.; Wiphaalongkot N.

Issued Date

2025-01-01

Resource Type

Conference Paper

DOI

10.1109/ICSEC67360.2025.11298078

Scopus ID

2-s2.0-105032729101

Journal Title

Icsec 2025 29th International Computer Science and Engineering Conference 2025

Start Page

90

End Page

93

Rights Holder(s)

SCOPUS

Bibliographic Citation

Icsec 2025 29th International Computer Science and Engineering Conference 2025 (2025), 90-93

Suggested Citation

Tangworakitthaworn P., Fujita K., Wiphaalongkot N. Comparative Analysis of Data Imputation Methods on F1 Performance Across Multiple Classification Algorithms. Icsec 2025 29th International Computer Science and Engineering Conference 2025 (2025), 90-93. doi:10.1109/ICSEC67360.2025.11298078 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/115809

Title

Comparative Analysis of Data Imputation Methods on F1 Performance Across Multiple Classification Algorithms

Author(s)

Tangworakitthaworn P.
Fujita K.
Wiphaalongkot N.

Author's Affiliation

Mahidol University
Tokyo University of Agriculture and Technology

Corresponding Author(s)

Tangworakitthaworn P.

Other Contributor(s)

Mahidol University

Abstract

A central issue in developing machine learning models is the quality and completeness of the datasets. Suitable datasets should therefore not contain missing values, because missing values can reduce predictive accuracy and introduce bias. This research compares the effect of data imputation methods on F1 performance across multiple classification algorithms: Logistic Regression, Random Forest, and Linear Support Vector Machine (SVM). The imputation methods are divided into five modes: Mode1, imputation by AI without a data description, which fills the missing data by random imputation; Mode2, imputation by AI with a data description, which fills the missing data by model-based (iterative) imputation; Mode3, mean imputation; Mode4, KNN imputation; and Mode5, median imputation. The datasets used for the comparison cover different amounts of missing data, ranging from 50,000 to 200,000 missing entries. The findings show that Mode2 (AI with Data Description) was the most effective method for high percentages of missing data, while Mode1 (AI without Data Description) was the least effective.
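
The classical modes named in the abstract (mean, KNN, and median imputation) and the F1 metric can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `impute_column`, `knn_impute`, and `f1_score` are hypothetical, the KNN distance (mean squared difference over features observed in both rows) is an assumption, and the AI-based modes (Mode1 and Mode2) are not reproduced here.

```python
from statistics import mean, median


def impute_column(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def knn_impute(rows, k=2):
    """Fill each None with the average of that feature over the k nearest rows.

    Distance is the mean squared difference over features observed in both
    rows (an illustrative choice, not taken from the paper).
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(rows):
                if other_idx == i or other[j] is None:
                    continue
                shared = [
                    (a - b) ** 2
                    for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if shared:
                    candidates.append((sum(shared) / len(shared), other[j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                nearest = candidates[:k]
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In a comparison of the kind the abstract describes, each imputed copy of a dataset would be passed to the same classifiers and the resulting F1 scores compared; the classifier training step is omitted from this sketch.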

Keyword(s)

Computer Science
Decision Sciences

URI

https://repository.li.mahidol.ac.th/handle/123456789/115809

Collections

Scopus 2025
