Comparative Analysis of Data Imputation Methods on F1 Performance Across Multiple Classification Algorithms

dc.contributor.authorTangworakitthaworn P.
dc.contributor.authorFujita K.
dc.contributor.authorWiphaalongkot N.
dc.contributor.correspondenceTangworakitthaworn P.
dc.contributor.otherMahidol University
dc.date.accessioned2026-03-20T18:29:28Z
dc.date.available2026-03-20T18:29:28Z
dc.date.issued2025-01-01
dc.description.abstractA significant issue in developing machine learning models is the quality and completeness of the datasets. Suitable datasets should not contain missing values, because these can reduce predictive accuracy and introduce bias. This research project presents a comparative analysis of data imputation methods on F1 performance across multiple classification algorithms: Logistic Regression, Random Forest, and Linear Support Vector Machine (SVM). The imputation methods applied in this project are divided into five modes. Mode1: imputed by AI without a data description, which imputes the missing data by random imputation; Mode2: imputed by AI with a data description, which imputes the missing data by model-based (iterative) imputation; Mode3: imputed by the mean algorithm; Mode4: imputed by the KNN algorithm; and Mode5: imputed by the median algorithm. The datasets used for the comparative analysis cover different amounts of missing data, ranging from 50,000 to 200,000 missing entries. The research findings revealed that the data imputation method using Mode2 (AI with Data Description) was the most effective for high percentages of missing data, while the data imputation method using Mode1 (AI without Data Description) was the least effective.
dc.identifier.citationIcsec 2025 29th International Computer Science and Engineering Conference 2025 (2025), 90-93
dc.identifier.doi10.1109/ICSEC67360.2025.11298078
dc.identifier.scopus2-s2.0-105032729101
dc.identifier.urihttps://repository.li.mahidol.ac.th/handle/123456789/115809
dc.rights.holderSCOPUS
dc.subjectComputer Science
dc.subjectDecision Sciences
dc.titleComparative Analysis of Data Imputation Methods on F1 Performance Across Multiple Classification Algorithms
dc.typeConference Paper
mu.datasource.scopushttps://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105032729101&origin=inward
oaire.citation.endPage93
oaire.citation.startPage90
oaire.citation.titleIcsec 2025 29th International Computer Science and Engineering Conference 2025
oairecerif.author.affiliationMahidol University
oairecerif.author.affiliationTokyo University of Agriculture and Technology
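
The mean and median imputation modes (Mode3 and Mode5) and the F1 metric named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `impute` and `f1_score`, the toy `ages` column, and the example labels are all hypothetical, and only the Python standard library is assumed.

```python
from statistics import mean, median


def impute(column, strategy="mean"):
    """Fill None entries with the mean or median of the observed values.

    Hypothetical helper for illustration; `column` is a list of numbers
    in which None marks a missing entry.
    """
    if strategy not in ("mean", "median"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in column]


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy column with two missing entries (made-up values).
ages = [20, None, 30, 100, None]
mean_filled = impute(ages, "mean")      # gaps filled with the observed mean
median_filled = impute(ages, "median")  # gaps filled with the observed median

# Toy labels: one positive example is missed by the prediction.
score = f1_score([1, 1, 0, 1], [1, 0, 0, 1])
```

On a skewed column such as this one, the mean and median fills differ noticeably, which is one reason the choice of imputation method can change a downstream classifier's F1 score.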
