Comparative Analysis of Data Imputation Methods on F1 Performance Across Multiple Classification Algorithms
| dc.contributor.author | Tangworakitthaworn P. | |
| dc.contributor.author | Fujita K. | |
| dc.contributor.author | Wiphaalongkot N. | |
| dc.contributor.correspondence | Tangworakitthaworn P. | |
| dc.contributor.other | Mahidol University | |
| dc.date.accessioned | 2026-03-20T18:29:28Z | |
| dc.date.available | 2026-03-20T18:29:28Z | |
| dc.date.issued | 2025-01-01 | |
| dc.description.abstract | A central issue in developing machine learning models is the quality and completeness of the datasets. Suitable datasets should therefore not contain missing values, because these can reduce predictive accuracy and introduce bias. This research compares the effect of data imputation methods on F1 performance across three classification algorithms: Logistic Regression, Random Forest, and Linear Support Vector Machine (SVM). Five imputation modes are evaluated: Mode1, imputation by AI without a data description, which fills the missing data by random imputation; Mode2, imputation by AI with a data description, which fills the missing data by model-based (iterative) imputation; Mode3, mean imputation; Mode4, KNN imputation; and Mode5, median imputation. The datasets used for the comparison differ in the amount of missing data, ranging from 50,000 to 200,000 missing entries. The findings show that Mode2 (AI with Data Description) was the most effective for high percentages of missing data, while Mode1 (AI without Data Description) was the least effective. | |
| dc.identifier.citation | ICSEC 2025 29th International Computer Science and Engineering Conference 2025 (2025), 90-93 | |
| dc.identifier.doi | 10.1109/ICSEC67360.2025.11298078 | |
| dc.identifier.scopus | 2-s2.0-105032729101 | |
| dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/123456789/115809 | |
| dc.rights.holder | SCOPUS | |
| dc.subject | Computer Science | |
| dc.subject | Decision Sciences | |
| dc.title | Comparative Analysis of Data Imputation Methods on F1 Performance Across Multiple Classification Algorithms | |
| dc.type | Conference Paper | |
| mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105032729101&origin=inward | |
| oaire.citation.endPage | 93 | |
| oaire.citation.startPage | 90 | |
| oaire.citation.title | ICSEC 2025 29th International Computer Science and Engineering Conference 2025 | |
| oairecerif.author.affiliation | Mahidol University | |
| oairecerif.author.affiliation | Tokyo University of Agriculture and Technology |
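
The abstract names mean, median, and random imputation and reports results as F1 scores, but the record does not include the authors' implementation. The following is a minimal sketch of those three fill strategies and of the F1 computation, using only the Python standard library. The function names `impute` and `f1_score`, the use of `None` to mark a missing entry, and the sample values are assumptions made for illustration. The AI-driven parts of Mode1 and Mode2, the KNN and iterative imputers, and the three classifiers are not reproduced here.

```python
import random
import statistics


def impute(column, strategy, rng=None):
    """Return a copy of `column` with None entries filled.

    `strategy` is one of "mean", "median", or "random" (hypothetical names).
    """
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")

    if strategy == "mean":
        fill = statistics.mean(observed)
        return [fill if v is None else v for v in column]
    if strategy == "median":
        fill = statistics.median(observed)
        return [fill if v is None else v for v in column]
    if strategy == "random":
        # Draw each missing entry from the observed values of the same column.
        rng = rng or random.Random(0)
        return [rng.choice(observed) if v is None else v for v in column]
    raise ValueError(f"unknown strategy: {strategy}")


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; not taken from the paper.
column = [1.0, None, 3.0, 10.0, None]
filled_by_mean = impute(column, "mean")
filled_by_median = impute(column, "median")
filled_at_random = impute(column, "random", rng=random.Random(42))

score = f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

In a comparison like the one the abstract describes, each imputed dataset would be used to train a classifier, and the resulting F1 scores would be compared across imputation modes. The sample column also shows why the choice of method matters: the outlier 10.0 pulls the mean fill value well above the median fill value.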
