Publication:
Efficient missing data technique for prediction of nasopharyngeal carcinoma recurrence

dc.contributor.author: Panrasee Ritthipravat [en_US]
dc.contributor.author: Orrawan Kumdee [en_US]
dc.contributor.author: Thongchai Bhongmakapat [en_US]
dc.contributor.other: Mahidol University [en_US]
dc.contributor.other: Faculty of Medicine, Ramathibodi Hospital, Mahidol University [en_US]
dc.date.accessioned: 2018-10-19T04:52:05Z
dc.date.available: 2018-10-19T04:52:05Z
dc.date.issued: 2013-04-24 [en_US]
dc.description.abstract: This study investigates efficient missing data techniques for predicting nasopharyngeal carcinoma (NPC) recurrence. Clinical data of patients with NPC who received treatment at Ramathibodi Hospital, Thailand, were collected. In total, 495 records were used for cancer recurrence prediction. Because these data contain missing values, appropriate missing data techniques (MDTs) must be examined. This study focuses on complete-case analysis, mean imputation, k-nearest neighbor imputation and Expectation Maximization (EM) imputation. The completed data are then used to develop three predictive models: a single-point model, a multiple-point model and a sequential neural network. The experimental results showed that EM imputation was superior to the other missing data techniques, providing the highest predictive performance in all models, with an average area under the receiver operating characteristic curve (AUC) of 0.72. The Hosmer and Lemeshow goodness-of-fit test was used to evaluate the fit of each model, and its results confirmed that EM imputation was the best missing data technique. The sequential neural network outperformed the other models, giving the best results in terms of both the average AUC (0.73) and the chi-square statistic (4.30). In addition, survival curves generated from these predictive models were compared with the Kaplan-Meier survival curve. The curves based on EM imputation were closest to the Kaplan-Meier curve. According to the log-rank test, however, these curves were significantly different (p-value < 0.05). © 2013 Asian Network for Scientific Information. [en_US]
dc.identifier.citation: Information Technology Journal. Vol.12, No.6 (2013), 1125-1133 [en_US]
dc.identifier.doi: 10.3923/itj.2013.1125.1133 [en_US]
dc.identifier.issn: 18125646 [en_US]
dc.identifier.issn: 18125638 [en_US]
dc.identifier.other: 2-s2.0-84876357650 [en_US]
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/123456789/31646
dc.rights: Mahidol University [en_US]
dc.rights.holder: SCOPUS [en_US]
dc.source.uri: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84876357650&origin=inward [en_US]
dc.subject: Computer Science [en_US]
dc.title: Efficient missing data technique for prediction of nasopharyngeal carcinoma recurrence [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84876357650&origin=inward [en_US]
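
Illustrative note: the abstract names complete-case analysis, mean imputation and k-nearest neighbor imputation among the techniques compared. The sketch below shows one simple way each could work on a toy table. It is not the authors' implementation: the function names, the choice of k, the distance measure and the data are all assumptions made for illustration, and the toy values are not taken from the study. EM imputation is not shown.

```python
import math

# Hypothetical toy records (not from the study); None marks a missing value.
rows = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, None],
    [10.0, 20.0],
]

def complete_cases(data):
    """Complete-case analysis: keep only records with no missing values."""
    return [r for r in data if None not in r]

def mean_impute(data):
    """Fill each missing value with the mean of the observed values in its column.
    Assumes every column has at least one observed value."""
    n_cols = len(data[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in data if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in data]

def knn_impute(data, k=2):
    """Fill each missing value with the mean of that column over the k nearest
    records that have it observed. Distance (an assumption of this sketch) is the
    root mean squared difference over columns observed in both records."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = []
    for i, row in enumerate(data):
        new_row = list(row)  # copy so the input is left unchanged
        for j, v in enumerate(row):
            if v is None:
                donors = [
                    (distance(row, other), other[j])
                    for m, other in enumerate(data)
                    if m != i and other[j] is not None
                ]
                donors.sort(key=lambda d: d[0])
                nearest = [val for _, val in donors[:k]]
                new_row[j] = sum(nearest) / len(nearest)
        completed.append(new_row)
    return completed

print(complete_cases(rows))
print(mean_impute(rows))
print(knn_impute(rows, k=2))
```

On this toy table, mean imputation is pulled toward the outlying record, while the nearest-neighbor fill uses only the two most similar records. This is one reason the techniques can lead to different downstream model performance.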
