Diagnosis (ICD-10) prediction from discharge summary by deep learning

dc.contributor.advisorAmmarin Thakkinstian
dc.contributor.advisorRatchainant Thammasudjarit
dc.contributor.advisorAnuchate Pattanateepapon
dc.contributor.advisorOraluck Pattanaprateep
dc.contributor.authorWanchana Ponthongmak
dc.date.accessioned2026-02-26T06:32:22Z
dc.date.available2026-02-26T06:32:22Z
dc.date.copyright2022
dc.date.created2085
dc.date.issued2022
dc.description.abstractStandardizing diagnosis data with the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) benefits healthcare in several ways, including inpatient care, healthcare management, and reimbursement. However, ICD-10 assignment is a challenging task that requires understanding both the medical domain and the ICD coding structure, which leads to increased workload, time and resource consumption, and coding errors. Automated ICD-10 coding tools built with deep neural natural language processing (NLP) may therefore help reduce the burden of manual coding. This study developed automated ICD-10 code assignment models using neural NLP methods on 15,329 discharge summaries from Ramathibodi Hospital dated 1 January 2015 to 31 December 2020. Three models were developed: 1) Naïve Bayes with term frequency-inverse document frequency (TF-IDF), 2) deep learning (DL) with neural word embedding, and 3) DL with PubMedBERT. The DL with PubMedBERT model performed best, with average micro and macro areas under the precision-recall curve (AUPRC) of 0.6605 and 0.5538, respectively, followed by the DL with neural word embedding model (AUPRC = 0.6528 and 0.5564) and the Naïve Bayes with TF-IDF model (AUPRC = 0.4441 and 0.3562). The best model derived from the Ramathibodi data was also externally validated on Medical Information Mart for Intensive Care III (MIMIC-III) data using three approaches: 1) directly predicting ICD-10 codes, 2) fine-tuning with default hyperparameters, and 3) fine-tuning with new hyperparameters. The corresponding average micro AUPRCs were 0.3745, 0.6704, and 0.6801, and the average macro AUPRCs were 0.2819, 0.5377, and 0.5493, respectively. After fine-tuning with the new hyperparameters, the model performed as well as or slightly better than the derived model.
This study found that neural NLP models outperformed traditional machine learning for NLP while requiring less feature-extraction effort. Additionally, applying a clinical contextual word embedding (i.e., PubMedBERT) yielded better performance than regular word embeddings. Hence, the model may be useful as an automated tool for ICD-10 coding. However, further external validation should be performed in hospitals smaller than Ramathibodi. Deployment of this model in Ramathibodi Hospital should be planned and implemented to assess whether the model can correctly classify ICD-10 codes and thus reduce the workload of coders.
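The abstract reports both micro- and macro-averaged AUPRC. As a minimal sketch of how the two averages differ, the Python below computes a step-wise AUPRC (average precision) on a toy multi-label example. The function names, the toy labels and scores, and the choice of the step-wise estimator are illustrative assumptions; the thesis's actual evaluation code is not shown in this record, and the sketch assumes no tied scores.

```python
from typing import List, Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve for one binary label.

    Assumes there are no tied scores (hypothetical simplification).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("label has no positive examples")
    # Rank examples from the highest score to the lowest.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


def macro_auprc(Y_true: List[List[int]], Y_score: List[List[float]]) -> float:
    """Compute AUPRC per label (code), then take the unweighted mean."""
    n_labels = len(Y_true[0])
    per_label = [
        average_precision([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


def micro_auprc(Y_true: List[List[int]], Y_score: List[List[float]]) -> float:
    """Pool every (document, label) pair into one binary problem."""
    flat_true = [v for row in Y_true for v in row]
    flat_score = [v for row in Y_score for v in row]
    return average_precision(flat_true, flat_score)


# Toy data (hypothetical): 4 documents, 2 candidate codes.
Y_true = [[1, 0], [0, 1], [1, 0], [0, 0]]
Y_score = [[0.9, 0.2], [0.3, 0.6], [0.8, 0.7], [0.1, 0.4]]

macro = macro_auprc(Y_true, Y_score)
micro = micro_auprc(Y_true, Y_score)
print(f"macro AUPRC = {macro:.4f}, micro AUPRC = {micro:.4f}")
```

Macro averaging weights every code equally, so rare codes pull the score down as much as common ones. Micro averaging pools all document-code pairs, so frequent codes dominate. This is one reason a model can lead on one average and trail slightly on the other.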
dc.format.mimetypeapplication/pdf
dc.identifier.urihttps://repository.li.mahidol.ac.th/handle/123456789/115374
dc.language.isoeng
dc.publisherMahidol University
dc.rightsThis work is copyrighted by Mahidol University. It is reserved for educational purposes only. The source must be cited. Modifying the content and using it for commercial purposes are prohibited.
dc.rights.holderMahidol University
dc.subjectNosology -- Data processing.
dc.subjectNatural language processing (Computer science) -- Medical applications.
dc.subjectDeep learning (Machine learning) -- Therapeutic use.
dc.subjectPh.D. (2022)
dc.subjectData Science for Health Care (Mahidol University 2022)
dc.titleDiagnosis (ICD-10) prediction from discharge summary by deep learning
dc.typeDoctoral Thesis
dcterms.accessRightsopen access
thesis.degree.departmentFaculty of Medicine Ramathibodi Hospital
thesis.degree.disciplineData Science for Health Care
thesis.degree.grantorMahidol University
thesis.degree.levelDoctoral degree
thesis.degree.nameDoctor of Philosophy

Files

Original bundle

Name: TH_Wanchana_P_2022.pdf
Size: 7.1 MB
Format: Adobe Portable Document Format
