Diagnosis (ICD-10) prediction from discharge summary by deep learning
| dc.contributor.advisor | Ammarin Thakkinstian | |
| dc.contributor.advisor | Ratchainant Thammasudjarit | |
| dc.contributor.advisor | Anuchate Pattanateepapon | |
| dc.contributor.advisor | Oraluck Pattanaprateep | |
| dc.contributor.author | Wanchana Ponthongmak | |
| dc.date.accessioned | 2026-02-26T06:32:22Z | |
| dc.date.available | 2026-02-26T06:32:22Z | |
| dc.date.copyright | 2022 | |
| dc.date.created | 2085 | |
| dc.date.issued | 2022 | |
| dc.description.abstract | Standardizing diagnosis data with the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) benefits healthcare in several ways, including inpatient care, healthcare management, and reimbursement. However, ICD-10 assignment is a challenging task that requires understanding both the medical domain and the ICD coding structure, which leads to increased workload, time and resource consumption, and coding errors. Automated ICD-10 coding tools built with deep neural natural language processing (NLP) may therefore help reduce the burden of manual coding. This study developed automated ICD-10 code assignment models with neural NLP methods, using 15,329 discharge summaries from Ramathibodi Hospital dated 1 January 2015 to 31 December 2020. Three models were developed: 1) Naïve Bayes with term frequency-inverse document frequency (TF-IDF), 2) deep learning (DL) with neural word embedding, and 3) DL with PubMedBERT. The DL with PubMedBERT model performed best, with average micro and macro areas under the precision-recall curve (AUPRC) of 0.6605 and 0.5538, respectively, followed by the DL with neural word embedding model (AUPRC = 0.6528 and 0.5564) and the Naïve Bayes with TF-IDF model (AUPRC = 0.4441 and 0.3562). The best model derived from the Ramathibodi data was also externally validated on Medical Information Mart for Intensive Care III (MIMIC-III) data using three approaches: 1) directly predicting ICD-10 codes, 2) fine-tuning with default hyperparameters, and 3) fine-tuning with new hyperparameters. The corresponding average micro AUPRC values were 0.3745, 0.6704, and 0.6801, and the average macro AUPRC values were 0.2819, 0.5377, and 0.5493, respectively. After fine-tuning with the new hyperparameters, the model performed as well as or slightly better than the derived model. 
This study found that neural NLP models outperformed traditional machine learning for NLP while requiring less feature-extraction effort. Additionally, applying a clinical contextual word embedding (i.e., PubMedBERT) yielded better performance than regular word embeddings. Hence, the model may be useful as an automated tool for ICD-10 coding. However, external validation in hospitals smaller than Ramathibodi is still needed. Deployment of this model in Ramathibodi Hospital should be planned and implemented to assess whether it can correctly assign ICD-10 codes and thus reduce coders' workload. | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/123456789/115374 | |
| dc.language.iso | eng | |
| dc.publisher | Mahidol University | |
| dc.rights | This work is copyrighted by Mahidol University and is reserved for educational use only. The source must be cited. Modifying the content and using it for commercial purposes are prohibited. | |
| dc.rights.holder | Mahidol University | |
| dc.subject | Nosology -- Data processing. | |
| dc.subject | Natural language processing (Computer science) -- Medical applications. | |
| dc.subject | Deep learning (Machine learning) -- Therapeutic use. | |
| dc.subject | Ph.D. (2022) | |
| dc.subject | Data Science for Health Care (Mahidol University 2022) | |
| dc.title | Diagnosis (ICD-10) prediction from discharge summary by deep learning | |
| dc.type | Doctoral Thesis | |
| dcterms.accessRights | open access | |
| thesis.degree.department | Faculty of Medicine Ramathibodi Hospital | |
| thesis.degree.discipline | Data Science for Health Care | |
| thesis.degree.grantor | Mahidol University | |
| thesis.degree.level | Doctoral degree | |
| thesis.degree.name | Doctor of Philosophy |
Files
Original bundle
- Name: TH_Wanchana_P_2022.pdf
- Size: 7.1 MB
- Format: Adobe Portable Document Format
