Diagnosis (ICD-10) prediction from discharge summary by deep learning

Wanchana Ponthongmak

dc.contributor.advisor	Ammarin Thakkinstian
dc.contributor.advisor	Ratchainant Thammasudjarit
dc.contributor.advisor	Anuchate Pattanateepapon
dc.contributor.advisor	Oraluck Pattanaprateep
dc.contributor.author	Wanchana Ponthongmak
dc.date.accessioned	2026-02-26T06:32:22Z
dc.date.available	2026-02-26T06:32:22Z
dc.date.copyright	2022
dc.date.created	2085
dc.date.issued	2022
dc.description.abstract	Standardizing diagnosis data with the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) provides various benefits for healthcare, including inpatient care, healthcare management, and reimbursement. However, ICD-10 assignment is a challenging task that requires an understanding of both the medical domain and the ICD coding structure, which increases workload, time and resource consumption, and coding errors. Automated ICD-10 coding tools built with deep neural natural language processing (NLP) may therefore help reduce the burden of manual coding. This study developed automated ICD-10 code assignment models using neural NLP methods on 15,329 discharge summaries from Ramathibodi Hospital dated 1 January 2015 to 31 December 2020. Three models were developed: 1) Naïve Bayes with term frequency-inverse document frequency (TF-IDF), 2) deep learning (DL) with neural word embedding, and 3) DL with PubMedBERT. The DL with PubMedBERT model provided the best performance, with average micro and macro areas under the precision-recall curve (AUPRC) of 0.6605 and 0.5538, respectively, followed by the DL with neural word embedding model (AUPRC = 0.6528 and 0.5564) and the Naïve Bayes with TF-IDF model (AUPRC = 0.4441 and 0.3562). The best model derived from the Ramathibodi data was also externally validated on Medical Information Mart for Intensive Care III (MIMIC-III) data using three approaches: 1) directly predicting ICD-10 codes, 2) fine-tuning with the default hyperparameters, and 3) fine-tuning with new hyperparameters. The corresponding average micro AUPRCs were 0.3745, 0.6704, and 0.6801, and the average macro AUPRCs were 0.2819, 0.5377, and 0.5493, respectively. After fine-tuning with the new hyperparameters, the model performed as well as or slightly better than the derived model.
This study found that neural NLP models outperformed traditional machine learning for NLP while requiring less feature-extraction effort. Additionally, applying a clinical contextual word embedding (i.e., PubMedBERT) yielded better performance than regular word embeddings. Hence, the model may be useful as an automated tool for ICD-10 coding. However, external validation in hospitals smaller than Ramathibodi should still be performed. Deployment of the model at Ramathibodi Hospital should be planned and implemented to assess whether it can correctly assign ICD-10 codes and thus reduce the workload of coders.
dc.format.mimetype	application/pdf
dc.identifier.uri	https://repository.li.mahidol.ac.th/handle/123456789/115374
dc.language.iso	eng
dc.publisher	Mahidol University
dc.rights	This work is copyrighted by Mahidol University. It is reserved for educational use only; the source must be cited, the content must not be modified, and commercial use is prohibited.
dc.rights.holder	Mahidol University
dc.subject	Nosology -- Data processing.
dc.subject	Natural language processing (Computer science) -- Medical applications.
dc.subject	Deep learning (Machine learning) -- Therapeutic use.
dc.subject	Ph.D. (2022)
dc.subject	Data Science for Health Care (Mahidol University 2022)
dc.title	Diagnosis (ICD-10) prediction from discharge summary by deep learning
dc.type	Doctoral Thesis
dcterms.accessRights	open access
thesis.degree.department	Faculty of Medicine Ramathibodi Hospital
thesis.degree.discipline	Data Science for Health Care
thesis.degree.grantor	Mahidol University
thesis.degree.level	Doctoral degree
thesis.degree.name	Doctor of Philosophy
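
The abstract reports both micro- and macro-averaged AUPRC. The sketch below illustrates how the two averages differ for a multi-label task. It is an illustration only, not the thesis's evaluation code: it assumes AUPRC is estimated by step-wise average precision (one common estimator), and the function names `average_precision` and `micro_macro_auprc`, the three label columns, and all toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve for one binary label.

    Assumption: ties in y_score are broken by input order (the toy data has none).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("label has no positive examples")
    # Rank examples from the highest to the lowest predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each true positive
    return total / n_pos


def micro_macro_auprc(
    y_true: List[List[int]], y_score: List[List[float]]
) -> Tuple[float, float]:
    """Return (micro, macro) AUPRC for rows = documents, columns = codes."""
    n_labels = len(y_true[0])
    # Macro: compute AUPRC per code, then take the unweighted mean,
    # so rare codes count as much as frequent ones.
    per_label = [
        average_precision([row[j] for row in y_true], [row[j] for row in y_score])
        for j in range(n_labels)
    ]
    macro = sum(per_label) / n_labels
    # Micro: pool every (document, code) decision into one ranking,
    # so frequent codes dominate the result.
    flat_true = [v for row in y_true for v in row]
    flat_score = [v for row in y_score for v in row]
    micro = average_precision(flat_true, flat_score)
    return micro, macro


# Hypothetical toy data: 4 discharge summaries, 3 candidate codes.
toy_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 1]]
toy_score = [
    [0.90, 0.20, 0.10],
    [0.80, 0.60, 0.30],
    [0.70, 0.40, 0.05],
    [0.25, 0.50, 0.35],
]
micro, macro = micro_macro_auprc(toy_true, toy_score)
print(f"micro AUPRC = {micro:.4f}, macro AUPRC = {macro:.4f}")
```

In this toy example the macro average is higher than the micro average because each code is ranked well within its own column, while pooling all decisions exposes that scores are not calibrated across codes.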

Files

Original bundle

Name: TH_Wanchana_P_2022.pdf
Size: 7.1 MB
Format: Adobe Portable Document Format

Collections

Dissertations