Integrating Agentic Artificial Intelligence to Automate International Classification of Diseases, Tenth Revision, Medical Coding
Issued Date
2026-03-01
eISSN
2227-9709
Scopus ID
2-s2.0-105033869908
Journal Title
Informatics
Volume
13
Issue
3
Rights Holder(s)
SCOPUS
Bibliographic Citation
Informatics Vol.13 No.3 (2026)
Suggested Citation
Akkhawatthanakun K., Narupiyakul L., Wongpatikaseree K., Hnoohom N., Termritthikun C., Muneesawang P. Integrating Agentic Artificial Intelligence to Automate International Classification of Diseases, Tenth Revision, Medical Coding. Informatics Vol.13 No.3 (2026). doi:10.3390/informatics13030039. Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/116029
Abstract
Automating ICD-10 coding from discharge summaries remains demanding because coders must analyze clinical narratives while justifying their decisions. This study compares three automation patterns: PLM-ICD as a standalone deep learning system emitting 15 codes per case, LLM-only generation with full autonomy, and a hybrid approach in which PLM-ICD drafts candidates that an agentic LLM audits, accepting or rejecting each code. All strategies were evaluated on 19,801 MIMIC-IV discharge summaries using four LLMs spanning compact (Qwen2.5-3B-Instruct, Llama-3.2-3B-Instruct, Phi-4-mini-instruct) to large-scale (Sonnet-4.5) models. Precision guided the evaluation because human coders still supply any missing diagnoses. PLM-ICD alone reached 55.8% precision while always surfacing 15 suggestions. LLM-only generation lagged severely (1.5–34.6% precision) and produced inconsistent output sizes. The agentic audit delivered the best trade-off: compact LLMs reviewed the 15 candidates, discarded those with weak evidence, and returned 2–8 high-confidence codes. Llama-3.2-3B-Instruct, for example, improved from 1.5% precision as a generator to 55.1% as a verifier while trimming false positives by 73%. These results show that positioning LLMs as quality controllers, rather than primary generators, yields reliable support for clinical coding teams; formal recall/F1 reporting remains future work for fully autonomous implementations.
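The draft-then-audit pattern the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `audit_candidates`, `stub_judge`, and the keyword table are hypothetical names, and the stub stands in for a real LLM verifier call (in the paper, a compact model such as Llama-3.2-3B-Instruct reviewing PLM-ICD's 15 candidates).

```python
# Hypothetical sketch of the hybrid pattern: a drafting model proposes a
# slate of candidate ICD-10 codes, and a verifier accepts or rejects each.
from typing import Callable

def audit_candidates(
    summary: str,
    candidates: list[str],
    llm_judge: Callable[[str, str], bool],
) -> list[str]:
    """Return only the candidates the verifier accepts as evidence-supported."""
    return [code for code in candidates if llm_judge(summary, code)]

# Stub verifier: accepts a code only if an associated keyword (a toy
# stand-in for clinical evidence) appears in the discharge summary.
# In the study, this role is played by an agentic LLM prompt, not a lookup.
KEYWORDS = {"E11.9": "diabetes", "I10": "hypertension", "J18.9": "pneumonia"}

def stub_judge(summary: str, code: str) -> bool:
    return KEYWORDS.get(code, "") in summary.lower()

summary = "Admitted with community-acquired pneumonia; history of hypertension."
drafted = ["E11.9", "I10", "J18.9"]  # e.g. top candidates from the drafting model
accepted = audit_candidates(summary, drafted, stub_judge)
print(accepted)  # → ['I10', 'J18.9']
```

The design point is that the verifier only prunes the drafted slate, never invents codes, which is why precision rises while the output shrinks to a handful of high-confidence suggestions.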
