Efficient Drug Terminology Mapping with Bidirectional Late-Interaction Reranking and Deterministic Reordering

Adulyanukosol N.; Chaisutyakorn K.; Sombutjaroan S.; Kanjanapong S.; Suriyaphol P.

Issued Date

2026-04-01

Resource Type

Article

ISSN

20933681

eISSN

2093369X

DOI

10.4258/hir.2026.32.2.156

Scopus ID

2-s2.0-105038369407

Journal Title

Healthcare Informatics Research

Volume

32

Issue

2

Start Page

156

End Page

165

Rights Holder(s)

SCOPUS

Bibliographic Citation

Healthcare Informatics Research Vol.32 No.2 (2026), 156-165

Suggested Citation

Adulyanukosol N., Chaisutyakorn K., Sombutjaroan S., Kanjanapong S., Suriyaphol P. Efficient Drug Terminology Mapping with Bidirectional Late-Interaction Reranking and Deterministic Reordering. Healthcare Informatics Research Vol.32 No.2 (2026), 156-165. doi:10.4258/hir.2026.32.2.156 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/116735

Title

Efficient Drug Terminology Mapping with Bidirectional Late-Interaction Reranking and Deterministic Reordering

Author(s)

Adulyanukosol N.
Chaisutyakorn K.
Sombutjaroan S.
Kanjanapong S.
Suriyaphol P.

Author's Affiliation

Siriraj Hospital

Corresponding Author(s)

Adulyanukosol N.

Other Contributor(s)

Mahidol University

Abstract

Objectives: Standardizing medication concepts across heterogeneous vocabularies is essential for interoperable analytics and observational research. In the Observational Medical Outcomes Partnership (OMOP) Common Data Model, local drug codes must be mapped to standardized RxNorm concepts, but automated mapping is challenging because drug strings encode clinically critical attributes, including strength, dosage form/route, release characteristics, and brand.
Methods: We propose THIRAWAT (Terminology Harmonization using Late-Interaction Reranker With Alignment-tuned Transformers), a fine-tuned ColBERTv1 late-interaction reranker, and embed it within THIRAWAT Mapper, a retrieval–reranking pipeline with deterministic tie-breaking and stable ordering. Candidate generation used approximate nearest-neighbor retrieval with a bi-encoder (SapBERT-XLMR or BioLORD-2023). Candidates were reranked by THIRAWAT models that were fine-tuned using one-sided MaxSim and scored at inference using our adapted Bidirectional MaxSim (BiMaxSim) pooling. Finally, a deterministic tie-breaker extracted clinically salient cues, including strength, dosage form/route, release characteristics, and bracketed brand annotations, to resolve near-ties reproducibly.
Results: We evaluated three mapping settings: Branded Drugs, Clinical Drugs, and Thai Medicines Terminology (TMT). Using SapBERT-XLMR retrieval with THIRAWAT-Sap-BERT reranking and deterministic tie-breaking, THIRAWAT Mapper achieved MRR@100 values of 0.954 (95% confidence interval [CI], 0.921–0.983), 0.898 (95% CI, 0.866–0.925), and 0.912 (95% CI, 0.891–0.931), outperforming a lexical term frequency–inverse document frequency baseline (0.491, 0.216, and 0.143, respectively). Hits@1 improved to 0.942 (95% CI, 0.899–0.978), 0.859 (95% CI, 0.817–0.898), and 0.868 (95% CI, 0.838–0.896), respectively.
Conclusions: BiMaxSim and deterministic tie-breaking improved drug mapping to RxNorm while preserving an efficient runtime profile and stable ordering. Overall, THIRAWAT Mapper offers a pragmatic combination of learned semantic matching and deterministic lexical constraints. Models and code are available on Hugging Face (https://huggingface.co/collections/sidataplus/thirawat) and GitHub (https://github.com/sidataplus/THIRAWAT-mapper).
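
Illustrative sketch (reranking and tie-breaking): the reranking step described in the Methods can be approximated as below. This is a minimal sketch and not the authors' implementation. The abstract does not define BiMaxSim precisely, so the sketch assumes it averages the length-normalized MaxSim scores taken in both directions (source term to candidate, and candidate to source term). The function names, the strength-only cue pattern, the rounding tolerance, and the toy two-dimensional embeddings are all hypothetical; the released code in the GitHub repository named in the abstract is authoritative.

```python
import re
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]
Candidate = Tuple[str, str, Sequence[Vector]]  # (concept id, concept name, token embeddings)

# Hypothetical cue extractor: strength only. The abstract also lists dosage
# form/route, release characteristics, and bracketed brand annotations.
STRENGTH = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml)\b", re.IGNORECASE)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim(src: Sequence[Vector], tgt: Sequence[Vector]) -> float:
    """One-sided MaxSim: each source token keeps its best match in the target."""
    return sum(max(dot(s, t) for t in tgt) for s in src)


def bimaxsim(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """ASSUMPTION: mean of the two length-normalized MaxSim directions."""
    q2d = maxsim(query, doc) / len(query)
    d2q = maxsim(doc, query) / len(doc)
    return 0.5 * (q2d + d2q)


def strength_cues(text: str) -> Set[Tuple[str, str]]:
    """Return (number, unit) pairs such as ('500', 'mg') found in the text."""
    return {(num, unit.lower()) for num, unit in STRENGTH.findall(text)}


def rerank(query_text: str, query_emb: Sequence[Vector],
           candidates: List[Candidate], decimals: int = 4) -> List[Tuple[str, float]]:
    """Sort by rounded score, then by shared strength cues, then by concept id."""
    query_cues = strength_cues(query_text)
    rows = []
    for cid, text, emb in candidates:
        # Rounding groups near-ties so the lexical cues can decide between them.
        score = round(bimaxsim(query_emb, emb), decimals)
        shared = len(query_cues & strength_cues(text))
        rows.append((-score, -shared, cid))
    rows.sort()  # the concept id is the last key, so the order is fully deterministic
    return [(cid, -neg_score) for neg_score, _, cid in rows]


# Toy data: A and B contain the same token vectors, so their scores tie.
query_emb = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    ("A", "acetaminophen 325 MG Oral Tablet", [[1.0, 0.0], [0.0, 1.0]]),
    ("B", "acetaminophen 500 MG Oral Tablet", [[0.0, 1.0], [1.0, 0.0]]),
    ("C", "acetaminophen Oral Tablet", [[1.0, 0.0]]),
]
ranked = rerank("paracetamol 500 mg tablet", query_emb, candidates)
print(ranked)
```

In this toy case, candidate C covers only one of the two source tokens, so the source-to-candidate direction lowers its score even though the reverse direction is a perfect match. Candidates A and B tie on score, and the shared "500 mg" cue places B first regardless of input order.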
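
Illustrative sketch (evaluation metrics): the Results report MRR@100 and Hits@1. For readers less familiar with these ranking metrics, the following minimal sketch shows their standard definitions, assuming exactly one gold target concept per source term. The function names and toy identifiers are illustrative and are not taken from the THIRAWAT code; the abstract does not describe how the confidence intervals were computed, so that step is not sketched.

```python
from typing import Dict, List


def reciprocal_rank(ranked: List[str], gold: str, k: int) -> float:
    """1/rank of the gold concept within the top k, or 0.0 if it is absent."""
    for rank, cid in enumerate(ranked[:k], start=1):
        if cid == gold:
            return 1.0 / rank
    return 0.0


def mrr_at_k(rankings: Dict[str, List[str]], gold: Dict[str, str], k: int = 100) -> float:
    """Mean reciprocal rank over all source terms, truncated at k."""
    return sum(reciprocal_rank(rankings[q], gold[q], k) for q in gold) / len(gold)


def hits_at_k(rankings: Dict[str, List[str]], gold: Dict[str, str], k: int = 1) -> float:
    """Fraction of source terms whose gold concept appears in the top k."""
    return sum(gold[q] in rankings[q][:k] for q in gold) / len(gold)


# Toy rankings for three hypothetical source terms.
rankings = {"q1": ["a", "b"], "q2": ["x", "y", "z"], "q3": ["m"]}
gold = {"q1": "a", "q2": "z", "q3": "n"}

print(mrr_at_k(rankings, gold, k=100))  # (1 + 1/3 + 0) / 3
print(hits_at_k(rankings, gold, k=1))   # only q1 is correct at rank 1
```

Hits@1 counts only exact top-ranked matches, while MRR@100 also gives partial credit when the correct concept appears lower in the list, which is why the reported MRR values are at least as high as the corresponding Hits@1 values.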

Keyword(s)

Medicine
Engineering
Health Professions

URI

https://repository.li.mahidol.ac.th/handle/123456789/116735

Collections

Scopus 2026
