Efficient Drug Terminology Mapping with Bidirectional Late-Interaction Reranking and Deterministic Reordering
Issued Date
2026-04-01
Resource Type
ISSN
20933681
eISSN
2093369X
Scopus ID
2-s2.0-105038369407
Journal Title
Healthcare Informatics Research
Volume
32
Issue
2
Start Page
156
End Page
165
Rights Holder(s)
SCOPUS
Bibliographic Citation
Healthcare Informatics Research Vol.32 No.2 (2026), 156-165
Suggested Citation
Adulyanukosol N., Chaisutyakorn K., Sombutjaroan S., Kanjanapong S., Suriyaphol P. Efficient Drug Terminology Mapping with Bidirectional Late-Interaction Reranking and Deterministic Reordering. Healthcare Informatics Research Vol.32 No.2 (2026), 156-165. doi:10.4258/hir.2026.32.2.156 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/116735
Abstract
Objectives: Standardizing medication concepts across heterogeneous vocabularies is essential for interoperable analytics and observational research. In the Observational Medical Outcomes Partnership (OMOP) Common Data Model, local drug codes must be mapped to standardized RxNorm concepts, but automated mapping is challenging because drug strings encode clinically critical attributes, including strength, dosage form/route, release characteristics, and brand.
Methods: We propose THIRAWAT (Terminology Harmonization using Late-Interaction Reranker With Alignment-tuned Transformers), a fine-tuned ColBERTv1 late-interaction reranker, and embed it within THIRAWAT Mapper, a retrieval–reranking pipeline with deterministic tie-breaking and stable ordering. Candidate generation used approximate nearest-neighbor retrieval with a bi-encoder (SapBERT-XLMR or BioLORD-2023). Candidates were reranked by THIRAWAT models that were fine-tuned using one-sided MaxSim and scored at inference using our adapted Bidirectional MaxSim (BiMaxSim) pooling. Finally, a deterministic tie-breaker extracted clinically salient cues, including strength, dosage form/route, release characteristics, and bracketed brand annotations, to resolve near-ties reproducibly.
Results: We evaluated three mapping settings: Branded Drugs, Clinical Drugs, and Thai Medicines Terminology (TMT). Using SapBERT-XLMR retrieval with THIRAWAT-SapBERT reranking and deterministic tie-breaking, THIRAWAT Mapper achieved MRR@100 values of 0.954 (95% confidence interval [CI], 0.921–0.983), 0.898 (95% CI, 0.866–0.925), and 0.912 (95% CI, 0.891–0.931), outperforming a lexical term frequency–inverse document frequency baseline (0.491, 0.216, and 0.143, respectively). Hits@1 improved to 0.942 (95% CI, 0.899–0.978), 0.859 (95% CI, 0.817–0.898), and 0.868 (95% CI, 0.838–0.896), respectively.
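The contrast between the one-sided MaxSim used during fine-tuning and a bidirectional score used at inference can be sketched as follows, together with the two reported metrics. This is a minimal illustration, not the THIRAWAT implementation: it assumes token embeddings are already computed, uses cosine similarity, and assumes BiMaxSim is the mean of the length-normalized query-to-candidate and candidate-to-query MaxSim scores (the abstract does not give the exact pooling). The function names maxsim, bimaxsim, mrr_at_k, and hits_at_k are hypothetical; the two metric helpers follow the standard definitions of MRR@k and Hits@k.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # one embedding vector per token


def _unit(v: Sequence[float]) -> List[float]:
    """L2-normalize a vector so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def maxsim(a: Matrix, b: Matrix) -> float:
    """One-sided ColBERT-style MaxSim: for every token in `a`, take its best
    cosine match among the tokens of `b`, then sum those maxima."""
    b_unit = [_unit(w) for w in b]
    total = 0.0
    for u in a:
        u_unit = _unit(u)
        total += max(sum(x * y for x, y in zip(u_unit, w)) for w in b_unit)
    return total


def bimaxsim(query: Matrix, cand: Matrix) -> float:
    """Hypothetical bidirectional pooling (assumed form): the mean of the
    length-normalized query->candidate and candidate->query MaxSim scores."""
    q_to_c = maxsim(query, cand) / len(query)
    c_to_q = maxsim(cand, query) / len(cand)
    return 0.5 * (q_to_c + c_to_q)


def mrr_at_k(rankings, gold, k=100):
    """Mean reciprocal rank of the gold concept within the top k (0 if absent)."""
    total = 0.0
    for ranked, g in zip(rankings, gold):
        for rank, cid in enumerate(ranked[:k], start=1):
            if cid == g:
                total += 1.0 / rank
                break
    return total / len(gold)


def hits_at_k(rankings, gold, k=1):
    """Fraction of queries whose gold concept appears in the top k."""
    return sum(g in ranked[:k] for ranked, g in zip(rankings, gold)) / len(gold)


# Toy 3-d "token embeddings": the second candidate has one extra token that
# the query never asked for (standing in for, e.g., a release characteristic).
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
exact = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
extra = exact + [[0.0, 0.0, 1.0]]
print(maxsim(query, exact), maxsim(query, extra))  # 2.0 2.0 (a tie)
print(bimaxsim(query, exact), bimaxsim(query, extra))  # 1.0 vs. about 0.833
```

In the toy example, one-sided MaxSim cannot tell the two candidates apart because every query token finds a perfect match in both; the reverse direction penalizes the candidate's unmatched extra token, which is the kind of surplus attribute a bidirectional score can expose.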
Conclusions: BiMaxSim and deterministic tie-breaking improved drug mapping to RxNorm while preserving an efficient runtime profile and stable ordering. Overall, THIRAWAT Mapper offers a pragmatic combination of learned semantic matching and deterministic lexical constraints. Models and code are available on Hugging Face (https://huggingface.co/collections/sidataplus/thirawat) and GitHub (https://github.com/sidataplus/THIRAWAT-mapper).
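A deterministic near-tie reordering of the kind described in the abstract might look like the sketch below. Everything here is an assumption for illustration: the regular expressions, the small form and release vocabularies, the tie tolerance eps, the Candidate type, and the rule that fewer mismatched cues win a near-tie are hypothetical and are not taken from THIRAWAT Mapper. Stable ordering comes from sorting on a total key that ends with the concept ID, so the result does not depend on the order in which candidates arrive.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Hypothetical cue patterns and vocabularies (illustrative only).
STRENGTH_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml|unt)\b", re.IGNORECASE)
BRAND_RE = re.compile(r"\[([^\]]+)\]")  # bracketed brand annotation, e.g. "[Tylenol]"
FORM_TERMS = ("oral tablet", "oral capsule", "oral solution", "injectable solution", "topical cream")
RELEASE_TERMS = ("extended release", "delayed release")


@dataclass(frozen=True)
class Candidate:
    concept_id: int
    name: str
    score: float  # reranker score, e.g. from BiMaxSim


def extract_cues(text: str) -> Dict[str, FrozenSet]:
    """Pull strength, form/route, release, and bracketed brand cues from a drug string."""
    lower = text.lower()
    return {
        "strength": frozenset((float(v), u.lower()) for v, u in STRENGTH_RE.findall(text)),
        "form": frozenset(f for f in FORM_TERMS if f in lower),
        "release": frozenset(r for r in RELEASE_TERMS if r in lower),
        "brand": frozenset(b.strip().lower() for b in BRAND_RE.findall(text)),
    }


def cue_mismatches(q: Dict[str, FrozenSet], c: Dict[str, FrozenSet]) -> int:
    """Count cue types on which the query and candidate disagree."""
    return sum(1 for key in q if q[key] != c[key])


def deterministic_rerank(query: str, cands: List[Candidate], eps: float = 1e-3) -> List[Candidate]:
    """Reorder near-tied candidates by cue agreement using a total, input-order-independent key."""
    q_cues = extract_cues(query)
    # Base order: score descending; concept_id breaks exact ties.
    base = sorted(cands, key=lambda c: (-c.score, c.concept_id))
    out: List[Candidate] = []
    i = 0
    while i < len(base):
        # Near-tie group: every candidate within eps of the group's top score.
        j = i + 1
        while j < len(base) and base[i].score - base[j].score <= eps:
            j += 1
        group = sorted(
            base[i:j],
            key=lambda c: (cue_mismatches(q_cues, extract_cues(c.name)), -c.score, c.concept_id),
        )
        out.extend(group)
        i = j
    return out


query = "acetaminophen 500 mg oral tablet"
cands = [
    Candidate(3, "Acetaminophen 500 MG Oral Tablet", 0.9000),
    Candidate(1, "Acetaminophen 325 MG Oral Tablet", 0.9005),
    Candidate(2, "Acetaminophen 500 MG Extended Release Oral Tablet", 0.9002),
    Candidate(4, "Ibuprofen 200 MG Oral Tablet", 0.5000),
]
print([c.concept_id for c in deterministic_rerank(query, cands)])  # [3, 1, 2, 4]
```

Here the exact 500 mg oral tablet overtakes two slightly higher-scored neighbours because it is the only member of the near-tie group that agrees on every extracted cue; the remaining two each mismatch one cue and keep their score order, while the ibuprofen candidate lies outside the tolerance and keeps its place. A fixed window anchored at each group's top score is only one simple way to define a near-tie.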
