Efficient Drug Terminology Mapping with Bidirectional Late-Interaction Reranking and Deterministic Reordering
| dc.contributor.author | Adulyanukosol N. | |
| dc.contributor.author | Chaisutyakorn K. | |
| dc.contributor.author | Sombutjaroan S. | |
| dc.contributor.author | Kanjanapong S. | |
| dc.contributor.author | Suriyaphol P. | |
| dc.contributor.correspondence | Adulyanukosol N. | |
| dc.contributor.other | Mahidol University | |
| dc.date.accessioned | 2026-05-16T18:12:00Z | |
| dc.date.available | 2026-05-16T18:12:00Z | |
| dc.date.issued | 2026-04-01 | |
| dc.description.abstract | Objectives: Standardizing medication concepts across heterogeneous vocabularies is essential for interoperable analytics and observational research. In the Observational Medical Outcomes Partnership (OMOP) Common Data Model, local drug codes must be mapped to standardized RxNorm concepts, but automated mapping is challenging because drug strings encode clinically critical attributes, including strength, dosage form/route, release characteristics, and brand. Methods: We propose THIRAWAT (Terminology Harmonization using Late-Interaction Reranker With Alignment-tuned Transformers), a fine-tuned ColBERTv1 late-interaction reranker, and embed it within THIRAWAT Mapper, a retrieval–reranking pipeline with deterministic tie-breaking and stable ordering. Candidate generation used approximate nearest-neighbor retrieval with a bi-encoder (SapBERT-XLMR or BioLORD-2023). Candidates were reranked by THIRAWAT models that were fine-tuned using one-sided MaxSim and scored at inference using our adapted Bidirectional MaxSim (BiMaxSim) pooling. Finally, a deterministic tie-breaker extracted clinically salient cues, including strength, dosage form/route, release characteristics, and bracketed brand annotations, to resolve near-ties reproducibly. Results: We evaluated three mapping settings: Branded Drugs, Clinical Drugs, and Thai Medicines Terminology (TMT). Using SapBERT-XLMR retrieval with THIRAWAT-Sap-BERT reranking and deterministic tie-breaking, THIRAWAT Mapper achieved MRR@100 values of 0.954 (95% confidence interval [CI], 0.921–0.983), 0.898 (95% CI, 0.866–0.925), and 0.912 (95% CI, 0.891–0.931), outperforming a lexical term frequency–inverse document frequency baseline (0.491, 0.216, and 0.143, respectively). Hits@1 improved to 0.942 (95% CI, 0.899–0.978), 0.859 (95% CI, 0.817–0.898), and 0.868 (95% CI, 0.838–0.896), respectively. Conclusions: BiMaxSim and deterministic tie-breaking improved drug mapping to RxNorm while preserving an efficient runtime profile and stable ordering. Overall, THIRAWAT Mapper offers a pragmatic combination of learned semantic matching and deterministic lexical constraints. Models and code are available on Hugging Face (https://huggingface.co/collections/sidataplus/thirawat) and GitHub (https://github.com/sidataplus/THIRAWAT-mapper). | |
| dc.identifier.citation | Healthcare Informatics Research Vol.32 No.2 (2026) , 156-165 | |
| dc.identifier.doi | 10.4258/hir.2026.32.2.156 | |
| dc.identifier.eissn | 2093369X | |
| dc.identifier.issn | 20933681 | |
| dc.identifier.scopus | 2-s2.0-105038369407 | |
| dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/123456789/116735 | |
| dc.rights.holder | SCOPUS | |
| dc.subject | Medicine | |
| dc.subject | Engineering | |
| dc.subject | Health Professions | |
| dc.title | Efficient Drug Terminology Mapping with Bidirectional Late-Interaction Reranking and Deterministic Reordering | |
| dc.type | Article | |
| mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105038369407&origin=inward | |
| oaire.citation.endPage | 165 | |
| oaire.citation.issue | 2 | |
| oaire.citation.startPage | 156 | |
| oaire.citation.title | Healthcare Informatics Research | |
| oaire.citation.volume | 32 | |
| oairecerif.author.affiliation | Siriraj Hospital | |
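
The abstract names one-sided MaxSim for fine-tuning and Bidirectional MaxSim (BiMaxSim) for inference scoring, but it does not give the formulas. The sketch below illustrates the general late-interaction idea only. The function names `normalize`, `maxsim`, and `bimaxsim` are hypothetical, and the choice to average the two length-normalized directions is an assumption made for illustration; the published BiMaxSim pooling may differ.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim(a_tokens, b_tokens):
    """One-sided MaxSim: for each token embedding in `a_tokens`, take its best
    cosine match among `b_tokens`, then sum those maxima."""
    a = [normalize(t) for t in a_tokens]
    b = [normalize(t) for t in b_tokens]
    return sum(
        max(sum(x * y for x, y in zip(ta, tb)) for tb in b)
        for ta in a
    )


def bimaxsim(query_tokens, doc_tokens):
    """ASSUMED bidirectional variant: mean of the length-normalized
    query-to-document and document-to-query MaxSim scores."""
    forward = maxsim(query_tokens, doc_tokens) / len(query_tokens)
    backward = maxsim(doc_tokens, query_tokens) / len(doc_tokens)
    return 0.5 * (forward + backward)


# Toy 2-dimensional token embeddings (not real model output).
query = [[1.0, 0.0], [0.0, 1.0]]
partial_doc = [[1.0, 0.0]]            # covers only one query token
full_doc = [[1.0, 0.0], [0.0, 1.0]]   # covers both query tokens

partial_score = bimaxsim(query, partial_doc)
full_score = bimaxsim(query, full_doc)
```

In this toy case the candidate that covers both query tokens scores higher than the one that covers only one, which is the behavior a reranker needs when a drug string omits an attribute such as strength or dosage form.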
