On Creating an English-Thai Code-switched Machine Translation in Medical Domain

dc.contributor.authorPengpun P.
dc.contributor.authorTiankanon K.
dc.contributor.authorChinkamol A.
dc.contributor.authorKinchagawat J.
dc.contributor.authorChairuengjitjaras P.
dc.contributor.authorSupholkhan P.
dc.contributor.authorAussavavirojekul P.
dc.contributor.authorBoonnag C.
dc.contributor.authorVeerakanjana K.
dc.contributor.authorPhimsiri H.
dc.contributor.authorSae-Jia B.
dc.contributor.authorSataudom N.
dc.contributor.authorIttichaiwong P.
dc.contributor.authorLimkonchotiwat P.
dc.contributor.correspondencePengpun P.
dc.contributor.otherMahidol University
dc.date.accessioned2025-02-18T18:11:18Z
dc.date.available2025-02-18T18:11:18Z
dc.date.issued2024-01-01
dc.description.abstractMachine translation (MT) in the medical domain plays a pivotal role in enhancing healthcare quality and disseminating medical knowledge. Despite advancements in English-Thai MT technology, common MT approaches often underperform in the medical field because they cannot translate medical terminology precisely. Our research aims not only to improve translation accuracy but also to keep medical terminology in English within the translated text through code-switched (CS) translation. We developed a method to produce CS medical translation data, fine-tuned a CS translation model on this data, and evaluated its performance against strong baselines such as Google Neural Machine Translation (NMT) and GPT-3.5/GPT-4. Our model demonstrated competitive performance on automatic metrics and was highly favored in human preference evaluations. Our evaluation results also show that medical professionals significantly prefer CS translations that accurately maintain critical English terms, even if this slightly compromises fluency. Our code and test set are publicly available at https://github.com/preceptorai-org/NLLB_CS_EM_NLP2024.
dc.identifier.citationEMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024 (2024), 6055-6073
dc.identifier.scopus2-s2.0-85217615242
dc.identifier.urihttps://repository.li.mahidol.ac.th/handle/20.500.14594/105340
dc.rights.holderSCOPUS
dc.subjectComputer Science
dc.subjectSocial Sciences
dc.titleOn Creating an English-Thai Code-switched Machine Translation in Medical Domain
dc.typeConference Paper
mu.datasource.scopushttps://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85217615242&origin=inward
oaire.citation.endPage6073
oaire.citation.startPage6055
oaire.citation.titleEMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
oairecerif.author.affiliationVidyasirimedhi Institute of Science and Technology
oairecerif.author.affiliationChulalongkorn University
oairecerif.author.affiliationMahidol University
oairecerif.author.affiliationKing's College London
oairecerif.author.affiliationCARIVA Thailand
oairecerif.author.affiliationBangkok Christian College