Performance of large language models on Thailand's national medical licensing examination: a cross-sectional study

dc.contributor.author: Saowaprut P.
dc.contributor.author: Wabina R.S.
dc.contributor.author: Yang J.
dc.contributor.author: Siriwat L.
dc.contributor.correspondence: Saowaprut P.
dc.contributor.other: Mahidol University
dc.date.accessioned: 2025-05-26T18:09:20Z
dc.date.available: 2025-05-26T18:09:20Z
dc.date.issued: 2025-01-01
dc.description.abstract: PURPOSE: This study aimed to evaluate the feasibility of general-purpose large language models (LLMs) in addressing inequities in medical licensure exam preparation for Thailand's National Medical Licensing Examination (ThaiNLE), which currently lacks standardized public study materials. METHODS: We assessed 4 multi-modal LLMs (GPT-4, Claude 3 Opus, Gemini 1.0/1.5 Pro) using a 304-question ThaiNLE Step 1 mock examination (10.2% image-based), applying deterministic API configurations and 5 inference repetitions per model. Performance was measured via micro- and macro-accuracy metrics compared against historical passing thresholds. RESULTS: All models exceeded passing scores, with GPT-4 achieving the highest accuracy (88.9%; 95% confidence interval, 88.7-89.1), surpassing Thailand's national average by more than 2 standard deviations. Claude 3.5 Sonnet (80.1%) and Gemini 1.5 Pro (72.8%) followed in rank order. Models demonstrated robustness across 17 of 20 medical domains, but variability was noted in genetics (74.0%) and cardiovascular topics (58.3%). While models demonstrated proficiency with images (Gemini 1.0 Pro: +9.9% vs. text), text-only accuracy remained superior (GPT-4o: 90.0% vs. 82.6%). CONCLUSION: General-purpose LLMs show promise as equitable preparatory tools for ThaiNLE Step 1. However, domain-specific knowledge gaps and inconsistent multi-modal integration warrant refinement before clinical deployment.
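The abstract reports both micro- and macro-accuracy. As a minimal illustration (the study's actual scoring code is not shown here, and the `results` structure below is a hypothetical stand-in), micro-accuracy pools all questions together, while macro-accuracy averages the per-domain accuracies so that each medical domain counts equally regardless of size:

```python
from collections import defaultdict

def micro_macro_accuracy(results):
    """Compute micro- and macro-accuracy.

    results: list of (domain, is_correct) pairs — a hypothetical
    representation of graded exam answers, one per question.
    """
    # Micro-accuracy: fraction correct over all questions pooled.
    total = len(results)
    correct = sum(1 for _, ok in results if ok)
    micro = correct / total

    # Macro-accuracy: mean of per-domain accuracies (domains weighted equally).
    by_domain = defaultdict(list)
    for domain, ok in results:
        by_domain[domain].append(ok)
    macro = sum(sum(v) / len(v) for v in by_domain.values()) / len(by_domain)
    return micro, macro

# Toy example with two domains of unequal size:
sample = [("cardio", True), ("cardio", False), ("cardio", False),
          ("genetics", True)]
micro, macro = micro_macro_accuracy(sample)
# micro = 2/4 = 0.5; macro = (1/3 + 1/1) / 2 ≈ 0.667
```

The two metrics diverge exactly when domains differ in size or difficulty, which is why the study's per-domain results (e.g., genetics, cardiovascular) can lag the overall score.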
dc.identifier.citation: Journal of educational evaluation for health professions Vol.22 (2025), 16
dc.identifier.doi: 10.3352/jeehp.2025.22.16
dc.identifier.eissn: 19755937
dc.identifier.pmid: 40354784
dc.identifier.scopus: 2-s2.0-105005377591
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/123456789/110365
dc.rights.holder: SCOPUS
dc.subject: Medicine
dc.title: Performance of large language models on Thailand's national medical licensing examination: a cross-sectional study
dc.type: Article
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105005377591&origin=inward
oaire.citation.title: Journal of educational evaluation for health professions
oaire.citation.volume: 22
oairecerif.author.affiliation: Ramathibodi Hospital
