Detection of flawed multiple-choice questions in preclinical medical education using item difficulty and discrimination indices: a six-year analysis
| dc.contributor.author | Srisomsak V. | |
| dc.contributor.author | Sitticharoon C. | |
| dc.contributor.author | Keadkraichaiwat I. | |
| dc.contributor.author | Meethes S. | |
| dc.contributor.author | Inpaen I. | |
| dc.contributor.correspondence | Srisomsak V. | |
| dc.contributor.other | Mahidol University | |
| dc.date.accessioned | 2026-02-06T18:29:53Z | |
| dc.date.available | 2026-02-06T18:29:53Z | |
| dc.date.issued | 2026-12-01 | |
| dc.description.abstract | Background: MCQ exams may include flawed items affecting validity. Psychometric indicators such as item difficulty (p-value) and point-biserial coefficient (r<sub>pb</sub>-value) are widely used to identify problematic questions. Evidence on using p-value (< 0.25) and/or r<sub>pb</sub>-value thresholds (< 0) to detect flawed items remains limited. This study aimed to provide a proof-of-concept using a large, real-world dataset, evaluating how often flawed items were missed when relying solely on static thresholds. Methods: Exam analyses from 32 preclinical courses (academic years 2017–2022) were reviewed. Items meeting predefined thresholds were flagged, while all items were manually reviewed when the most frequently chosen answer was not the keyed correct answer or when multiple options had similar p-values. Flagged items were sent to course directors for verification, and only confirmed items were recorded as corrections. Results: Among 236 exams, 59 (25.0%) required corrections, with at least 1 corrected item. Of 14,238 total items, 81 (0.6%) required correction, most due to ‘multiple-answers’ (46.9%), followed by ‘wrong-answer’ (40.7%), awarding points to all choices (‘all-choices’) (7.4%), and ‘item-removal’ (4.9%) causes. Of 77 corrected items with available p- and r<sub>pb</sub>-values (excluding ‘item-removal’), 66 (85.7%) met thresholds, while 11 (14.3%) had p-value ≥ 0.25 and r<sub>pb</sub>-value ≥ 0, indicating thresholds missed 14.3%. Corrected items had significantly lower p- and r<sub>pb</sub>-values than uncorrected items (P < 0.001). Item correction status showed negative correlations with both p- and r<sub>pb</sub>-values, suggesting flawed items tended to be more difficult and less discriminative. For the ‘wrong-answer’ and ‘multiple-answers’ causes, the actual correct answers had higher p-values than the initially designated ones (P < 0.001). For the ‘wrong-answer’ cause, the actual correct answers had higher r<sub>pb</sub>-values (P < 0.001), while they were comparable in the ‘multiple-answers’ cause, highlighting flaws even with unremarkable r<sub>pb</sub>-values. Conclusion: While p- and/or r<sub>pb</sub>-value thresholds detected 85.7% of flawed items, 14.3% were missed, underscoring that static thresholds are insufficient to ensure item quality. This study provides empirical evidence supporting the integration of quantitative indices with expert review in exam evaluation. Incorporating items where the keyed answer has a p-value less than or similar to distractors may improve detection but increase workload, reflecting a trade-off between accuracy and feasibility. | |
| dc.identifier.citation | BMC Medical Education Vol.26 No.1 (2026) | |
| dc.identifier.doi | 10.1186/s12909-025-08204-5 | |
| dc.identifier.eissn | 14726920 | |
| dc.identifier.pmid | 41327246 | |
| dc.identifier.scopus | 2-s2.0-105027690329 | |
| dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/123456789/114725 | |
| dc.rights.holder | SCOPUS | |
| dc.subject | Medicine | |
| dc.subject | Social Sciences | |
| dc.title | Detection of flawed multiple-choice questions in preclinical medical education using item difficulty and discrimination indices: a six-year analysis | |
| dc.type | Article | |
| mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105027690329&origin=inward | |
| oaire.citation.issue | 1 | |
| oaire.citation.title | BMC Medical Education | |
| oaire.citation.volume | 26 | |
| oairecerif.author.affiliation | Siriraj Hospital |
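
The screening rule summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two static thresholds (p-value < 0.25, r_pb-value < 0) come from the abstract; the function names, the `SIMILAR_MARGIN` cut-off for "similar" p-values, and the choice to correlate against an uncorrected total score that includes the item itself are assumptions made only for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

P_THRESHOLD = 0.25      # from the abstract: flag when p-value < 0.25
RPB_THRESHOLD = 0.0     # from the abstract: flag when r_pb-value < 0
SIMILAR_MARGIN = 0.05   # ASSUMPTION: the abstract does not define "similar"


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total score.

    Returns None when either variable has zero variance.
    """
    sx, sy = pstdev(item_scores), pstdev(total_scores)
    if sx == 0 or sy == 0:
        return None
    mx, my = mean(item_scores), mean(total_scores)
    cov = mean((x - mx) * (y - my) for x, y in zip(item_scores, total_scores))
    return cov / (sx * sy)


def review_item(responses, key, total_scores):
    """Hypothetical helper: list the reasons one item would go to review.

    responses    -- option chosen by each examinee, e.g. ["A", "B", ...]
    key          -- the keyed correct option
    total_scores -- each examinee's total exam score (ASSUMPTION: uncorrected,
                    item included)
    """
    n = len(responses)
    item_scores = [1 if r == key else 0 for r in responses]
    p = mean(item_scores)                       # item difficulty
    r_pb = point_biserial(item_scores, total_scores)

    # Proportion of examinees choosing each distractor.
    counts = Counter(responses)
    distractor_p = {opt: c / n for opt, c in counts.items() if opt != key}

    reasons = []
    if p < P_THRESHOLD:
        reasons.append("p < 0.25")
    if r_pb is not None and r_pb < RPB_THRESHOLD:
        reasons.append("r_pb < 0")
    if any(dp > p for dp in distractor_p.values()):
        reasons.append("most chosen option is not the key")
    if any(abs(dp - p) <= SIMILAR_MARGIN for dp in distractor_p.values()):
        reasons.append("distractor chosen about as often as the key")
    return {"p": p, "r_pb": r_pb, "reasons": reasons}


# Toy data: high scorers pick "B" although "A" is keyed.
suspect = review_item(
    ["B", "B", "B", "B", "B", "B", "A", "A", "C", "D"],
    "A",
    [9, 9, 8, 8, 7, 7, 3, 2, 4, 5],
)
print(suspect["reasons"])
```

In this sketch the last two checks stand in for the manual review triggers mentioned in the abstract, so an item with p-value ≥ 0.25 and r_pb-value ≥ 0 can still reach a reviewer. Any item listed here would still need confirmation by the course director before being recorded as a correction.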
