Enhancing agreement in cardiotocography interpretation between midwives and obstetricians through a rule-based AI program: A comparative cross-sectional study

dc.contributor.author: Kaewsrinual S.
dc.contributor.author: Homdee N.
dc.contributor.author: Rekhawasin Pinnington T.
dc.contributor.author: Surasereewong S.
dc.contributor.author: Chanprapaph P.
dc.contributor.correspondence: Kaewsrinual S.
dc.contributor.other: Mahidol University
dc.date.accessioned: 2026-03-03T18:23:12Z
dc.date.available: 2026-03-03T18:23:12Z
dc.date.issued: 2026-01-01
dc.description.abstract: Objective: To evaluate whether a rule-based artificial intelligence (AI) program can enhance interrater agreement in cardiotocography (CTG) interpretation between nurse–midwives and obstetricians.
Methods: CTG data from 50 singleton pregnancies at ≥32 weeks of gestation were used to develop a rule-based AI program based on National Institute of Child Health and Human Development (NICHD) 2008 guidelines, with content validity confirmed (item–objective congruence = 0.85). A 22-item CTG test representing NICHD categories I to III was generated using a local obstetrician consensus reference standard, defined as ≥70% agreement among seven obstetricians. Twenty nurse–midwives interpreted the same CTG tracings twice, before and after AI support, with a 1- to 2-month interval, while obstetricians completed the test once to establish the reference standard. Interrater agreement was evaluated relative to this local expert consensus, rather than neonatal outcomes or an external gold standard, using quadratic weighted kappa for ordinal data and intraclass correlation coefficients (ICC) for continuous data. Interpretation time and user satisfaction were also assessed.
Results: AI support significantly improved agreement across all ordinal parameters. Agreement on NICHD category interpretation increased from moderate (κ = 0.548) to almost perfect (κ = 0.906, P < 0.001). Improvements were also observed for baseline variability (κ = 0.459 to 0.853), fetal heart rate category (κ = 0.669 to 0.868), prolonged decelerations (κ = 0.719 to 0.963), and acceleration count (κ = 0.482 to 0.723). ICCs for variable and late decelerations improved from poor to good (0.328 to 0.725 and 0.304 to 0.676, respectively), whereas early decelerations remained low. Interpretation time decreased by a mean of 6.7 min with AI support (P < 0.001). Most midwives reported high satisfaction, with 70% strongly agreeing on its clinical utility.
Conclusion: This exploratory study suggests that a rule-based AI program was associated with improved interrater agreement between midwives and obstetricians and reduced CTG interpretation time, with high user satisfaction. These preliminary findings warrant confirmation in larger studies to assess generalizability and to determine whether improved agreement translates into better perinatal outcomes, which were not assessed in this study.
dc.identifier.citation: International Journal of Gynecology and Obstetrics (2026)
dc.identifier.doi: 10.1002/ijgo.70868
dc.identifier.eissn: 18793479
dc.identifier.issn: 00207292
dc.identifier.pmid: 41732906
dc.identifier.scopus: 2-s2.0-105031065856
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/123456789/115509
dc.rights.holder: SCOPUS
dc.subject: Medicine
dc.title: Enhancing agreement in cardiotocography interpretation between midwives and obstetricians through a rule-based AI program: A comparative cross-sectional study
dc.type: Article
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105031065856&origin=inward
oaire.citation.title: International Journal of Gynecology and Obstetrics
oairecerif.author.affiliation: Mahidol University
oairecerif.author.affiliation: Siriraj Hospital
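
The abstract reports agreement on ordinal parameters as quadratic weighted kappa. For readers unfamiliar with the statistic, the following is a minimal sketch of how it can be computed for two raters. It is not the authors' analysis code: the function name, the mapping of NICHD categories I to III onto the integers 0 to 2, and the example ratings are all hypothetical and are not taken from the study.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with quadratic disagreement weights.

    Ratings are integers in 0..n_categories-1 (an assumed encoding,
    e.g. NICHD category I -> 0, II -> 1, III -> 2).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    if n_categories < 2:
        raise ValueError("need at least two ordinal categories")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts per (a, b) pair
    marginal_a = Counter(rater_a)              # per-category counts, rater A
    marginal_b = Counter(rater_b)              # per-category counts, rater B
    max_sq_dist = (n_categories - 1) ** 2

    weighted_observed = 0.0  # weighted disagreement actually seen
    weighted_expected = 0.0  # weighted disagreement expected by chance
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / max_sq_dist
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * marginal_a[i] * marginal_b[j] / (n * n)

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings for six tracings (illustrative only, not study data).
reference = [0, 0, 1, 1, 2, 2]  # e.g. an expert consensus label per tracing
midwife = [0, 1, 1, 1, 2, 1]    # e.g. one rater's labels for the same tracings
kappa = quadratic_weighted_kappa(reference, midwife, n_categories=3)
```

The quadratic weights penalize a disagreement by the squared distance between categories, so rating a category III tracing as category I counts four times as heavily as rating it category II. That property is why the statistic suits ordinal scales such as the NICHD categories.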
