The IMPACT framework for evaluating generative AI in critical care: development and multinational consensus validation

Yeh Y.C.; Shih M.C.; De Backer D.; Celi L.A.; See K.C.; Fujii T.; Ling L.; Mongkolpun W.; Hu H.W.; Chen H.Y.; Chen W.C.; Cholley B.; Fong K.K.; Ryu H.G.; Na S.; Egi M.; Chan W.S.; Chen K.F.; Kamaleswaran R.; Chuang Y.C.; Yang C.J.; Hsiao W.L.; Lai S.R.; Ku D.; Jahan A.; Martin G.S.

Issued Date

2026-01-01

Resource Type

Article

eISSN

21105820

DOI

10.1016/j.aicoj.2026.100078

Scopus ID

2-s2.0-105040547932

Journal Title

Annals of Intensive Care

Volume

16

Rights Holder(s)

SCOPUS

Bibliographic Citation

Annals of Intensive Care Vol.16 (2026)

Suggested Citation

Yeh Y.C., Shih M.C., De Backer D., Celi L.A., See K.C., Fujii T., Ling L., Mongkolpun W., Hu H.W., Chen H.Y., Chen W.C., Cholley B., Fong K.K., Ryu H.G., Na S., Egi M., Chan W.S., Chen K.F., Kamaleswaran R., Chuang Y.C., Yang C.J., Hsiao W.L., Lai S.R., Ku D., Jahan A., Martin G.S. The IMPACT framework for evaluating generative AI in critical care: development and multinational consensus validation. Annals of Intensive Care Vol.16 (2026). doi:10.1016/j.aicoj.2026.100078 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/117141

Title

The IMPACT framework for evaluating generative AI in critical care: development and multinational consensus validation

Corresponding Author(s)

Yeh Y.C.

Other Contributor(s)

Mahidol University

Abstract

Background: Generative artificial intelligence (GenAI) is increasingly used for clinical decision support in critical care, yet standardized methods for evaluating GenAI content in intensive care settings are lacking. Existing metrics assess textual similarity but fail to capture clinical accuracy, reasoning quality, or urgency.

Methods: We developed and validated the IMPACT framework through a five-phase multinational panel consensus process. Reporting adhered to the ACCORD guideline. An eight-member steering committee provided clinical and methodological oversight. Panelists were recruited through purposive sampling to ensure geographic and multidisciplinary representation. Content validity was assessed using the Content Validity Ratio (CVR) and the Item-level Content Validity Index (I-CVI), with retention thresholds set at ≥70% agreement and I-CVI ≥0.80.

Results: A total of 58 panelists from 12 countries and regions participated, of whom 42 completed formal consensus voting. Participants included intensivists, physicians with AI research expertise, information technology specialists, and other critical care professionals. All six IMPACT domains exceeded the validity thresholds (mean agreement 89.3%, CVR = 0.79, I-CVI = 0.92). Of 24 candidate subitems, 21 met the retention criteria (mean agreement 85.7%, CVR = 0.71, I-CVI = 0.90). Three subitems were removed because of insufficient consensus and conceptual overlap. The validated framework comprises six domains with 21 subitems.

Conclusions: The IMPACT framework provides a consensus-validated approach for evaluating GenAI clinical decision support in intensive care, addressing gaps in current evaluation methods.
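
For readers unfamiliar with the content validity statistics named in the abstract, the following is a minimal sketch of how such values are commonly computed. It assumes Lawshe's formula for CVR and the usual I-CVI convention of counting ratings of 3 or 4 on a 4-point relevance scale; the abstract does not state the exact rating scale or computation the authors used, so the function names, the `relevant_min` cutoff, and the example votes are all hypothetical.

```python
from typing import Sequence


def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR: (n_e - N/2) / (N/2), ranging from -1 to 1."""
    half = n_panelists / 2
    return (n_essential - half) / half


def item_cvi(ratings: Sequence[int], relevant_min: int = 3) -> float:
    """I-CVI: share of panelists rating the item as relevant.

    Assumes a 4-point relevance scale where 3 and 4 count as relevant
    (an assumption; the abstract does not specify the scale).
    """
    return sum(1 for r in ratings if r >= relevant_min) / len(ratings)


def meets_retention(agreement: float, i_cvi: float,
                    min_agreement: float = 0.70,
                    min_i_cvi: float = 0.80) -> bool:
    """Apply the two retention thresholds quoted in the abstract."""
    return agreement >= min_agreement and i_cvi >= min_i_cvi


if __name__ == "__main__":
    # Hypothetical item: 36 of 42 voting panelists endorse it.
    n_yes, n_total = 36, 42
    agreement = n_yes / n_total
    cvr = content_validity_ratio(n_yes, n_total)
    print(f"agreement={agreement:.3f}, CVR={cvr:.2f}")
    print("retain:", meets_retention(agreement, i_cvi=0.90))
```

Under Lawshe's formula, CVR equals twice the agreement proportion minus one, which is consistent with the paired figures reported above (89.3% with 0.79, and 85.7% with 0.71).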

Keyword(s)

Medicine

URI

https://repository.li.mahidol.ac.th/handle/123456789/117141

Collections

Scopus 2026
