Development and validation of supervised machine learning multivariable prediction models for the diagnosis of Pneumocystis jirovecii pneumonia using nasopharyngeal swab PCR in adults in a low-HIV prevalence setting

Chew R.; Woods M.L.; Paterson D.L.

Issued Date

2025-09-01

Resource Type

Article

ISSN

1876-3413

eISSN

1876-3405

DOI

10.1093/inthealth/ihae052

Scopus ID

2-s2.0-105015526047

PubMed ID

39206512

Journal Title

International Health

Volume

17

Issue

5

Start Page

804

End Page

808

Rights Holder(s)

SCOPUS

Bibliographic Citation

International Health Vol.17 No.5 (2025), 804-808

Suggested Citation

Chew R., Woods M.L., Paterson D.L. Development and validation of supervised machine learning multivariable prediction models for the diagnosis of Pneumocystis jirovecii pneumonia using nasopharyngeal swab PCR in adults in a low-HIV prevalence setting. International Health Vol.17 No.5 (2025), 804-808. doi:10.1093/inthealth/ihae052 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/112088

Author(s)

Chew R.
Woods M.L.
Paterson D.L.

Authors' Affiliations

National University of Singapore
Nuffield Department of Medicine
Royal Brisbane and Women's Hospital
Faculty of Medicine
Mahidol Oxford Tropical Medicine Research Unit

Corresponding Author(s)

Chew R.

Other Contributor(s)

Mahidol University

Abstract

Background: The global burden of the opportunistic fungal disease Pneumocystis jirovecii pneumonia (PJP) remains substantial. Polymerase chain reaction (PCR) on nasopharyngeal swabs (NPS) has high specificity and may be a viable alternative to the gold-standard diagnostic, PCR on invasively collected lower respiratory tract specimens, but its sensitivity is low. Sensitivity may be improved by incorporating NPS PCR results into machine learning models.

Methods: Three supervised multivariable diagnostic models (random forest, logistic regression and extreme gradient boosting) were constructed and validated using a 111-person Australian dataset. The predictors were age, gender, immunosuppression type and NPS PCR result. Performance metrics, including accuracy, sensitivity, specificity and predictive values, were compared to select the best-performing model.

Results: The logistic regression model performed best, with 80% accuracy; it improved sensitivity to 86% while maintaining an acceptable specificity of 70%. Using this model, positive and negative NPS PCR results indicated post-test probabilities of 84% (likely PJP) and 26% (unlikely PJP), respectively.

Conclusions: The logistic regression model should be externally validated in a wider range of settings. Because the predictors are simple, routinely collected patient variables, this model may represent a diagnostic advance for settings where collecting lower respiratory tract specimens is difficult but PCR is available.
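The abstract compares models on accuracy, sensitivity, specificity and predictive values, and reports post-test probabilities. The following is a minimal sketch of how those quantities are conventionally defined from a 2x2 confusion matrix. The counts are hypothetical, chosen only so that sensitivity and specificity equal the reported 86% and 70%; they are not the study's data. The post-test calculation uses likelihood ratios (the odds form of Bayes' theorem), which is a standard technique swapped in for illustration. It is not the authors' logistic regression model and does not reproduce the reported 84% and 26%. The names `ConfusionMatrix` and `post_test_probability` and the 0.5 pre-test probability are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts from comparing predictions with a reference standard (hypothetical)."""
    tp: int  # predicted PJP, reference-positive
    fp: int  # predicted PJP, reference-negative
    fn: int  # predicted no PJP, reference-positive
    tn: int  # predicted no PJP, reference-negative

    @property
    def accuracy(self) -> float:
        # Proportion of all cases classified correctly
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def sensitivity(self) -> float:
        # Proportion of reference-positive cases the model flags
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of reference-negative cases the model clears
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Positive predictive value: P(disease | positive prediction)
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Negative predictive value: P(no disease | negative prediction)
        return self.tn / (self.tn + self.fn)


def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' theorem: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical counts for illustration only; not the study's data.
cm = ConfusionMatrix(tp=43, fp=15, fn=7, tn=35)
print(f"accuracy={cm.accuracy:.2f} sensitivity={cm.sensitivity:.2f} "
      f"specificity={cm.specificity:.2f} ppv={cm.ppv:.2f} npv={cm.npv:.2f}")

# Likelihood ratios for a positive and a negative prediction
lr_pos = cm.sensitivity / (1 - cm.specificity)
lr_neg = (1 - cm.sensitivity) / cm.specificity

# Assumed pre-test probability of 0.5 (equal to this toy sample's prevalence)
print(f"post-test, positive: {post_test_probability(0.5, lr_pos):.2f}")
print(f"post-test, negative: {post_test_probability(0.5, lr_neg):.2f}")
```

When the pre-test probability equals the sample prevalence (50% in these toy counts), the post-test probability after a positive result equals the PPV, and after a negative result it equals 1 - NPV, which is a useful consistency check on the definitions.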

Keyword(s)

Medicine
Social Sciences

URI

https://repository.li.mahidol.ac.th/handle/123456789/112088

Collections

Scopus 2025
