Development and validation of supervised machine learning multivariable prediction models for the diagnosis of Pneumocystis jirovecii pneumonia using nasopharyngeal swab PCR in adults in a low-HIV prevalence setting
Issued Date
2025-09-01
ISSN
18763413
eISSN
18763405
Scopus ID
2-s2.0-105015526047
Pubmed ID
39206512
Journal Title
International Health
Volume
17
Issue
5
Start Page
804
End Page
808
Rights Holder(s)
SCOPUS
Bibliographic Citation
International Health Vol.17 No.5 (2025), 804-808
Suggested Citation
Chew R., Woods M.L., Paterson D.L. Development and validation of supervised machine learning multivariable prediction models for the diagnosis of Pneumocystis jirovecii pneumonia using nasopharyngeal swab PCR in adults in a low-HIV prevalence setting. International Health Vol.17 No.5 (2025), 804-808. doi:10.1093/inthealth/ihae052 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/112088
Author(s)
Chew R., Woods M.L., Paterson D.L.
Abstract
Background: The global burden of the opportunistic fungal disease Pneumocystis jirovecii pneumonia (PJP) remains substantial. Polymerase chain reaction (PCR) on nasopharyngeal swabs (NPS) has high specificity and may be a viable alternative to the gold standard diagnostic of PCR on invasively collected lower respiratory tract specimens, but has low sensitivity. Sensitivity may be improved by incorporating NPS PCR results into machine learning models.
Methods: Three supervised multivariable diagnostic models (random forest, logistic regression and extreme gradient boosting) were constructed and validated using a 111-person Australian dataset. The predictors were age, gender, immunosuppression type and NPS PCR result. Model performance metrics such as accuracy, sensitivity, specificity and predictive values were compared to select the best-performing model.
Results: The logistic regression model performed best, with 80% accuracy, improving sensitivity to 86% and maintaining acceptable specificity of 70%. Using this model, positive and negative NPS PCR results indicated post-test probabilities of 84% (likely PJP) and 26% (unlikely PJP), respectively.
Conclusions: The logistic regression model should be externally validated in a wider range of settings. As the predictors are simple, routinely collected patient variables, this model may represent a diagnostic advance suitable for settings where collection of lower respiratory tract specimens is difficult but PCR is available.
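The performance metrics named in the abstract (accuracy, sensitivity, specificity and predictive values) all derive from a 2x2 table of model predictions against the reference standard. The following is a minimal sketch of that arithmetic in Python, not the authors' code. The `ConfusionCounts` class, the `diagnostic_metrics` function and the example counts are hypothetical and chosen only for illustration; the counts are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cells of a 2x2 table: model prediction vs reference standard."""
    tp: int  # predicted positive, reference positive
    fp: int  # predicted positive, reference negative
    fn: int  # predicted negative, reference positive
    tn: int  # predicted negative, reference negative


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the standard diagnostic accuracy metrics as proportions."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


# Hypothetical counts for illustration only; NOT data from the study.
example = ConfusionCounts(tp=18, fp=3, fn=2, tn=7)
metrics = diagnostic_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Computing the same five quantities for each candidate model on a shared validation set is one straightforward way to compare models as the Methods describe. Predictive values depend on disease prevalence in the validation sample, so they do not transfer directly to settings with a different prevalence.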
