StackPR is a new computational approach for large-scale identification of progesterone receptor antagonists using the stacking strategy
Issued Date
2022-12-01
Resource Type
eISSN
20452322
Scopus ID
2-s2.0-85139231644
Pubmed ID
36180453
Journal Title
Scientific Reports
Volume
12
Issue
1
Rights Holder(s)
SCOPUS
Bibliographic Citation
Scientific Reports Vol.12 No.1 (2022)
Suggested Citation
Schaduangrat N., Anuwongcharoen N., Moni M.A., Lio’ P., Charoenkwan P., Shoombuatong W. StackPR is a new computational approach for large-scale identification of progesterone receptor antagonists using the stacking strategy. Scientific Reports Vol.12 No.1 (2022). doi:10.1038/s41598-022-20143-5 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/87717
Other Contributor(s)
Abstract
Progesterone receptors (PRs) are implicated in various cancers because their presence or absence can determine clinical outcomes. Progesterone overstimulation can facilitate oncogenesis; thus, its modulation through PR inhibition is urgently needed. To address this issue, a novel stacked ensemble learning approach (termed StackPR) is presented for fast, accurate, and large-scale identification of PR antagonists using only SMILES notation, without the need for 3D structural information. We employed six popular machine learning (ML) algorithms (i.e., logistic regression, partial least squares, k-nearest neighbor, support vector machine, extremely randomized trees, and random forest) coupled with twelve conventional molecular descriptors to create 72 baseline models. Then, a genetic algorithm in conjunction with the self-assessment-report approach was used to select m of the 72 baseline models, which were then used to develop the final meta-predictor with the stacking strategy and tenfold cross-validation. Experimental results on the independent test dataset show that StackPR achieved an accuracy of 0.966 and a Matthews correlation coefficient of 0.925. In addition, analysis based on the SHapley Additive exPlanation algorithm and molecular docking indicates that aliphatic hydrocarbons and nitrogen-containing substructures were the most important features for PR antagonist activity. Finally, we implemented an online webserver using StackPR, which is freely accessible at http://pmlabstack.pythonanywhere.com/StackPR. StackPR is anticipated to be a powerful computational tool for the large-scale identification of unknown PR antagonist candidates for follow-up experimental validation.
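As a reading aid, the following minimal Python sketch illustrates two pieces of arithmetic behind the abstract: how six algorithms paired with twelve descriptors yield 72 baseline models, and how accuracy and the Matthews correlation coefficient are computed from a binary confusion matrix. It is not the authors' implementation. The descriptor names are placeholders (the abstract does not list them), and the function names `baseline_grid`, `accuracy`, and `mcc` are hypothetical.

```python
from itertools import product
from math import sqrt

# The six algorithm families named in the abstract.
ALGORITHMS = [
    "logistic regression",
    "partial least squares",
    "k-nearest neighbor",
    "support vector machine",
    "extremely randomized trees",
    "random forest",
]

# Placeholder names (assumption): the abstract does not name the twelve descriptors.
DESCRIPTORS = [f"descriptor_{i}" for i in range(1, 13)]


def baseline_grid(algorithms, descriptors):
    """Return every (algorithm, descriptor) pair; each pair stands for one baseline model."""
    return list(product(algorithms, descriptors))


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions in a binary confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


grid = baseline_grid(ALGORITHMS, DESCRIPTORS)
print(len(grid))  # 72
```

For example, made-up counts of 45 true positives, 45 true negatives, 5 false positives, and 5 false negatives give an accuracy of 0.9 and an MCC of 0.8. These counts are illustrative only and are not taken from the paper.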