DPI_CDF: druggable protein identifier using cascade deep forest
dc.contributor.author | Arif M. | |
dc.contributor.author | Fang G. | |
dc.contributor.author | Ghulam A. | |
dc.contributor.author | Musleh S. | |
dc.contributor.author | Alam T. | |
dc.contributor.correspondence | Arif M. | |
dc.contributor.other | Mahidol University | |
dc.date.accessioned | 2024-04-13T18:06:54Z | |
dc.date.available | 2024-04-13T18:06:54Z | |
dc.date.issued | 2024-12-01 | |
dc.description.abstract | Background: Drug targets in living organisms play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets is accurate but generally expensive, slow, and resource-intensive. Computational methods are therefore highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory. Methods: In this study, we developed a novel deep learning-based model, DPI_CDF, for predicting DPs from protein sequence alone. DPI_CDF uses evolutionary (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical (i.e., component protein sequence representation), and compositional (i.e., normalized qualitative characteristic) properties of the protein sequence to generate features. A hierarchical deep forest model then fuses these three encoding schemes to build the proposed model. Results: Under 10-fold cross-validation, the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. Compared with current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will help the research community identify druggable proteins and accelerate the drug discovery process. Availability: The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF. | |
dc.identifier.citation | BMC Bioinformatics Vol.25 No.1 (2024) | |
dc.identifier.doi | 10.1186/s12859-024-05744-3 | |
dc.identifier.eissn | 14712105 | |
dc.identifier.scopus | 2-s2.0-85189624805 | |
dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/123456789/97955 | |
dc.rights.holder | SCOPUS | |
dc.subject | Mathematics | |
dc.subject | Biochemistry, Genetics and Molecular Biology | |
dc.subject | Computer Science | |
dc.title | DPI_CDF: druggable protein identifier using cascade deep forest | |
dc.type | Article | |
mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85189624805&origin=inward | |
oaire.citation.issue | 1 | |
oaire.citation.title | BMC Bioinformatics | |
oaire.citation.volume | 25 | |
oairecerif.author.affiliation | Hamad Bin Khalifa University, College of Science and Engineering | |
oairecerif.author.affiliation | Sindh Agriculture University | |
oairecerif.author.affiliation | Mahidol University | |
oairecerif.author.affiliation | P. R. China |
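
Note on the reported metrics: the abstract summarizes performance as accuracy and the Matthews correlation coefficient (MCC). The following is a minimal sketch of how these two quantities are conventionally computed from a binary confusion matrix. It is not taken from the DPI_CDF source code; the function names and the toy labels are hypothetical and serve only as illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, where 1 = druggable, 0 = non-druggable."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels (hypothetical, not from the benchmark datasets).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts))  # 0.75
print(mcc(*counts))       # 0.5
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which is why it is commonly reported next to accuracy when the two classes may be imbalanced.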