DPI_CDF: druggable protein identifier using cascade deep forest

Arif M.; Fang G.; Ghulam A.; Musleh S.; Alam T.

Issued Date

2024-12-01

Resource Type

Article

eISSN

1471-2105

DOI

10.1186/s12859-024-05744-3

Scopus ID

2-s2.0-85189624805

Journal Title

BMC Bioinformatics

Volume

25

Issue

1

Rights Holder(s)

SCOPUS

Bibliographic Citation

BMC Bioinformatics Vol.25 No.1 (2024)

Suggested Citation

Arif M., Fang G., Ghulam A., Musleh S., Alam T. DPI_CDF: druggable protein identifier using cascade deep forest. BMC Bioinformatics Vol.25 No.1 (2024). doi:10.1186/s12859-024-05744-3 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/97955

Author(s)

Arif M.
Fang G.
Ghulam A.
Musleh S.
Alam T.

Author's Affiliation

Hamad Bin Khalifa University, College of Science and Engineering
Sindh Agriculture University
Mahidol University
P. R. China

Corresponding Author(s)

Arif M.

Other Contributor(s)

Mahidol University

Abstract

Background: Drug targets in living organisms play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource intensive. Computational methods are therefore highly desirable as an alternative for expediting the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory.

Methods: In this study, we developed DPI_CDF, a novel deep learning-based model for predicting DPs from protein sequence alone. DPI_CDF generates features from evolutionary (histogram of oriented gradients for the position-specific scoring matrix), physicochemical (component protein sequence representation), and compositional (normalized qualitative characteristic) properties of the protein sequence. A hierarchical deep forest model then fuses these three encoding schemes to build DPI_CDF.

Results: Under 10-fold cross-validation, the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. Compared with current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will support the research community in identifying druggable proteins and accelerate the drug discovery process.

Availability: The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF.
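The evolutionary feature named in the Methods paragraph applies a histogram of oriented gradients (HOG) to the position-specific scoring matrix (PSSM). The NumPy sketch below illustrates the general idea only; it is not taken from the DPI_CDF repository. The function name `hog_pssm`, the split of the sequence axis into a fixed number of segments (so proteins of any length yield a fixed-length vector), the 9 unsigned orientation bins, and the per-segment L2 normalization are all assumptions.

```python
import numpy as np


def hog_pssm(pssm, n_segments=4, n_bins=9):
    """Hypothetical HOG-style descriptor of an L x 20 PSSM (assumed settings)."""
    pssm = np.asarray(pssm, dtype=float)
    if pssm.ndim != 2 or pssm.shape[0] < max(2, n_segments) or pssm.shape[1] < 2:
        raise ValueError("expected an L x 20 matrix with enough rows")
    # Gradients along sequence positions (rows) and amino-acid columns.
    gy, gx = np.gradient(pssm)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, mapped to n_bins bins.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((orientation / (180.0 / n_bins)).astype(int), n_bins - 1)
    features = []
    # Assumed cells: contiguous, roughly equal runs of sequence positions.
    for rows in np.array_split(np.arange(pssm.shape[0]), n_segments):
        hist = np.bincount(bin_idx[rows].ravel(),
                           weights=magnitude[rows].ravel(),
                           minlength=n_bins)
        norm = np.linalg.norm(hist)
        features.append(hist / norm if norm > 0 else hist)
    return np.concatenate(features)


# Two proteins of different lengths map to vectors of the same size (4 * 9 = 36).
rng = np.random.default_rng(0)
print(hog_pssm(rng.normal(size=(50, 20))).shape, hog_pssm(rng.normal(size=(123, 20))).shape)
```

Weighting each bin by gradient magnitude lets positions where the conservation profile changes sharply dominate the histogram, while flat regions contribute little.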
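The hierarchical deep forest can be read as the cascade structure popularized by gcForest: each level is an ensemble of forests whose class-probability vectors are appended to the original features and passed to the next level, and growth stops once an extra level no longer improves accuracy. The scikit-learn sketch below is a hedged illustration, not the DPI_CDF code. The class name `CascadeForest`, the level layout (two random forests and two completely random tree forests), 3-fold out-of-fold class vectors, the stopping tolerance, and fusing the three encodings by plain column concatenation are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split


def make_level(seed, n_estimators=50):
    # Assumed gcForest-style level: two random forests and two completely
    # random tree forests (one candidate feature per split).
    return [
        RandomForestClassifier(n_estimators=n_estimators, random_state=seed),
        RandomForestClassifier(n_estimators=n_estimators, random_state=seed + 1),
        ExtraTreesClassifier(n_estimators=n_estimators, max_features=1, random_state=seed + 2),
        ExtraTreesClassifier(n_estimators=n_estimators, max_features=1, random_state=seed + 3),
    ]


class CascadeForest:
    """Hypothetical cascade deep forest; not the DPI_CDF implementation."""

    def __init__(self, max_levels=5, n_folds=3, tol=1e-3, seed=0):
        self.max_levels, self.n_folds, self.tol, self.seed = max_levels, n_folds, tol, seed

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        cv = StratifiedKFold(n_splits=self.n_folds, shuffle=True, random_state=self.seed)
        self.levels_, best_acc, level_input = [], -np.inf, X
        for depth in range(self.max_levels):
            forests = make_level(self.seed + 10 * depth)
            # Out-of-fold class vectors: each sample's probabilities come
            # from forests that never saw that sample during training.
            probas = [cross_val_predict(f, level_input, y, cv=cv, method="predict_proba")
                      for f in forests]
            acc = np.mean(self.classes_[np.argmax(np.mean(probas, axis=0), axis=1)] == y)
            if self.levels_ and acc <= best_acc + self.tol:
                break  # no meaningful gain from this level: stop growing
            for f in forests:
                f.fit(level_input, y)
            self.levels_.append(forests)
            best_acc = acc
            # The next level sees the original features plus these class vectors.
            level_input = np.hstack([X] + probas)
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        level_input = X
        for forests in self.levels_:
            probas = [f.predict_proba(level_input) for f in forests]
            level_input = np.hstack([X] + probas)
        # Final output: average of the last level's class vectors.
        return np.mean(probas, axis=0)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Synthetic stand-in data. With real data, X would be the column-wise
# concatenation of the three encodings, e.g. np.hstack([hog, cpsr, nqc])
# (hypothetical array names).
X, y = make_classification(n_samples=300, n_features=30, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
model = CascadeForest(max_levels=3).fit(X_tr, y_tr)
print("levels:", len(model.levels_), "test accuracy:", np.mean(model.predict(X_te) == y_te))
```

Using out-of-fold probabilities during training matters: class vectors from forests fit on the same samples would be overconfident, and later levels would learn to trust them too much.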
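The Results paragraph reports accuracy and the Matthews correlation coefficient (MCC). Both are computed from the binary confusion-matrix counts: accuracy = (TP + TN) / (TP + FP + TN + FN) and MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). MCC ranges from −1 to 1 and uses all four cells, so it stays informative when classes are imbalanced. The standard-library sketch below uses hypothetical function names; returning 0 when the MCC denominator is zero is a common convention, not something the abstract specifies.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): MCC is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


counts = confusion_counts([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
print(counts, accuracy(*counts), mcc(*counts))  # (3, 1, 3, 1) 0.75 0.5
```

In this toy example, one false positive and one false negative out of eight predictions give 75% accuracy but an MCC of only 0.5, which is why MCC is reported alongside accuracy.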

Keyword(s)

Mathematics
Biochemistry, Genetics and Molecular Biology
Computer Science

URI

https://repository.li.mahidol.ac.th/handle/123456789/97955

Collections

Scopus 2024
