DPI_CDF: druggable protein identifier using cascade deep forest

Arif M.; Fang G.; Ghulam A.; Musleh S.; Alam T.


dc.contributor.author	Arif M.
dc.contributor.author	Fang G.
dc.contributor.author	Ghulam A.
dc.contributor.author	Musleh S.
dc.contributor.author	Alam T.
dc.contributor.correspondence	Arif M.
dc.contributor.other	Mahidol University
dc.date.accessioned	2024-04-13T18:06:54Z
dc.date.available	2024-04-13T18:06:54Z
dc.date.issued	2024-12-01
dc.description.abstract	Background: Drug targets in living organisms play pivotal roles in the discovery of potential drugs. Although accurate, conventional wet-lab characterization of drug targets is generally expensive, slow, and resource intensive. Computational methods are therefore highly desirable as an alternative that can expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory. Methods: In this study, we developed DPI_CDF, a novel deep learning-based model that predicts DPs from protein sequence alone. DPI_CDF generates features from evolutionary (i.e., histograms of oriented gradients for the position-specific scoring matrix), physicochemical (i.e., component protein sequence representation), and compositional (i.e., normalized qualitative characteristic) properties of the protein sequence. A hierarchical (cascade) deep forest model then fuses these three encoding schemes to build DPI_CDF. Results: Under 10-fold cross-validation on the training dataset, the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. Compared with current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will support the research community in identifying druggable proteins and accelerate the drug discovery process. Availability: The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF.
dc.identifier.citation	BMC Bioinformatics Vol.25 No.1 (2024)
dc.identifier.doi	10.1186/s12859-024-05744-3
dc.identifier.eissn	14712105
dc.identifier.scopus	2-s2.0-85189624805
dc.identifier.uri	https://repository.li.mahidol.ac.th/handle/123456789/97955
dc.rights.holder	SCOPUS
dc.subject	Mathematics
dc.subject	Biochemistry, Genetics and Molecular Biology
dc.subject	Computer Science
dc.title	DPI_CDF: druggable protein identifier using cascade deep forest
dc.type	Article
mu.datasource.scopus	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85189624805&origin=inward
oaire.citation.issue	1
oaire.citation.title	BMC Bioinformatics
oaire.citation.volume	25
oairecerif.author.affiliation	Hamad Bin Khalifa University, College of Science and Engineering
oairecerif.author.affiliation	Sindh Agriculture University
oairecerif.author.affiliation	Mahidol University
oairecerif.author.affiliation	P. R. China
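The abstract reports performance as accuracy and Matthews correlation coefficient (MCC). As a minimal sketch of how these two metrics are conventionally computed from a binary confusion matrix, the snippet below assumes druggable proteins are the positive class. The function name `accuracy_and_mcc` and the example counts are hypothetical illustrations, not taken from the paper or its GitHub code.

```python
import math


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, MCC) for a binary confusion matrix.

    Assumption: "positive" means druggable protein and "negative" means
    non-druggable; this helper is a hypothetical illustration.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Accuracy: fraction of all samples classified correctly.
    accuracy = (tp + tn) / total
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC as 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the paper's results).
    acc, mcc = accuracy_and_mcc(tp=95, tn=90, fp=10, fn=5)
    print(f"accuracy={acc:.4f}  MCC={mcc:.4f}")
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the two classes are imbalanced, which is one reason it is commonly reported alongside accuracy for protein classification tasks.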
