MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction

Ge F.; Arif M.; Yan Z.; Alahmadi H.; Worachartcheewan A.; Yu D.J.; Shoombuatong W.

dc.contributor.author	Ge F.
dc.contributor.author	Arif M.
dc.contributor.author	Yan Z.
dc.contributor.author	Alahmadi H.
dc.contributor.author	Worachartcheewan A.
dc.contributor.author	Yu D.J.
dc.contributor.author	Shoombuatong W.
dc.contributor.other	Mahidol University
dc.date.accessioned	2023-12-09T18:01:18Z
dc.date.available	2023-12-09T18:01:18Z
dc.date.issued	2023-01-01
dc.description.abstract	Understanding the pathogenicity of missense mutations (MMs) is essential for shedding light on genetic diseases, gene functions, and individual variation. In this study, we propose a novel computational approach, called MMPatho, for enhanced prediction of MM pathogenicity. First, we established a large-scale, nonredundant MM benchmark data set based on the entire Ensembl database, complemented by a focused blind test set specifically for pathogenic gain-of-function (GOF) and loss-of-function (LOF) MMs. Based on this data set, for each mutation, we used Ensembl VEP v104 and dbNSFP v4.1a to extract variant-level features, amino acid-level features, individual predictors' outputs, and genome-level features. Additionally, protein sequences were retrieved via the Ensembl API using ENSP identifiers and then encoded, and ESM-1b and ProtTrans-T5 embeddings were extracted at the mutant sites. Building on these features, we developed the MMPatho model group, which comprises ConsMM and EvoIndMM. Specifically, ConsMM applies XGBoost with SHAP explanation analysis to the individual predictors' outputs, while EvoIndMM investigates whether predictive capability can be enhanced by incorporating evolutionary information from ESM-1b and ProtT5-XL-U50, two large protein language model embeddings. In rigorous comparative experiments on a blind test set with no variants or proteins overlapping the training data, ConsMM and EvoIndMM achieved AUROC values of 0.9836 and 0.9854 and AUPR values of 0.9852 and 0.9902, respectively, highlighting the superiority of our computational approach for predicting MM pathogenicity. Our web server, available at http://csbio.njust.edu.cn/bioinf/mmpatho/, allows researchers to predict the pathogenicity (alongside a reliability index score) of MMs using the ConsMM and EvoIndMM models and provides extensive annotations for user input. 
Additionally, the newly constructed benchmark data set and blind test set can be accessed via the data page of our web server.
dc.identifier.citation	Journal of Chemical Information and Modeling (2023)
dc.identifier.doi	10.1021/acs.jcim.3c00950
dc.identifier.eissn	1549960X
dc.identifier.issn	15499596
dc.identifier.scopus	2-s2.0-85178135519
dc.identifier.uri	https://repository.li.mahidol.ac.th/handle/20.500.14594/91343
dc.rights.holder	SCOPUS
dc.subject	Chemical Engineering
dc.title	MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction
dc.type	Article
mu.datasource.scopus	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85178135519&origin=inward
oaire.citation.title	Journal of Chemical Information and Modeling
oairecerif.author.affiliation	Hamad Bin Khalifa University, College of Science and Engineering
oairecerif.author.affiliation	Mahidol University
oairecerif.author.affiliation	Nanjing University of Science and Technology
oairecerif.author.affiliation	Nanjing University of Posts and Telecommunications
oairecerif.author.affiliation	Taibah University
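
The abstract reports AUROC and AUPR on the blind test set. As a minimal sketch, assuming binary labels (1 = pathogenic, 0 = benign) and one predicted score per variant, the two metrics can be computed as below. This is not the authors' evaluation code: the function names are hypothetical, and AUPR is summarized here as average precision, one common estimator that may differ slightly from the one used in the paper.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen pathogenic (label 1)
    variant scores higher than a randomly chosen benign (label 0) one,
    with ties counted as one half. Quadratic in size; fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPR summarized as average precision.

    Sorts variants by descending score and averages the precision
    observed at the rank of each positive. Tied scores keep their
    input order (a simplification).
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this positive's rank
    return total / n_pos
```

For example, labels [1, 0, 1, 0] with scores [0.9, 0.8, 0.4, 0.1] give an AUROC of 0.75 and an average precision of about 0.833.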