GPApred: The first computational predictor for identifying proteins with LPXTG-like motif using sequence-based optimal features

dc.contributor.author: Malik A.
dc.contributor.author: Shoombuatong W.
dc.contributor.author: Kim C.B.
dc.contributor.author: Manavalan B.
dc.contributor.other: Mahidol University
dc.date.accessioned: 2023-05-19T07:35:39Z
dc.date.available: 2023-05-19T07:35:39Z
dc.date.issued: 2023-02-28
dc.description.abstract: The cell surface proteins of gram-positive bacteria are involved in many important biological functions, including the infection of host cells. Owing to their virulent nature, these proteins are also considered strong candidates for potential drug or vaccine targets. Among the various cell surface proteins of gram-positive bacteria, LPXTG-like proteins form a major class. These proteins have a highly conserved C-terminal cell wall sorting signal, which consists of an LPXTG sequence motif, a hydrophobic domain, and a positively charged tail. These surface proteins are targeted to the cell envelope by a sortase enzyme via transpeptidation. A variety of LPXTG-like proteins have been experimentally characterized; however, their number in public databases has increased owing to extensive bacterial genome sequencing without proper annotation. In the absence of experimental characterization, identifying and annotating these sequences is extremely challenging. Therefore, in this study, we developed the first machine learning-based predictor called GPApred, which can identify LPXTG-like proteins from their primary sequences. Using a newly constructed benchmark dataset, we explored different classifiers and five feature encodings and their hybrids. Optimal features were derived using the recursive feature elimination method, and these features were then trained using a support vector machine algorithm. The performance of different models was evaluated using independent datasets, and a final model (GPApred) was selected based on consistency during cross-validation and independent assessment. GPApred can be an effective tool for predicting LPXTG-like sequences and can be further employed for functional characterization or drug targeting. Availability: https://procarb.org/gpapred/.
dc.identifier.citation: International Journal of Biological Macromolecules Vol.229 (2023), 529-538
dc.identifier.doi: 10.1016/j.ijbiomac.2022.12.315
dc.identifier.eissn: 18790003
dc.identifier.issn: 01418130
dc.identifier.pmid: 36596370
dc.identifier.scopus: 2-s2.0-85145730211
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/20.500.14594/81652
dc.rights.holder: SCOPUS
dc.subject: Biochemistry, Genetics and Molecular Biology
dc.title: GPApred: The first computational predictor for identifying proteins with LPXTG-like motif using sequence-based optimal features
dc.type: Article
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85145730211&origin=inward
oaire.citation.endPage: 538
oaire.citation.startPage: 529
oaire.citation.title: International Journal of Biological Macromolecules
oaire.citation.volume: 229
oairecerif.author.affiliation: Sangmyung University
oairecerif.author.affiliation: Mahidol University
oairecerif.author.affiliation: Sungkyunkwan University
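
Illustrative note: the abstract describes the C-terminal cell wall sorting signal as an LPXTG motif followed by a hydrophobic domain and a positively charged tail. The sketch below shows one way such a signal could be screened for with plain rules. It is not the GPApred method, which the abstract describes as a machine learning model built from sequence feature encodings, recursive feature elimination, and a support vector machine. The function name `has_sorting_signal`, the residue set `HYDROPHOBIC`, and every threshold (`window`, `min_hydrophobic`, `tail_len`) are assumptions chosen for illustration only.

```python
import re

# LPXTG motif: L, P, any residue, T, G.
LPXTG = re.compile(r"LP.TG")

# Assumed set of hydrophobic residues (illustrative, not from the paper).
HYDROPHOBIC = set("AVLIMFWC")


def has_sorting_signal(seq, window=60, min_hydrophobic=0.6, tail_len=8):
    """Heuristic check for an LPXTG-like C-terminal sorting signal.

    Looks in the last `window` residues for an LPXTG motif, then requires
    a mostly hydrophobic stretch after it and at least one K or R in the
    final `tail_len` residues. All thresholds are hypothetical.
    """
    c_term = seq.upper()[-window:]
    matches = list(LPXTG.finditer(c_term))
    if not matches:
        return False

    # Everything after the last motif occurrence in the C-terminal window.
    downstream = c_term[matches[-1].end():]
    if len(downstream) <= tail_len:
        return False

    core, tail = downstream[:-tail_len], downstream[-tail_len:]
    hydrophobic_fraction = sum(r in HYDROPHOBIC for r in core) / len(core)
    has_positive_tail = any(r in "KR" for r in tail)
    return hydrophobic_fraction >= min_hydrophobic and has_positive_tail


# Synthetic sequences, made up for this example (not real proteins).
positive = "M" + "S" * 40 + "LPETG" + "AVLIAVLLGAVALLIA" + "GKRRKEEN"
no_motif = "M" + "S" * 60
hydrophilic = "M" + "S" * 40 + "LPETG" + "DDEEDDEEDDEEDDEE" + "GKRRKEEN"

print(has_sorting_signal(positive))
print(has_sorting_signal(no_motif))
print(has_sorting_signal(hydrophilic))
```

A fixed rule like this would miss the motif variants implied by "LPXTG-like", which is one reason a learned model over sequence-derived features, as in the abstract, may be preferable.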
