Accelerating the identification of the allergenic potential of plant proteins using a stacked ensemble-learning framework

dc.contributor.author: Charoenkwan P.
dc.contributor.author: Chumnanpuen P.
dc.contributor.author: Schaduangrat N.
dc.contributor.author: Shoombuatong W.
dc.contributor.correspondence: Charoenkwan P.
dc.contributor.other: Mahidol University
dc.date.accessioned: 2024-03-13T18:09:01Z
dc.date.available: 2024-03-13T18:09:01Z
dc.date.issued: 2024-01-01
dc.description.abstract: Plant-allergenic proteins (PAPs) can induce allergic reactions in certain individuals. While these proteins are innocuous for the majority of people, they can elicit an immune response in those with particular sensitivities. Screening and prioritizing the allergenic potential of plant proteins is therefore indispensable for developing diagnostic tools, therapeutic interventions or medications to treat allergic reactions. However, investigating the allergenic potential of plant proteins with experimental methods is costly and labour-intensive. We therefore developed StackPAP, a three-layer stacking ensemble framework for accurate large-scale identification of PAPs. In the first layer of StackPAP, we conducted a comprehensive analysis of an extensive set of feature descriptors. We then selected and fused five sequence-based feature descriptors, namely amphiphilic pseudo-amino acid composition, dipeptide deviation from expected mean, amino acid composition, pseudo-amino acid composition and dipeptide composition. Additionally, we applied an efficient genetic algorithm (GA-SAR) to determine informative feature sets. In the second layer, 12 machine learning (ML) methods, in combination with all the informative feature sets, were employed to construct a pool of base classifiers. Finally, 13 base classifiers were selected using the GA-SAR method and combined to develop the final meta-classifier. Our experimental results showed that StackPAP achieved an accuracy, Matthews correlation coefficient and AUC of 0.984, 0.969 and 0.993, respectively, on the independent test dataset. Both cross-validation and independent test results indicated that StackPAP outperformed several ML-based classifiers. To accelerate the identification of the allergenicity of plant proteins, we developed a user-friendly web server for StackPAP (https://pmlabqsar.pythonanywhere.com/StackPAP). We anticipate that StackPAP will be an efficient and useful tool for rapidly screening PAPs from a vast number of plant proteins. Communicated by Ramaswamy H. Sarma.
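Two of the descriptors named in the abstract, amino acid composition and dipeptide composition, have simple standard definitions, and the sketch below illustrates how such descriptors can be computed and fused by concatenation. It is a minimal illustration only, not the StackPAP implementation: the function names, the toy sequence and the choice to skip dipeptides containing non-standard residues are assumptions made here, and the remaining three descriptors, GA-SAR feature selection and the stacked classifiers are not shown.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq: str) -> List[float]:
    """Fraction of each standard residue in the sequence (20 values)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """Fraction of each overlapping residue pair (400 values)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = {dp: 0 for dp in DIPEPTIDES}
    for pair in pairs:
        if pair in counts:  # assumption: ignore pairs with non-standard residues
            counts[pair] += 1
    return [counts[dp] / len(pairs) for dp in DIPEPTIDES]


def fused_features(seq: str) -> List[float]:
    """Concatenate the two descriptors into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


toy = "MKTAYIAKQR"  # hypothetical sequence, for illustration only
vec = fused_features(toy)
print(len(vec))  # 420
```

A vector of this kind would be computed per protein and passed to a classifier; how the descriptors are weighted, selected and combined in StackPAP itself is described in the paper, not here.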
dc.identifier.citation: Journal of Biomolecular Structure and Dynamics (2024)
dc.identifier.doi: 10.1080/07391102.2024.2318482
dc.identifier.eissn: 15380254
dc.identifier.issn: 07391102
dc.identifier.pmid: 38385478
dc.identifier.scopus: 2-s2.0-85186400918
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/20.500.14594/97534
dc.rights.holder: SCOPUS
dc.subject: Biochemistry, Genetics and Molecular Biology
dc.title: Accelerating the identification of the allergenic potential of plant proteins using a stacked ensemble-learning framework
dc.type: Article
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85186400918&origin=inward
oaire.citation.title: Journal of Biomolecular Structure and Dynamics
oairecerif.author.affiliation: Kasetsart University
oairecerif.author.affiliation: Mahidol University
oairecerif.author.affiliation: Chiang Mai University
