Optimizing Stroke Recognition with MediaPipe and Machine Learning: An Explainable AI Approach for Facial Landmark Analysis
dc.contributor.author | Ul Karim R. | |
dc.contributor.author | Mahdi S. | |
dc.contributor.author | Samin A. | |
dc.contributor.author | Zereen A.N. | |
dc.contributor.author | Abdullah-Al-Wadud M. | |
dc.contributor.author | Uddin J. | |
dc.contributor.correspondence | Ul Karim R. | |
dc.contributor.other | Mahidol University | |
dc.date.accessioned | 2025-03-24T18:21:57Z | |
dc.date.available | 2025-03-24T18:21:57Z | |
dc.date.issued | 2025-01-01 | |
dc.description.abstract | Early detection of stroke is critical for improving survival rates and recovery. This study presents an innovative approach to stroke diagnosis through the analysis of facial landmarks, combining MediaPipe's facial landmark detection with advanced machine learning models - Random Forest (RF), Extreme Gradient Boosting (XGB), and Categorical Boosting (CB). A comprehensive dataset of stroke and non-stroke facial images is curated, with MediaPipe extracting 228 facial landmarks from key regions affected by stroke-related asymmetry. Explainable AI (XAI) techniques are applied to enhance model interpretability, allowing for a deeper understanding of the facial regions that contribute most to stroke prediction. The machine learning models are individually optimized through hyperparameter tuning, and we further propose a Multimodal Voting Classifier (MVC) that combines them. By aggregating the predictions through majority voting or weighted averaging, the system reduces the likelihood of incorrect classifications that might arise from a single model's limitations. This results in a more stable and generalized model whose 94.75% accuracy exceeds that of any individual model. Feature importance analysis, facilitated by XAI, identified key facial regions - such as the eyes, cheeks, and lips - as critical indicators of stroke, significantly improving the model's diagnostic precision. Additionally, t-SNE visualization is employed to provide insight into the classification of stroke and non-stroke cases, reinforcing the model's robustness. Real-time metrics show the model's efficiency and adaptability across diverse hardware, enabling practical clinical integration. The proposed system offers a cost-effective, non-invasive diagnostic tool that can be implemented in real time, potentially improving accessibility to stroke diagnosis in remote and resource-limited areas. | |
dc.identifier.citation | IEEE Access (2025) | |
dc.identifier.doi | 10.1109/ACCESS.2025.3550577 | |
dc.identifier.eissn | 21693536 | |
dc.identifier.scopus | 2-s2.0-86000506261 | |
dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/20.500.14594/106800 | |
dc.rights.holder | SCOPUS | |
dc.subject | Materials Science | |
dc.subject | Computer Science | |
dc.subject | Engineering | |
dc.title | Optimizing Stroke Recognition with MediaPipe and Machine Learning: An Explainable AI Approach for Facial Landmark Analysis | |
dc.type | Article | |
mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=86000506261&origin=inward | |
oaire.citation.title | IEEE Access | |
oairecerif.author.affiliation | Woosong University | |
oairecerif.author.affiliation | King Saud University | |
oairecerif.author.affiliation | Mahidol University | |
oairecerif.author.affiliation | BRAC University |
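
The abstract describes aggregating the RF, XGB, and CB predictions by majority voting or weighted averaging. The following is a minimal sketch of those two aggregation rules in plain Python, not the authors' implementation. The function names, per-model outputs, weights, and the 0.5 decision threshold are all hypothetical values chosen for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Hard voting: return the label predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]


def weighted_average_vote(probabilities, weights, threshold=0.5):
    """Soft voting: average the stroke probabilities using per-model weights.

    Returns (label, score), where label is 1 if the weighted mean
    probability reaches the threshold and 0 otherwise.
    """
    total_weight = sum(weights)
    score = sum(p * w for p, w in zip(probabilities, weights)) / total_weight
    return (1 if score >= threshold else 0), score


# Hypothetical outputs for one face image (1 = stroke, 0 = non-stroke).
models = ["RF", "XGB", "CB"]
hard_labels = [1, 0, 1]           # assumed class predictions
stroke_probs = [0.62, 0.41, 0.77]  # assumed predicted stroke probabilities
model_weights = [1.0, 1.0, 2.0]    # assumed weights, e.g. from validation scores

hard_result = majority_vote(hard_labels)
soft_result, soft_score = weighted_average_vote(stroke_probs, model_weights)

print("majority vote:", hard_result)
print("weighted average:", soft_result, round(soft_score, 4))
```

In this illustration a single dissenting model (XGB) is outvoted under both rules, which is the sense in which the ensemble can reduce errors that stem from one model's limitations.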