Optimizing Stroke Recognition with MediaPipe and Machine Learning: An Explainable AI Approach for Facial Landmark Analysis

Ul Karim R.; Mahdi S.; Samin A.; Zereen A.N.; Abdullah-Al-Wadud M.; Uddin J.

Issued Date

2025-01-01

Resource Type

Article

eISSN

21693536

DOI

10.1109/ACCESS.2025.3550577

Scopus ID

2-s2.0-86000506261

Journal Title

IEEE Access

Rights Holder(s)

SCOPUS

Bibliographic Citation

IEEE Access (2025)

Suggested Citation

Ul Karim R., Mahdi S., Samin A., Zereen A.N., Abdullah-Al-Wadud M., Uddin J. Optimizing Stroke Recognition with MediaPipe and Machine Learning: An Explainable AI Approach for Facial Landmark Analysis. IEEE Access (2025). doi:10.1109/ACCESS.2025.3550577 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/106800

Title

Optimizing Stroke Recognition with MediaPipe and Machine Learning: An Explainable AI Approach for Facial Landmark Analysis

Author(s)

Ul Karim R.
Mahdi S.
Samin A.
Zereen A.N.
Abdullah-Al-Wadud M.
Uddin J.

Author's Affiliation

Woosong University
King Saud University
Mahidol University
BRAC University

Corresponding Author(s)

Ul Karim R.

Other Contributor(s)

Mahidol University

Abstract

Early detection of stroke is critical for improving survival rates and recovery. This study presents an innovative approach to stroke diagnosis based on facial landmark analysis, combining MediaPipe's facial landmark detection with three machine learning models: Random Forest (RF), Extreme Gradient Boosting (XGB), and Categorical Boosting (CB). A comprehensive dataset of stroke and non-stroke facial images is curated, with MediaPipe extracting 228 facial landmarks from key regions affected by stroke-related asymmetry. Explainable AI (XAI) techniques are applied to enhance model interpretability, allowing for a deeper understanding of the facial regions that contribute most to stroke prediction. The machine learning models are individually optimized through hyperparameter tuning, and a Multimodal Voting Classifier (MVC) is then proposed. By aggregating the individual predictions through majority voting or weighted averaging, the system reduces the likelihood of incorrect classifications that might arise from a single model's limitations. This results in a more stable and generalized model whose 94.75% accuracy exceeds that of any individual model. Feature importance analysis, facilitated by XAI, identified key facial regions, such as the eyes, cheeks, and lips, as critical indicators of stroke, significantly improving the model's diagnostic precision. Additionally, t-SNE visualization is employed to provide insight into how the stroke and non-stroke cases separate, reinforcing the model's robustness. Real-time metrics show the model's efficiency and adaptability across diverse hardware, enabling practical clinical integration. The proposed system offers a cost-effective, non-invasive diagnostic tool that can be implemented in real time, potentially improving accessibility to stroke diagnosis in remote and resource-limited areas.
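
The two aggregation schemes named in the abstract, majority voting and weighted averaging, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the example labels and probabilities, the model weights, and the 0.5 decision threshold are all hypothetical and chosen only to show how the two schemes can reach different decisions from the same three base models.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Return the most frequent class label (hard voting).

    With equal counts, the label encountered first wins.
    """
    if not labels:
        raise ValueError("at least one label is required")
    return Counter(labels).most_common(1)[0][0]


def weighted_average_vote(
    stroke_probs: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> int:
    """Return 1 (stroke) if the weighted mean probability reaches the threshold."""
    if len(stroke_probs) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(stroke_probs, weights)) / total
    return int(score >= threshold)


# Hypothetical outputs of three base models (standing in for RF, XGB, CB)
# for a single face image: 1 = stroke, 0 = non-stroke.
hard_labels = [1, 0, 1]
soft_probs = [0.62, 0.41, 0.58]
model_weights = [1.0, 3.0, 1.0]  # illustrative weights only

hard_result = majority_vote(hard_labels)
soft_result = weighted_average_vote(soft_probs, model_weights)
```

In this made-up case the two schemes disagree: two of the three models vote for stroke, but the heavily weighted second model pulls the weighted mean probability below the threshold. How the MVC in the paper chooses between or weights the schemes is not stated in the abstract.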

Keyword(s)

Materials Science
Computer Science
Engineering

URI

https://repository.li.mahidol.ac.th/handle/123456789/106800

Collections

Scopus 2025
