Temporal Fusion of Convolutional and LSTM Networks for Vision-Based Fall Detection Using Anatomical Keypoints

Tasnim H.; Joy A.D.; Dutta A.; Rabbi R.; Zereen A.N.

dc.contributor.author	Tasnim H.
dc.contributor.author	Joy A.D.
dc.contributor.author	Dutta A.
dc.contributor.author	Rabbi R.
dc.contributor.author	Zereen A.N.
dc.contributor.correspondence	Tasnim H.
dc.contributor.other	Mahidol University
dc.date.accessioned	2026-06-20T18:14:14Z
dc.date.available	2026-06-20T18:14:14Z
dc.date.issued	2025-01-01
dc.description.abstract	Falls remain a leading cause of injury and serious health consequences, particularly among elderly individuals. Traditional fall detection systems often rely on wearable devices equipped with sensors, which can be inconvenient. Existing deep learning-based approaches, on the other hand, mostly analyze image or video data directly and involve complex, resource-intensive architectures that are unsuitable for practical, resource-constrained settings. To resolve these issues, this study proposes a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) architecture that leverages both spatial and temporal dependencies of YOLOv8-extracted anatomical keypoints from sequential video frames of the Le2i fall detection dataset. Moreover, motion-based feature engineering and hyperparameter tuning are applied, enabling the model to achieve an accuracy of 98.48% using only 52 features, including 34 anatomical keypoints and 18 motion features (velocity, rolling mean, standard deviation). A comparative analysis with baseline LSTM, Recurrent Neural Network (RNN), and Gated Recurrent Unit (GRU) models is also conducted, demonstrating the superior performance of the proposed CNN-LSTM approach. Additionally, to enable practical use of the model in resource-limited settings, a web interface is developed for real-time monitoring, alerts, and space-specific filtering, allowing separate monitoring of personal areas while addressing privacy concerns.
dc.identifier.citation	2025 28th International Conference on Computer and Information Technology, ICCIT 2025 (2025), 3246-3251
dc.identifier.doi	10.1109/ICCIT68739.2025.11490414
dc.identifier.scopus	2-s2.0-105041620016
dc.identifier.uri	https://repository.li.mahidol.ac.th/handle/123456789/117419
dc.rights.holder	SCOPUS
dc.subject	Computer Science
dc.title	Temporal Fusion of Convolutional and LSTM Networks for Vision-Based Fall Detection Using Anatomical Keypoints
dc.type	Conference Paper
mu.datasource.scopus	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105041620016&origin=inward
oaire.citation.endPage	3251
oaire.citation.startPage	3246
oaire.citation.title	2025 28th International Conference on Computer and Information Technology, ICCIT 2025
oairecerif.author.affiliation	Mahidol University
oairecerif.author.affiliation	BRAC University
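
The abstract describes a 52-value feature vector per frame: 34 keypoint values plus 18 motion features built from velocity, rolling mean, and standard deviation. The record does not say which signals those 18 features are derived from, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the 34 values are the flattened (x, y) coordinates of 17 pose keypoints, and it picks six coordinate indices (`TRACKED`) and a window length (`WINDOW`) purely for illustration; both names and their values are hypothetical.

```python
from statistics import fmean, pstdev

NUM_COORDS = 34                    # assumed: 17 keypoints x (x, y)
TRACKED = [0, 1, 10, 11, 22, 23]   # hypothetical: 6 coordinate indices to track
WINDOW = 5                         # hypothetical rolling-window length, in frames


def motion_features(frames, t, tracked=TRACKED, window=WINDOW):
    """Velocity, rolling mean and rolling std of each tracked coordinate at frame t."""
    start = max(0, t - window + 1)
    feats = []
    for idx in tracked:
        history = [frame[idx] for frame in frames[start:t + 1]]
        # Velocity as a first difference between consecutive frames (0 at t = 0).
        velocity = frames[t][idx] - frames[t - 1][idx] if t > 0 else 0.0
        feats.extend([velocity, fmean(history), pstdev(history)])
    return feats


def build_sequence(frames):
    """Append 6 x 3 = 18 motion features to each frame's 34 keypoint values."""
    return [list(frames[t]) + motion_features(frames, t) for t in range(len(frames))]
```

Each row returned by `build_sequence` has 34 + 18 = 52 values, matching the feature count stated in the abstract. A sequence of such rows is the kind of input a CNN-LSTM would consume; the sequence length and model layers are not given in this record.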

Collections

Scopus 2025
