Temporal Fusion of Convolutional and LSTM Networks for Vision-Based Fall Detection Using Anatomical Keypoints

Tasnim H.; Joy A.D.; Dutta A.; Rabbi R.; Zereen A.N.

Issued Date

2025-01-01

Resource Type

Conference Paper

DOI

10.1109/ICCIT68739.2025.11490414

Scopus ID

2-s2.0-105041620016

Journal Title

2025 28th International Conference on Computer and Information Technology (ICCIT 2025)

Start Page

3246

End Page

3251

Rights Holder(s)

SCOPUS

Bibliographic Citation

2025 28th International Conference on Computer and Information Technology (ICCIT 2025) (2025), 3246-3251

Suggested Citation

Tasnim H., Joy A.D., Dutta A., Rabbi R., Zereen A.N. Temporal Fusion of Convolutional and LSTM Networks for Vision-Based Fall Detection Using Anatomical Keypoints. 2025 28th International Conference on Computer and Information Technology (ICCIT 2025) (2025), 3246-3251. doi:10.1109/ICCIT68739.2025.11490414 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/117419

Title

Temporal Fusion of Convolutional and LSTM Networks for Vision-Based Fall Detection Using Anatomical Keypoints

Author(s)

Tasnim H.
Joy A.D.
Dutta A.
Rabbi R.
Zereen A.N.

Author's Affiliation

Mahidol University
BRAC University

Corresponding Author(s)

Tasnim H.

Other Contributor(s)

Mahidol University

Abstract

Falls remain a leading cause of injury and serious health consequences, particularly among elderly individuals. Traditional fall detection systems often rely on wearable devices equipped with sensors, which can be inconvenient to wear. On the other hand, existing deep learning-based approaches mostly analyze image or video data directly and involve complex, resource-intensive architectures that are unsuitable for practical, resource-constrained settings. To resolve these issues, this study proposes a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) architecture that leverages both the spatial and temporal dependencies of YOLOv8-extracted anatomical keypoints from sequential video frames of the Le2i fall detection dataset. Moreover, motion-based feature engineering and hyperparameter tuning are applied, enabling the model to achieve an accuracy of 98.48% using only 52 features: 34 anatomical keypoint features and 18 motion features (velocity, rolling mean, and standard deviation). A comparative analysis with baseline LSTM, Recurrent Neural Network (RNN), and Gated Recurrent Unit (GRU) models is also conducted, demonstrating the superior performance of the proposed CNN-LSTM approach. Additionally, to enable practical use of the model in resource-limited settings, a web interface is developed for real-time monitoring, alerts, and space-specific filtering, allowing personal areas to be monitored separately while addressing privacy concerns.
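The abstract does not spell out how the 18 motion features are derived, so the following is only a minimal sketch of how a 52-value per-frame vector of this kind could be assembled. It assumes YOLOv8-pose's 17-keypoint COCO layout with (x, y) per keypoint (giving 34 keypoint values), plus a hypothetical selection of six vertical-position signals, each summarized by frame-to-frame velocity, a trailing rolling mean, and a rolling (population) standard deviation, for 6 × 3 = 18 motion values. The signal choice, window size, and the names `signals`, `motion_features`, and `sliding_windows` are illustrative assumptions, not the authors' pipeline.

```python
import math
from typing import List, Sequence

NUM_KEYPOINTS = 17                    # YOLOv8-pose uses the 17-keypoint COCO layout
KEYPOINT_VALUES = 2 * NUM_KEYPOINTS   # (x, y) per keypoint -> 34 values per frame

# Hypothetical signal selection (assumption): COCO indices whose y-coordinate is tracked.
# 0 = nose, 5/6 = left/right shoulder, 11/12 = left/right hip.
TRACKED_Y = [0, 5, 6, 11, 12]


def signals(frame: Sequence[float]) -> List[float]:
    """Six scalar signals per frame: five keypoint y-values plus the mean y of all keypoints."""
    ys = [frame[2 * k + 1] for k in range(NUM_KEYPOINTS)]  # y sits at odd indices
    return [ys[k] for k in TRACKED_Y] + [sum(ys) / NUM_KEYPOINTS]


def motion_features(frames: Sequence[Sequence[float]], window: int = 5) -> List[List[float]]:
    """Append velocity, rolling mean and rolling std of each signal: 34 + 6 * 3 = 52 values."""
    sig = [signals(f) for f in frames]
    rows = []
    for t, frame in enumerate(frames):
        start = max(0, t - window + 1)  # trailing window, shorter for the first frames
        extra: List[float] = []
        for j in range(len(sig[t])):
            hist = [sig[i][j] for i in range(start, t + 1)]
            vel = sig[t][j] - sig[t - 1][j] if t > 0 else 0.0  # frame-to-frame change
            mean = sum(hist) / len(hist)
            std = math.sqrt(sum((h - mean) ** 2 for h in hist) / len(hist))  # population std
            extra += [vel, mean, std]
        rows.append(list(frame) + extra)
    return rows


def sliding_windows(rows: List[List[float]], length: int, stride: int = 1) -> List[List[List[float]]]:
    """Group per-frame rows into fixed-length sequences for a sequence model."""
    return [rows[s:s + length] for s in range(0, len(rows) - length + 1, stride)]


# Usage on a synthetic sequence in which every coordinate grows by 1 per frame.
frames = [[float(t)] * KEYPOINT_VALUES for t in range(10)]
rows = motion_features(frames, window=5)
windows = sliding_windows(rows, length=8)
print(len(rows[0]), len(windows))  # 52 values per frame, 3 windows of 8 frames
```

In image coordinates y grows downward, so a sustained positive velocity on the nose, shoulder, and hip signals is the kind of cue such features are meant to expose. Each window (here 8 frames × 52 values) is the sort of input a CNN-LSTM would consume, with convolutions over the feature axis and the LSTM over time; the real window length and layer configuration are not given in this record.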

Keyword(s)

Computer Science

URI

https://repository.li.mahidol.ac.th/handle/123456789/117419

Collections

Scopus 2025
