Temporal Fusion of Convolutional and LSTM Networks for Vision-Based Fall Detection Using Anatomical Keypoints
| dc.contributor.author | Tasnim H. | |
| dc.contributor.author | Joy A.D. | |
| dc.contributor.author | Dutta A. | |
| dc.contributor.author | Rabbi R. | |
| dc.contributor.author | Zereen A.N. | |
| dc.contributor.correspondence | Tasnim H. | |
| dc.contributor.other | Mahidol University | |
| dc.date.accessioned | 2026-06-20T18:14:14Z | |
| dc.date.available | 2026-06-20T18:14:14Z | |
| dc.date.issued | 2025-01-01 | |
| dc.description.abstract | Falls remain a leading cause of injury and serious health consequences, particularly among elderly individuals. Traditional fall detection systems often rely on wearable devices equipped with sensors, which can be inconvenient. On the other hand, existing deep learning-based approaches mostly analyze image or video data directly and involve complex, resource-intensive architectures that are unsuitable for practical, resource-constrained settings. To resolve these issues, this study proposes a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) architecture, leveraging both spatial and temporal dependencies of YOLOv8-extracted anatomical keypoints from sequential video frames of the Le2i fall detection dataset. Moreover, motion-based feature engineering and hyperparameter tuning are applied, enabling the model to achieve an accuracy of 98.48% using only 52 features, including 34 anatomical keypoints and 18 motion features (velocity, rolling mean, standard deviation). A comparative analysis with the baseline LSTM, Recurrent Neural Network (RNN), and Gated Recurrent Unit (GRU) models is also conducted, demonstrating the superior performance of the proposed CNN-LSTM approach. Additionally, to enable practical usage of the model in resource-limited settings, a web interface is developed for real-time monitoring, alerts, and space-specific filtering, allowing separate monitoring of personal areas while addressing privacy concerns. | |
| dc.identifier.citation | 2025 28th International Conference on Computer and Information Technology ICCIT 2025 (2025), 3246-3251 | |
| dc.identifier.doi | 10.1109/ICCIT68739.2025.11490414 | |
| dc.identifier.scopus | 2-s2.0-105041620016 | |
| dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/123456789/117419 | |
| dc.rights.holder | SCOPUS | |
| dc.subject | Computer Science | |
| dc.title | Temporal Fusion of Convolutional and LSTM Networks for Vision-Based Fall Detection Using Anatomical Keypoints | |
| dc.type | Conference Paper | |
| mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105041620016&origin=inward | |
| oaire.citation.endPage | 3251 | |
| oaire.citation.startPage | 3246 | |
| oaire.citation.title | 2025 28th International Conference on Computer and Information Technology ICCIT 2025 | |
| oairecerif.author.affiliation | Mahidol University | |
| oairecerif.author.affiliation | BRAC University |
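
The abstract's feature count (34 keypoint values plus 18 motion features, 52 in total) can be illustrated with a minimal sketch. It is not the authors' pipeline: the record does not say which signals the motion features are derived from or which window length is used, so `MOTION_COLS` (six coordinate columns, giving 6 × 3 = 18 motion features) and `WINDOW` are hypothetical choices made only so that the arithmetic works out. The 34 keypoint values are assumed here to be 17 pose keypoints with x and y coordinates each.

```python
import numpy as np

N_KEYPOINT_COORDS = 34                  # assumed: 17 keypoints x (x, y)
WINDOW = 5                              # hypothetical rolling-window length (frames)
MOTION_COLS = [0, 1, 22, 23, 24, 25]    # hypothetical: 6 coordinate columns to track


def rolling_stat(x, window, fn):
    """Apply fn over a trailing window per frame (window is shorter at the start)."""
    out = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        start = max(0, t - window + 1)
        out[t] = fn(x[start:t + 1], axis=0)
    return out


def build_features(keypoints):
    """Turn a (frames, 34) keypoint sequence into a (frames, 52) feature matrix."""
    keypoints = np.asarray(keypoints, dtype=float)
    assert keypoints.shape[1] == N_KEYPOINT_COORDS
    sel = keypoints[:, MOTION_COLS]
    # Frame-to-frame difference; the first frame gets zero velocity.
    velocity = np.diff(sel, axis=0, prepend=sel[:1])
    roll_mean = rolling_stat(sel, WINDOW, np.mean)
    roll_std = rolling_stat(sel, WINDOW, np.std)
    return np.hstack([keypoints, velocity, roll_mean, roll_std])


# Random stand-in for a 30-frame keypoint sequence.
features = build_features(np.random.default_rng(0).random((30, 34)))
print(features.shape)  # (30, 52)
```

A matrix of this shape, one row per frame, is the kind of sequential input a CNN-LSTM classifier would consume; the network itself is outside the scope of this sketch.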
