Medical application of deep-learning-based head pose estimation from RGB image sequence
Issued Date
2025-09-01
ISSN
00104825
eISSN
18790534
Scopus ID
2-s2.0-105008441429
Journal Title
Computers in Biology and Medicine
Volume
195
Rights Holder(s)
SCOPUS
Bibliographic Citation
Computers in Biology and Medicine Vol.195 (2025)
Suggested Citation
Chotikkakamthorn K., Lie W.N., Ritthipravat P., Kusakunniran W., Tuakta P., Benjapornlert P. Medical application of deep-learning-based head pose estimation from RGB image sequence. Computers in Biology and Medicine Vol.195 (2025). doi:10.1016/j.compbiomed.2025.110620 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/110931
Abstract
Telemedicine has recently enabled doctor-to-patient and doctor-to-doctor consultations that address long-standing obstacles to in-person care: the COVID-19 pandemic, remote locations, the long time required per visit, and dependence on family members for transportation. Nevertheless, few studies have applied telemedicine to measuring head movement, which is essential for activities of daily living and is degraded by aging, trauma, pain, and degenerative disease. In recent years, artificial intelligence, including vision-based methods, has been used to measure cervical range of motion (CROM). However, these methods suffer from significant measurement errors and require depth cameras. In contrast, recent deep-learning-based head pose estimation (HPE) networks have achieved higher accuracy than previous methods, making them attractive for CROM measurement in telemedicine. This study proposes the application of a deep neural network that derives the Euler angles as head pose parameters. The network adopts multi-level pyramidal feature extraction, a bi-directional Pyramidal Feature Aggregation Structure (PFAS) for feature fusion, a modified Atrous Spatial Pyramid Pooling (ASPP) module for spatial and channel feature enhancement, and a multi-bin classification and regression module. We evaluated the proposed technique on public datasets (300W_LP, AFLW2000, and BIWI), achieving performance comparable to previous algorithms, with mean MAE (mean absolute error) values of 3.36°, 3.50°, and 2.16° under several evaluation protocols. For CROM measurement in telemedicine, our method achieved the lowest mean MAE, 3.73°, on a private medical dataset. It also achieved a fast inference speed of 2.27 ms per image. Thus, for both traditional HPE problems and CROM measurement applications, our method offers accuracy, convenience, low computational requirements, and low operational costs (GitHub: https://github.com/nickuntitled/pyramid_based_HPE).
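The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper or its repository. The function names, the bin layout, and the softmax-expectation decoding are assumptions: decoding a binned angle as a probability-weighted mean of bin centers is a common choice in multi-bin HPE networks, but the abstract does not confirm that the authors use it. The sketch shows how a binned prediction might be turned into an angle, how a mean MAE over yaw, pitch, and roll is computed, and how a range of motion could be read from a per-frame angle sequence.

```python
import math


def expected_angle(logits, bin_centers):
    """Decode one Euler angle (degrees) from per-bin logits.

    Assumption: softmax-weighted expectation over hypothetical bin centers.
    """
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return sum(e / total * c for e, c in zip(exps, bin_centers))


def mean_mae(predicted, truth):
    """Per-axis MAE and its mean over (yaw, pitch, roll) tuples in degrees."""
    n = len(predicted)
    per_axis = [
        sum(abs(p[i] - t[i]) for p, t in zip(predicted, truth)) / n
        for i in range(3)
    ]
    return per_axis, sum(per_axis) / 3


def range_of_motion(angles):
    """Range of one angle (degrees) across the frames of a sequence."""
    return max(angles) - min(angles)
```

For example, `range_of_motion` applied to the per-frame yaw estimates of a left-to-right head rotation would give an estimate of the axial rotation range, under the assumption that CROM is read directly from the extremes of the estimated angle.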
