Medical application of deep-learning-based head pose estimation from RGB image sequence

dc.contributor.author: Chotikkakamthorn K.
dc.contributor.author: Lie W.N.
dc.contributor.author: Ritthipravat P.
dc.contributor.author: Kusakunniran W.
dc.contributor.author: Tuakta P.
dc.contributor.author: Benjapornlert P.
dc.contributor.correspondence: Chotikkakamthorn K.
dc.contributor.other: Mahidol University
dc.date.accessioned: 2025-06-28T18:25:48Z
dc.date.available: 2025-06-28T18:25:48Z
dc.date.issued: 2025-09-01
dc.description.abstract: Telemedicine now enables doctor-to-patient and doctor-to-doctor consultations and addresses several problems with in-person care: the COVID-19 pandemic, remote areas, long visit times, and dependence on family members for transportation. Nevertheless, few studies have applied telemedicine to measuring head movement, which is essential for activities of daily living and is degraded by aging, trauma, pain, and degenerative disease. In recent years, artificial intelligence, including vision-based methods, has been used to measure cervical range of motion (CROM). However, these methods suffer from significant measurement errors and require depth cameras. In contrast, recent deep-learning-based head pose estimation (HPE) networks are more accurate than earlier methods, which makes them attractive for CROM measurement in telemedicine. This study proposes applying a deep neural network that combines multi-level pyramidal feature extraction, a bi-directional Pyramidal Feature Aggregation Structure (PFAS) for feature fusion, a modified Atrous Spatial Pyramid Pooling (ASPP) module for spatial and channel feature enhancement, and a multi-bin classification and regression module to derive the Euler angles as head pose parameters. We evaluated the proposed technique on public datasets (300W_LP, AFLW2000, and BIWI), achieving performance comparable to previous algorithms, with mean MAE (mean absolute error) values of 3.36°, 3.50°, and 2.16° under several evaluation protocols. For CROM measurement in telemedicine, our method achieved the lowest mean MAE, 3.73°, on a private medical dataset. It also achieved a fast inference speed of 2.27 ms per image. Thus, for both traditional HPE problems and CROM measurement applications, our method offers accuracy, convenience, low computational requirements, and low operational costs (GitHub: https://github.com/nickuntitled/pyramid_based_HPE).
dc.identifier.citation: Computers in Biology and Medicine Vol.195 (2025)
dc.identifier.doi: 10.1016/j.compbiomed.2025.110620
dc.identifier.eissn: 18790534
dc.identifier.issn: 00104825
dc.identifier.scopus: 2-s2.0-105008441429
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/123456789/110931
dc.rights.holder: SCOPUS
dc.subject: Computer Science
dc.subject: Medicine
dc.title: Medical application of deep-learning-based head pose estimation from RGB image sequence
dc.type: Article
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105008441429&origin=inward
oaire.citation.title: Computers in Biology and Medicine
oaire.citation.volume: 195
oairecerif.author.affiliation: Mahidol University
oairecerif.author.affiliation: National Chung Cheng University
oairecerif.author.affiliation: Ramathibodi Hospital
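
Illustrative note: the abstract mentions a multi-bin classification and regression module that yields Euler angles, and it reports results as mean MAE. The record does not describe how either is computed, so the following is only a minimal sketch of one common way such a head can be decoded (a softmax expectation over angle bins) and of how a mean MAE over yaw, pitch, and roll can be calculated. The bin width of 3°, the range starting at -99°, and the function names `decode_angle` and `mean_mae` are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_angle(logits, bin_width=3.0, min_angle=-99.0):
    """Turn per-bin logits into one angle in degrees.

    Assumed layout (hypothetical): bin i covers
    [min_angle + i * bin_width, min_angle + (i + 1) * bin_width).
    The angle is the probability-weighted mean of the bin centers.
    """
    probs = softmax(logits)
    centers = [min_angle + bin_width * (i + 0.5) for i in range(len(logits))]
    return sum(p * c for p, c in zip(probs, centers))


def mean_mae(pred, truth):
    """Per-axis MAE and their mean for lists of (yaw, pitch, roll) in degrees."""
    n = len(pred)
    per_axis = [
        sum(abs(p[k] - t[k]) for p, t in zip(pred, truth)) / n
        for k in range(3)
    ]
    return per_axis, sum(per_axis) / 3.0


if __name__ == "__main__":
    # A sharply peaked distribution decodes to (almost exactly) one bin center.
    logits = [0.0] * 66
    logits[40] = 50.0
    print(decode_angle(logits))

    # Two frames of predicted vs. reference (yaw, pitch, roll) angles.
    pred = [(10.0, 0.0, -5.0), (20.0, 5.0, 0.0)]
    truth = [(12.0, 1.0, -5.0), (17.0, 5.0, 2.0)]
    print(mean_mae(pred, truth))
```

In a CROM setting, the range of motion for one axis could then be read off as the difference between the largest and smallest decoded angle across an image sequence, though whether the paper does this is not stated in the record.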
