Teacher-to-Teacher: Harmonizing Dual Expertise into a Unified Speech Emotion Model
dc.contributor.author | Singkul S. | |
dc.contributor.author | Yuenyong S. | |
dc.contributor.author | Wongpatikaseree K. | |
dc.contributor.correspondence | Singkul S. | |
dc.contributor.other | Mahidol University | |
dc.date.accessioned | 2025-02-24T18:20:51Z | |
dc.date.available | 2025-02-24T18:20:51Z | |
dc.date.issued | 2024-01-01 | |
dc.description.abstract | This paper introduces the Teacher-to-Teacher (T2T) framework, a novel approach to speech emotion recognition (SER) specifically tailored for the Thai language. Leveraging the dual expertise of the Wav2Vec and Wav2Vec2 models, the T2T framework draws on knowledge from unsupervised and self-supervised learning to address the unique challenges posed by tonal languages. By integrating these two models into a unified SER framework, T2T enhances its capability to process and interpret nuanced emotional cues in speech, achieving superior performance compared to traditional SER methods. Evaluated across three major datasets - ThaiSER, EMOLA, and MU - the framework demonstrates significant improvements in unweighted accuracy and F1-score. Innovations such as emotional clustering representation and targeted emotional representation contribute to its high precision in detecting and differentiating subtle emotional states. Additionally, the integration of a fine-tuned teacher module aligns these advancements with practical SER applications, further increasing the framework's accuracy and sensitivity in real-world scenarios. The successful implementation of the T2T framework opens new avenues for enhancing SER technologies in other low-resource languages and extends its applicability to real-time processing applications, thereby advancing the field of computational emotion recognition. | |
dc.identifier.citation | Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics (2024) , 2882-2887 | |
dc.identifier.doi | 10.1109/SMC54092.2024.10830986 | |
dc.identifier.issn | 1062-922X | |
dc.identifier.scopus | 2-s2.0-85217844053 | |
dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/20.500.14594/105407 | |
dc.rights.holder | SCOPUS | |
dc.subject | Computer Science | |
dc.subject | Engineering | |
dc.title | Teacher-to-Teacher: Harmonizing Dual Expertise into a Unified Speech Emotion Model | |
dc.type | Conference Paper | |
mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85217844053&origin=inward | |
oaire.citation.endPage | 2887 | |
oaire.citation.startPage | 2882 | |
oaire.citation.title | Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics | |
oairecerif.author.affiliation | Mahidol University | |
oairecerif.author.affiliation | Ltd |