Teacher-to-Teacher: Harmonizing Dual Expertise into a Unified Speech Emotion Model
Issued Date
2024-01-01
ISSN
1062-922X
Scopus ID
2-s2.0-85217844053
Journal Title
Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
Start Page
2882
End Page
2887
Rights Holder(s)
SCOPUS
Bibliographic Citation
Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics (2024), 2882-2887
Suggested Citation
Singkul S., Yuenyong S., Wongpatikaseree K. Teacher-to-Teacher: Harmonizing Dual Expertise into a Unified Speech Emotion Model. Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics (2024), 2882-2887. doi:10.1109/SMC54092.2024.10830986 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/105407
Author(s)
Singkul S., Yuenyong S., Wongpatikaseree K.
Abstract
This paper introduces the Teacher-to-Teacher (T2T) framework, a novel approach to speech emotion recognition (SER) specifically tailored for the Thai language. Leveraging the dual expertise of the Wav2Vec and Wav2Vec2 models, the T2T framework draws on knowledge from unsupervised and self-supervised learning to address the unique challenges posed by tonal languages. By integrating these two models into a unified SER framework, T2T enhances its capability to process and interpret nuanced emotional cues in speech, achieving superior performance compared to traditional SER methods. Evaluated across three major datasets (ThaiSER, EMOLA, and MU), the framework demonstrates significant improvements in unweighted accuracy and F1-score. Innovations such as emotional clustering representation and targeted emotional representation contribute to its high precision in detecting and differentiating subtle emotional states. Additionally, the integration of a fine-tuned teacher module aligns these advancements with practical SER applications, further increasing the framework's accuracy and sensitivity in real-world scenarios. The successful implementation of the T2T framework opens new avenues for enhancing SER technologies in other low-resource languages and extends its applicability to real-time processing applications, thereby advancing the field of computational emotion recognition.