Singkul S.; Yuenyong S.; Wongpatikaseree K.
Mahidol University
2025-02-24
2025-02-24
2024-01-01
Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics (2024), 2882-2887
1062922X
https://repository.li.mahidol.ac.th/handle/20.500.14594/105407
This paper introduces the Teacher-to-Teacher (T2T) framework, a novel approach to speech emotion recognition (SER) tailored specifically to the Thai language. Leveraging the dual expertise of the Wav2Vec and Wav2Vec2 models, the T2T framework draws on knowledge from unsupervised and self-supervised learning to address the unique challenges posed by tonal languages. By integrating these two models into a unified SER framework, T2T improves its ability to process and interpret nuanced emotional cues in speech, achieving superior performance compared to traditional SER methods. Evaluated across three major datasets (ThaiSER, EMOLA, and MU), the framework demonstrates significant improvements in unweighted accuracy and F1-score. Innovations such as emotional clustering representation and targeted emotional representation contribute to its high precision in detecting and differentiating subtle emotional states. Additionally, the integration of a fine-tuned teacher module aligns these advancements with practical SER applications, further increasing the framework's accuracy and sensitivity in real-world scenarios. The successful implementation of the T2T framework opens new avenues for enhancing SER technologies in other low-resource languages and extends its applicability to real-time processing applications, thereby advancing the field of computational emotion recognition.
Computer Science
Engineering
Teacher-to-Teacher: Harmonizing Dual Expertise into a Unified Speech Emotion Model
Conference Paper
SCOPUS
10.1109/SMC54092.2024.10830986
2-s2.0-85217844053