Teacher-to-Teacher: Harmonizing Dual Expertise into a Unified Speech Emotion Model

Singkul S.; Yuenyong S.; Wongpatikaseree K.

dc.contributor.author	Singkul S.
dc.contributor.author	Yuenyong S.
dc.contributor.author	Wongpatikaseree K.
dc.contributor.correspondence	Singkul S.
dc.contributor.other	Mahidol University
dc.date.accessioned	2025-02-24T18:20:51Z
dc.date.available	2025-02-24T18:20:51Z
dc.date.issued	2024-01-01
dc.description.abstract	This paper introduces the Teacher-to-Teacher (T2T) framework, a novel approach in speech emotion recognition (SER) specifically tailored for the Thai language. Leveraging the dual expertise of the Wav2Vec and Wav2Vec2 models, the T2T framework draws on knowledge from unsupervised and self-supervised learning to effectively address the unique challenges posed by tonal languages. By integrating these two powerful models into a unified SER framework, T2T enhances its capability to process and interpret nuanced emotional cues in speech, achieving superior performance compared to traditional SER methods. Evaluated across three major datasets - ThaiSER, EMOLA, and MU - the framework demonstrates significant improvements in unweighted accuracy and F1-score. Innovations such as emotional clustering representation and targeted emotional representation contribute to its high precision in detecting and differentiating subtle emotional states. Additionally, the integration of a fine-tuned teacher module aligns these advancements with practical SER applications, further increasing the framework's accuracy and sensitivity in real-world scenarios. The successful implementation of the T2T framework opens new avenues for enhancing SER technologies in other low-resource languages and extends its applicability to real-time processing applications, thereby advancing the field of computational emotion recognition.
dc.identifier.citation	Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics (2024) , 2882-2887
dc.identifier.doi	10.1109/SMC54092.2024.10830986
dc.identifier.issn	1062922X
dc.identifier.scopus	2-s2.0-85217844053
dc.identifier.uri	https://repository.li.mahidol.ac.th/handle/123456789/105407
dc.rights.holder	SCOPUS
dc.subject	Computer Science
dc.subject	Engineering
dc.title	Teacher-to-Teacher: Harmonizing Dual Expertise into a Unified Speech Emotion Model
dc.type	Conference Paper
mu.datasource.scopus	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85217844053&origin=inward
oaire.citation.endPage	2887
oaire.citation.startPage	2882
oaire.citation.title	Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
oairecerif.author.affiliation	Mahidol University
oairecerif.author.affiliation	Ltd

Collections

Scopus 2024

