Real-Time Thai Speech Emotion Recognition With Speech Enhancement Using Time-Domain Contrastive Predictive Coding and Conv-Tasnet

Yuenyong S.; Hnoohom N.; Wongpatikaseree K.; Singkul S.

Issued Date

2022-01-01

Resource Type

Conference Paper

DOI

10.1109/ICBIR54589.2022.9786444

Scopus ID

2-s2.0-85133135176

Journal Title

ICBIR 2022 - 2022 7th International Conference on Business and Industrial Research, Proceedings

Start Page

78

End Page

83

Rights Holder(s)

SCOPUS

Bibliographic Citation

ICBIR 2022 - 2022 7th International Conference on Business and Industrial Research, Proceedings (2022), 78-83

Suggested Citation

Yuenyong S., Hnoohom N., Wongpatikaseree K., Singkul S. Real-Time Thai Speech Emotion Recognition With Speech Enhancement Using Time-Domain Contrastive Predictive Coding and Conv-Tasnet. ICBIR 2022 - 2022 7th International Conference on Business and Industrial Research, Proceedings (2022), 78-83. doi:10.1109/ICBIR54589.2022.9786444 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/84010

Title

Real-Time Thai Speech Emotion Recognition With Speech Enhancement Using Time-Domain Contrastive Predictive Coding and Conv-Tasnet

Author(s)

Yuenyong S.
Hnoohom N.
Wongpatikaseree K.
Singkul S.

Author's Affiliation

Mahidol University
Ltd.

Other Contributor(s)

Mahidol University

Abstract

Speech emotion recognition (SER) is an important part of human-computer interaction. SER faces many challenges, such as the acoustic environment of the speech and the amount of data available for training. For Thai in particular, there is an additional challenge because the language is tonal, and the available datasets are relatively small. In this work, we propose Thai Speech Emotion Recognition With Speech Enhancement (TH-SERSE). TH-SERSE consists of speech enhancement using Conv-TasNet, followed by pre-training using contrastive predictive coding. The pre-trained model is then fine-tuned for emotion classification. We experimented on two datasets, EMOLA and ThaiSER, which have open and closed acoustic environments, respectively. The experiments show that our method outperforms recently proposed methods.
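
The abstract names contrastive predictive coding as the pre-training step but does not state the objective. As a rough illustration only, the sketch below shows the InfoNCE loss that contrastive predictive coding is commonly trained with, for a single prediction step and with the other sequences in the batch serving as negatives. It is not the authors' implementation: the function name, the argument names (context, future, W), and the use of NumPy are all assumptions made for this example, and the time-domain variant used in the paper may differ.

```python
import numpy as np

def info_nce_loss(context, future, W):
    """InfoNCE loss for one prediction step (illustrative, hypothetical names).

    context: (N, C) context vectors, one per sequence in the batch
    future:  (N, Z) encoded future frames; row i belongs to sequence i
    W:       (C, Z) linear map that predicts a future frame from a context vector
    """
    pred = context @ W            # (N, Z) predicted future encodings
    scores = pred @ future.T      # (N, N); entry (i, j) scores prediction i against frame j
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise the softmax
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # The true future frame sits on the diagonal; the other rows act as negatives.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
N, C, Z = 8, 16, 16
future = rng.standard_normal((N, Z))
W = np.eye(C, Z)

# Contexts that point at their own future frame give a lower loss than
# contexts paired with the wrong sequence.
aligned = info_nce_loss(10.0 * future, future, W)
shuffled = info_nce_loss(10.0 * future[::-1], future, W)
chance = info_nce_loss(np.zeros((N, C)), future, W)  # uninformative context
```

With an uninformative (all-zero) context the loss equals log(N), the chance level for picking the true frame among N candidates; minimising the loss pushes each context to score its own future frame above the negatives.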

Keyword(s)

Business, Management and Accounting

URI

https://repository.li.mahidol.ac.th/handle/123456789/84010

Collections

Scopus 2022
