Real-Time Thai Speech Emotion Recognition With Speech Enhancement Using Time-Domain Contrastive Predictive Coding and Conv-Tasnet

Yuenyong S., Hnoohom N., Wongpatikaseree K., Singkul S.
Mahidol University
Accessioned and available: 2023-06-18
Issued: 2022-01-01
ICBIR 2022 - 2022 7th International Conference on Business and Industrial Research, Proceedings (2022), 78-83
https://repository.li.mahidol.ac.th/handle/123456789/84010

Speech emotion recognition (SER) is an important part of human-computer interaction. SER faces many challenges, such as the acoustic environment of the speech and the amount of data available for training. Thai poses additional challenges: it is a tonal language, and the available datasets are relatively small. In this work we propose Thai Speech Emotion Recognition With Speech Enhancement (TH-SERSE). TH-SERSE consists of speech enhancement with Conv-TasNet followed by pre-training with contrastive predictive coding (CPC). The pre-trained model is then fine-tuned for emotion classification. We experimented on two datasets, EMOLA and ThaiSER, which have open and closed acoustic environments, respectively. The experiments show that our method outperforms recently proposed methods.

Subject: Business, Management and Accounting
Type: Conference Paper
Source: SCOPUS
DOI: 10.1109/ICBIR54589.2022.9786444
Scopus EID: 2-s2.0-85133135176
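To make the pre-training step more concrete, the following is a minimal sketch of the InfoNCE objective used in the generic formulation of contrastive predictive coding. A context vector c_t summarizes past latent frames. A step-specific linear map W_k turns c_t into a prediction of the future latent z_{t+k}. The loss rewards scoring the true future latent above a set of negative samples. The abstract does not state TH-SERSE's loss details, prediction horizon, or negative-sampling scheme, so this is an assumption-based illustration rather than the authors' implementation. The function names (`info_nce_loss`, `dot`, `matvec`) and the toy vectors are hypothetical. A practical version would use batched tensors in a deep-learning framework.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Multiply matrix w (list of rows) by vector v."""
    return [dot(row, v) for row in w]


def info_nce_loss(
    context: Sequence[float],
    w_k: Sequence[Sequence[float]],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
) -> float:
    """Generic CPC InfoNCE loss for one context and one prediction step k.

    Hypothetical helper for illustration only:
      context   -- c_t, summary of past latent frames
      w_k       -- linear map for prediction step k
      positive  -- z_{t+k}, the true future latent
      negatives -- latents drawn from elsewhere (e.g. other time steps)
    """
    prediction = matvec(w_k, context)  # W_k c_t
    # Score each candidate latent by its inner product with the prediction.
    scores = [dot(positive, prediction)] + [dot(z, prediction) for z in negatives]
    # log-sum-exp with the max subtracted for numerical stability.
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    # Equals -log softmax(scores)[0], i.e. cross-entropy with the positive as target.
    return log_denominator - scores[0]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    loss = info_nce_loss([1.0, 0.0], identity, [2.0, 0.0], [[0.0, 2.0], [-2.0, 0.0]])
    print(f"InfoNCE loss: {loss:.4f}")
```

Because the loss is a cross-entropy over one positive and N negatives, it is never negative. It equals log(N + 1) when the model cannot tell the candidates apart, and it approaches zero as the positive's score dominates.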