Real-Time Thai Speech Emotion Recognition With Speech Enhancement Using Time-Domain Contrastive Predictive Coding and Conv-Tasnet
Issued Date
2022-01-01
Scopus ID
2-s2.0-85133135176
Journal Title
ICBIR 2022 - 2022 7th International Conference on Business and Industrial Research, Proceedings
Start Page
78
End Page
83
Rights Holder(s)
SCOPUS
Bibliographic Citation
ICBIR 2022 - 2022 7th International Conference on Business and Industrial Research, Proceedings (2022), 78-83
Suggested Citation
Yuenyong S., Hnoohom N., Wongpatikaseree K., Singkul S. Real-Time Thai Speech Emotion Recognition With Speech Enhancement Using Time-Domain Contrastive Predictive Coding and Conv-Tasnet. ICBIR 2022 - 2022 7th International Conference on Business and Industrial Research, Proceedings (2022), 78-83. doi:10.1109/ICBIR54589.2022.9786444 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/84010
Abstract
Speech emotion recognition (SER) is an important part of human-computer interaction. SER faces many challenges, such as the acoustic environment of the speech and the amount of data available for training. Thai in particular poses additional challenges: it is a tonal language, and the available datasets are relatively small. In this work, we propose Thai Speech Emotion Recognition With Speech Enhancement (TH-SERSE). TH-SERSE consists of speech enhancement using Conv-TasNet, followed by pre-training using contrastive predictive coding. The pre-trained model was then fine-tuned for emotion classification. We experimented on two datasets, EMOLA and ThaiSER, which have open and closed acoustic environments, respectively. The experiments show that our method outperforms recently proposed methods.