Bilingual Audio Depression Identification Model by Machine Learning
Issued Date
2025-01-01
Resource Type
Scopus ID
2-s2.0-105016359557
Journal Title
2025 International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC 2025)
Rights Holder(s)
SCOPUS
Bibliographic Citation
2025 International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC 2025) (2025)
Suggested Citation
Poomrittigul S., Kiatrungrit K., Homsiang P., Treebupachatsakul T. Bilingual Audio Depression Identification Model by Machine Learning. 2025 International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC 2025) (2025). doi:10.1109/ITC-CSCC66376.2025.11137688. Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/112290
Title
Bilingual Audio Depression Identification Model by Machine Learning
Author's Affiliation
Corresponding Author(s)
Other Contributor(s)
Abstract
The number of people with depression worldwide, and in Thailand in particular, continues to rise. Depression screening commonly relies on self-report questionnaires; however, these instruments provide only subjective assessments. Recent advances in machine learning offer potential improvements in diagnostic accuracy through more objective measures. This study evaluates the effectiveness of machine learning models in classifying depression using a bilingual audio dataset comprising Thai and English speech. Such models have the potential to assist clinicians by providing objective preliminary screening for depression based on vocal analysis, enhancing diagnostic precision and clinical decision-making. Several machine learning models were implemented, including KNN, MLP, Random Forest, Decision Tree, SGD, Logistic Regression, SVM, AdaBoost, and Gaussian Naïve Bayes, using MFCC features extracted from the audio datasets. The results indicate that machine learning models can classify and identify depression effectively even on bilingual audio datasets, compared with single-language models, with the highest accuracy of 0.95 achieved by MLP and KNN when the trained model was tested on single-language Thai audio.
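
For illustration, the pipeline described in the abstract, MFCC feature extraction from audio followed by training conventional classifiers such as KNN and MLP, might look roughly like the sketch below. It assumes librosa for MFCC extraction and scikit-learn for the classifiers; the file names, labels, feature averaging, and hyperparameters are illustrative assumptions, not details taken from the paper.

# Minimal sketch of an MFCC + classical-classifier pipeline (assumptions noted above).
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

def extract_mfcc(path, n_mfcc=13):
    """Load an audio file and return a fixed-length feature vector:
    MFCCs averaged over time frames (the averaging step is an assumption)."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Hypothetical file list and labels (0 = non-depressed, 1 = depressed);
# a real dataset would contain many Thai and English recordings.
audio_paths = ["thai_001.wav", "thai_002.wav", "eng_001.wav", "eng_002.wav"]
labels = [0, 1, 0, 1]

X = np.vstack([extract_mfcc(p) for p in audio_paths])
y = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train and evaluate two of the classifiers mentioned in the abstract.
for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=3)),
                  ("MLP", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500))]:
    clf.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, clf.predict(X_test)))

In practice, the same trained model could then be evaluated separately on Thai-only and English-only test sets to compare bilingual against single-language performance, as the study does.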
