Publication:
Robust visual voice activity detection using Long Short-Term Memory recurrent neural network

dc.contributor.author | Zaw Htet Aung | en_US
dc.contributor.author | Panrasee Ritthipravat | en_US
dc.contributor.other | Mahidol University | en_US
dc.date.accessioned | 2018-12-11T02:41:20Z
dc.date.accessioned | 2019-03-14T08:04:32Z
dc.date.available | 2018-12-11T02:41:20Z
dc.date.available | 2019-03-14T08:04:32Z
dc.date.issued | 2016-01-01 | en_US
dc.description.abstract | © Springer International Publishing Switzerland 2016. Many traditional visual voice activity detection systems use features extracted from mouth-region images, which are sensitive to noisy observations in the visual domain. In addition, the hyperparameters of the feature extraction process, which govern the trade-off between robustness, efficiency, and accuracy of the algorithm, are difficult to determine. Therefore, a visual voice activity detection algorithm is proposed that uses only simple lip shape information as features and a Long Short-Term Memory recurrent neural network (LSTM-RNN) as the classifier. Face detection is performed by a structural SVM based on histogram of oriented gradients (HOG) features. The detected face template is used to initialize a kernelized correlation filter tracker. Facial landmark coordinates are then extracted from the tracked face. A centroid distance function is applied to the geometrically normalized landmarks surrounding the outer and inner lip contours. Finally, discriminative (LSTM-RNN) and generative (Hidden Markov Model) methods are used to model the temporal lip shape sequences during speech and non-speech intervals, and their classification performances are compared. Experimental results show that the proposed algorithm using the LSTM-RNN achieves a classification rate of 98% in labeling speech and non-speech periods. It is robust and efficient, making it suitable for real-time applications. | en_US
dc.identifier.citation | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 9431, (2016), 380-391 | en_US
dc.identifier.doi | 10.1007/978-3-319-29451-3_31 | en_US
dc.identifier.issn | 1611-3349 | en_US
dc.identifier.issn | 0302-9743 | en_US
dc.identifier.other | 2-s2.0-84959019631 | en_US
dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/20.500.14594/43477
dc.rights | Mahidol University | en_US
dc.rights.holder | SCOPUS | en_US
dc.source.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84959019631&origin=inward | en_US
dc.subject | Computer Science | en_US
dc.subject | Mathematics | en_US
dc.title | Robust visual voice activity detection using Long Short-Term Memory recurrent neural network | en_US
dc.type | Conference Paper | en_US
dspace.entity.type | Publication
mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84959019631&origin=inward | en_US
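
The lip-shape feature step described in the abstract (a centroid distance function applied to geometrically normalized outer and inner lip landmarks) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 68-point landmark layout (outer lip at indices 48 to 59, inner lip at 60 to 67, mouth corners at 48 and 54), the corner-based normalization, the use of a separate centroid for each contour, and the names `normalize`, `centroid_distance`, `lip_features`, and `synthetic_landmarks` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# ASSUMPTION: iBUG-style 68-point landmark layout (as produced by, e.g.,
# dlib's 68-point shape predictor). The paper's landmark model may differ.
OUTER_LIP = list(range(48, 60))
INNER_LIP = list(range(60, 68))
LEFT_CORNER, RIGHT_CORNER = 48, 54


def normalize(points: Sequence[Point], left: Point, right: Point) -> List[Point]:
    """Assumed geometric normalization: translate, rotate and scale all points
    so that the two mouth corners land at (-0.5, 0) and (0.5, 0)."""
    cx, cy = (left[0] + right[0]) / 2.0, (left[1] + right[1]) / 2.0
    dx, dy = right[0] - left[0], right[1] - left[1]
    w2 = dx * dx + dy * dy  # squared corner-to-corner distance
    out = []
    for x, y in points:
        tx, ty = x - cx, y - cy
        # Rotate by -atan2(dy, dx) and divide by the corner distance in one step.
        out.append(((tx * dx + ty * dy) / w2, (ty * dx - tx * dy) / w2))
    return out


def centroid_distance(contour: Sequence[Point]) -> List[float]:
    """Euclidean distance from each contour point to the contour's centroid."""
    n = len(contour)
    gx = sum(p[0] for p in contour) / n
    gy = sum(p[1] for p in contour) / n
    return [math.hypot(x - gx, y - gy) for x, y in contour]


def lip_features(landmarks: Sequence[Point]) -> List[float]:
    """Per-frame feature: outer-lip distances followed by inner-lip distances."""
    if len(landmarks) < 68:
        raise ValueError("expected at least 68 landmarks")
    norm = normalize(landmarks, landmarks[LEFT_CORNER], landmarks[RIGHT_CORNER])
    outer = [norm[i] for i in OUTER_LIP]
    inner = [norm[i] for i in INNER_LIP]
    return centroid_distance(outer) + centroid_distance(inner)


def synthetic_landmarks(scale: float = 1.0) -> List[Point]:
    """Made-up test frame (not data from the paper): elliptical outer lip,
    circular inner lip, all non-lip landmarks left at the origin."""
    pts: List[Point] = [(0.0, 0.0)] * 68
    for k, i in enumerate(OUTER_LIP):
        t = 2.0 * math.pi * k / len(OUTER_LIP)
        pts[i] = (scale * (100 - 30 * math.cos(t)), scale * (200 - 15 * math.sin(t)))
    for k, i in enumerate(INNER_LIP):
        t = 2.0 * math.pi * k / len(INNER_LIP)
        pts[i] = (scale * (100 - 6 * math.cos(t)), scale * (200 - 6 * math.sin(t)))
    return pts


if __name__ == "__main__":
    feats = lip_features(synthetic_landmarks())
    print(len(feats), [round(v, 3) for v in feats])
```

Each frame yields a 20-value vector, and the per-frame vectors stacked over time form the kind of temporal lip shape sequence that the abstract describes modeling with the LSTM-RNN or the HMM. Dividing by the mouth-corner distance makes the feature invariant to face scale, and the rotation removes in-plane head tilt; whether the paper normalizes in exactly this way is not stated in the abstract.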
