Robust visual voice activity detection using Long Short-Term Memory recurrent neural network

Zaw Htet Aung; Panrasee Ritthipravat

dc.contributor.author	Zaw Htet Aung	en_US
dc.contributor.author	Panrasee Ritthipravat	en_US
dc.contributor.other	Mahidol University	en_US
dc.date.accessioned	2018-12-11T02:41:20Z
dc.date.accessioned	2019-03-14T08:04:32Z
dc.date.available	2018-12-11T02:41:20Z
dc.date.available	2019-03-14T08:04:32Z
dc.date.issued	2016-01-01	en_US
dc.description.abstract	© Springer International Publishing Switzerland 2016. Many traditional visual voice activity detection systems use features extracted from mouth region images, which are sensitive to noisy observations in the visual domain. In addition, the hyperparameters of the feature extraction process, which govern the compromise between robustness, efficiency, and accuracy of the algorithm, are difficult to determine. Therefore, a visual voice activity detection algorithm is proposed that uses only simple lip shape information as features and a Long Short-Term Memory recurrent neural network (LSTM-RNN) as a classifier. Face detection is performed by a structural SVM based on histogram of oriented gradients (HOG) features. The detected face template is used to initialize a kernelized correlation filter tracker. Facial landmark coordinates are then extracted from the tracked face. A centroid distance function is applied to the geometrically normalized landmarks surrounding the outer and inner lip contours. Finally, discriminative (LSTM-RNN) and generative (Hidden Markov Model) methods are used to model the temporal lip shape sequences during speech and non-speech intervals, and their classification performances are compared. Experimental results show that the proposed algorithm using the LSTM-RNN can achieve a classification rate of 98% in labeling speech and non-speech periods. It is robust and efficient for real-time applications.	en_US
dc.identifier.citation	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol.9431, (2016), 380-391	en_US
dc.identifier.doi	10.1007/978-3-319-29451-3_31	en_US
dc.identifier.issn	16113349	en_US
dc.identifier.issn	03029743	en_US
dc.identifier.other	2-s2.0-84959019631	en_US
dc.identifier.uri	https://repository.li.mahidol.ac.th/handle/20.500.14594/43477
dc.rights	Mahidol University	en_US
dc.rights.holder	SCOPUS	en_US
dc.source.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84959019631&origin=inward	en_US
dc.subject	Computer Science	en_US
dc.subject	Mathematics	en_US
dc.title	Robust visual voice activity detection using Long Short-Term Memory recurrent neural network	en_US
dc.type	Conference Paper	en_US
dspace.entity.type	Publication
mu.datasource.scopus	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84959019631&origin=inward	en_US
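
Illustrative note on the feature step named in the abstract: a centroid distance function maps each point on a contour to its distance from the contour's centroid, giving a compact shape signature. The sketch below is a minimal illustration only, not the authors' implementation. The function name, the choice of normalizing by the largest distance, and the sample coordinates are all assumptions made for this example; the record does not say how the paper normalizes or how many landmarks it uses.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def centroid_distance(contour: Sequence[Point], normalize: bool = True) -> List[float]:
    """Return the distance from each contour point to the contour centroid.

    Hypothetical helper for illustration. With normalize=True the distances
    are divided by the largest one, an assumed way to remove scale.
    """
    if not contour:
        raise ValueError("contour must contain at least one point")

    # Centroid = mean of the landmark coordinates.
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)

    # Euclidean distance of every landmark from the centroid.
    dists = [math.hypot(x - cx, y - cy) for x, y in contour]

    if normalize:
        peak = max(dists)
        if peak > 0:  # avoid dividing by zero for a degenerate contour
            dists = [d / peak for d in dists]
    return dists


# Made-up outer-lip landmarks (left corner, top, right corner, bottom).
# These coordinates are invented for the example, not data from the paper.
closed_mouth = [(0, 0), (2, 1), (4, 0), (2, -1)]
open_mouth = [(0, 0), (2, 2), (4, 0), (2, -2)]

print(centroid_distance(closed_mouth))
print(centroid_distance(open_mouth))
```

In this toy case the signature of the closed mouth has small values at the top and bottom landmarks, while the open mouth has equal values at all four, so the signature changes as the lips part. A per-frame sequence of such signatures is the kind of input a temporal classifier such as an LSTM-RNN or HMM would consume.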

Collections

Scopus 2016-2017
