Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor
Issued Date
2022-12-01
eISSN
25042289
Scopus ID
2-s2.0-85144601088
Journal Title
Big Data and Cognitive Computing
Volume
6
Issue
4
Rights Holder(s)
SCOPUS
Bibliographic Citation
Big Data and Cognitive Computing Vol.6 No.4 (2022)
Suggested Citation
Yuenyong S., Wongpatikaseree K. Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor. Big Data and Cognitive Computing Vol.6 No.4 (2022). doi:10.3390/bdcc6040136 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/83958
Abstract
Because CCTV cameras are ubiquitous and record continuously, they produce a large amount of unstructured video data. When these recordings have to be reviewed, it is often to look for a specific person who fits a certain description. Currently, this is done by manual inspection of the videos, which is both time-consuming and labor-intensive. Person description search is not a new topic; in this work, we make two contributions. First, we improve upon the existing state of the art by proposing unsupervised fine-tuning of the language model that forms a main part of the text branch of person description search models. This led to higher recall values on the standard dataset. Second, we engineered a complete pipeline from video files to fast searchable objects. Thanks to approximate nearest neighbor search and some model optimizations, a person description search returns its result immediately when deployed on a standard PC with no GPU, allowing interactive search. We demonstrated the effectiveness of the system on new data and showed that most people in the videos can be successfully discovered by the search.
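To illustrate the idea of approximate nearest neighbor search mentioned in the abstract, the following is a minimal sketch using random hyperplane hashing over unit-normalized embeddings. The abstract does not say which approximate nearest neighbor algorithm or library the authors used, so this technique is a stand-in rather than their method. The class name `HyperplaneLSH`, its parameters, the `person_N` identifiers, and the random 16-dimensional vectors (placeholders for real image and text embeddings) are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def normalize(vec):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class HyperplaneLSH:
    """Toy approximate nearest neighbor index (hypothetical, for illustration only)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # Each table owns n_bits random hyperplanes; the pattern of signs of
        # the projections onto them serves as the bucket key for a vector.
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    def _key(self, planes, vec):
        return tuple(dot(p, vec) >= 0 for p in planes)

    def add(self, item_id, vec):
        """Store a vector and register it in one bucket per table."""
        vec = normalize(vec)
        self.vectors[item_id] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(item_id)

    def search(self, query, k=5):
        """Rank only the items that share a bucket with the query in some table."""
        query = normalize(query)
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, query), []))
        ranked = sorted(
            candidates, key=lambda i: dot(query, self.vectors[i]), reverse=True
        )
        return ranked[:k]


# Random stand-ins for person embeddings extracted from video frames.
rng = random.Random(42)
embeddings = {
    f"person_{i}": [rng.gauss(0, 1) for _ in range(16)] for i in range(200)
}

index = HyperplaneLSH(dim=16)
for person_id, emb in embeddings.items():
    index.add(person_id, emb)

# In a real system the query would be the embedding of a text description;
# here an indexed vector is reused so the query lands in its own buckets.
results = index.search(embeddings["person_7"], k=3)
print(results)
```

Because a query is compared only against the items in its own buckets rather than the whole collection, lookups stay cheap on a CPU, at the cost of occasionally missing a true neighbor. Using more tables raises recall, while using more bits per table shrinks the candidate set.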
