Finetuning Language Model for Person Description Search in Thai
Issued Date
2022-01-01
Resource Type
Scopus ID
2-s2.0-85151633056
Journal Title
6th International Conference on Information Technology, InCIT 2022
Start Page
207
End Page
210
Rights Holder(s)
SCOPUS
Bibliographic Citation
6th International Conference on Information Technology, InCIT 2022 (2022), 207-210
Suggested Citation
Yuenyong S. Finetuning Language Model for Person Description Search in Thai. 6th International Conference on Information Technology, InCIT 2022 (2022), 207-210. doi:10.1109/InCIT56086.2022.10067683. Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/84301
Title
Finetuning Language Model for Person Description Search in Thai
Author(s)
Yuenyong S.
Author's Affiliation
Other Contributor(s)
Abstract
Person description search is the task of matching a textual description of a person with an image of the same person. This is a multimodal image-text task, where the model generally has two branches: one for the image and one for the text. The objective is for these two branches to embed their respective inputs into a joint space, where the embeddings should be near each other if the image and text pair is a match, and far apart if they are not. The image branch can simply use a pretrained vision model off the shelf without modification, because 'person' is a common class in large image datasets. For the text branch, on the other hand, person descriptions are not part of the corpora commonly used to train large language models (LMs). Recent deep learning language models are based on the transformer architecture and are commonly trained on large text corpora with a masked language modeling loss. In this paper we propose finetuning a transformer-based LM in an unsupervised manner on the person description text before supervised training on the actual task. The results show that unsupervised LM finetuning is beneficial for Thai person description search.
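The matching criterion described in the abstract (embeddings near each other in the joint space for matching pairs, far apart otherwise) can be illustrated with a minimal sketch. All vectors and the threshold below are toy values chosen for illustration, not the paper's actual model outputs or decision rule:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_match(image_embedding, text_embedding, threshold=0.5):
    """Declare a pair a match when its embeddings lie close together in
    the joint space (here: cosine similarity above a hypothetical threshold)."""
    return cosine_similarity(image_embedding, text_embedding) >= threshold

# Toy joint-space embeddings; in the real system these would come from
# the image branch and the finetuned text branch, respectively.
img = [0.9, 0.1, 0.2]
txt_match = [0.8, 0.2, 0.1]    # description of the same person
txt_other = [-0.7, 0.6, -0.3]  # description of a different person

print(is_match(img, txt_match))  # embeddings point the same way -> match
print(is_match(img, txt_other))  # embeddings far apart -> no match
```

In practice such models are usually trained with a contrastive objective that pulls matched image-text pairs together and pushes mismatched pairs apart in the joint space.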

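The masked language modeling loss mentioned in the abstract can also be sketched at the data level: a fraction of tokens is hidden and the model is trained to predict the originals from context. The tokenization, masking rate, and example sentence below are simplified assumptions (real MLM recipes also substitute random tokens and operate on subword units):

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace a fraction of tokens with [MASK].
    Returns the masked sequence and a dict of positions -> original
    tokens, which serve as the model's prediction targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

# Hypothetical person-description tokens (English stand-in for Thai text).
tokens = "man tall wearing red shirt and blue jeans".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3)
print(masked)   # sequence with some tokens replaced by [MASK]
print(targets)  # original tokens the LM must reconstruct
```

Finetuning the LM this way on the person description corpus requires no labels, which is why the paper can run it as an unsupervised step before the supervised matching task.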