Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
Issued Date
2025-01-01
Resource Type
ISSN
0736587X
Scopus ID
2-s2.0-105021028710
Journal Title
Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume
1
Start Page
18685
End Page
18717
Rights Holder(s)
SCOPUS
Bibliographic Citation
Proceedings of the Annual Meeting of the Association for Computational Linguistics Vol.1 (2025), 18685-18717
Suggested Citation
Cahyawijaya S., Lovenia H., Moniz J.R.A., Wong T.H., Farhansyah M.R., Maung T.T., Hudi F., Anugraha D., Habibi M.R.S., Qorib M.R., Agarwal A., Imperial J.M., Patel H.L., Feliren V., Nasution B.I., Rufino M.A., Winata G.I., Rajagede R.A., Catalan C.R., Imam M.F., Pattnayak P., Pranida S.Z., Pratama K., Bangera Y., Na-Thalang A., Monderin P.N., Song Y., Simon C., Ng L.H.X., Sapan R.L., Rafi T.H., Wang B., Supryadi, Veerakanjana K., Ittichaiwong P., Roque M.T., Vincentio K., Kreangphet T., Artkaew P., Palgunadi K.H., Yu Y., Hastuti R.P., Nixon W., Bangera M., Lim A.X.W., Khine A.H., Zhafran H.M., Ferdinan T., Izzani A.A., Singh A., Evan, Krito J.A., Anugraha M., Ilasariya F.A., Li H., Daniswara J.A., Tjiaranata F.A., Yulianrifat E.P., Udomcharoenchaikit C., Ansori F.R., Ihsani M.K., Nguyen G., Barik A.M., Velasco D.J., Genadi R.A., Saha S., Wei C., Flores I., Chen K.K.H., Santos A.G., Lim W.S., Phyo K.S., Santos T., Dwiastuti M., Luo J., Cruz J.C.B., Hee M.S., Hanif I.A., Alif Al Hakim M., Sya'ban M.R., Kerdthaisong K., Miranda L.J.V., Koto F., Fatyanosa T.N., Aji A.F., Rosal J.J., Kevin J., Wijaya R., Kampman O.P., Zhang R., Karlsson B.F., Limkonchotiwat P. Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia. Proceedings of the Annual Meeting of the Association for Computational Linguistics Vol.1 (2025), 18685-18717. Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/113066
Title
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia
Author(s)
Cahyawijaya S.
Lovenia H.
Moniz J.R.A.
Wong T.H.
Farhansyah M.R.
Maung T.T.
Hudi F.
Anugraha D.
Habibi M.R.S.
Qorib M.R.
Agarwal A.
Imperial J.M.
Patel H.L.
Feliren V.
Nasution B.I.
Rufino M.A.
Winata G.I.
Rajagede R.A.
Catalan C.R.
Imam M.F.
Pattnayak P.
Pranida S.Z.
Pratama K.
Bangera Y.
Na-Thalang A.
Monderin P.N.
Song Y.
Simon C.
Ng L.H.X.
Sapan R.L.
Rafi T.H.
Wang B.
Supryadi
Veerakanjana K.
Ittichaiwong P.
Roque M.T.
Vincentio K.
Kreangphet T.
Artkaew P.
Palgunadi K.H.
Yu Y.
Hastuti R.P.
Nixon W.
Bangera M.
Lim A.X.W.
Khine A.H.
Zhafran H.M.
Ferdinan T.
Izzani A.A.
Singh A.
Evan
Krito J.A.
Anugraha M.
Ilasariya F.A.
Li H.
Daniswara J.A.
Tjiaranata F.A.
Yulianrifat E.P.
Udomcharoenchaikit C.
Ansori F.R.
Ihsani M.K.
Nguyen G.
Barik A.M.
Velasco D.J.
Genadi R.A.
Saha S.
Wei C.
Flores I.
Chen K.K.H.
Santos A.G.
Lim W.S.
Phyo K.S.
Santos T.
Dwiastuti M.
Luo J.
Cruz J.C.B.
Hee M.S.
Hanif I.A.
Alif Al Hakim M.
Sya'ban M.R.
Kerdthaisong K.
Miranda L.J.V.
Koto F.
Fatyanosa T.N.
Aji A.F.
Rosal J.J.
Kevin J.
Wijaya R.
Kampman O.P.
Zhang R.
Karlsson B.F.
Limkonchotiwat P.
Author's Affiliation
University of Toronto
University of Illinois Urbana-Champaign
The University of Manchester
National University of Singapore
Monash University
Tianjin University
New York University
Carnegie Mellon University
Brown University
Auburn University
Hanyang University
Chulalongkorn University
University of Bath
Universitas Indonesia
Polytechnique Montréal
Universitas Gadjah Mada
Institut Teknologi Bandung
Nara Institute of Science and Technology
Thammasat University
Institut Teknologi Sepuluh Nopember
Macau University of Science and Technology
Brawijaya University
Siriraj Hospital
King Mongkut's University of Technology Thonburi
Bina Nusantara University
Indian Statistical Institute, Kolkata
Ton-Duc-Thang University
Seoul National University of Science and Technology
A-Star, Institute for Infocomm Research
Singapore University of Technology and Design
Srinakharinwirot University
Universitas Islam Indonesia
Ateneo de Manila University
University of New Haven
Mohamed Bin Zayed University of Artificial Intelligence
Montreal Institute for Learning Algorithms
Universitas Pelita Harapan
Oracle Corporation
University of the Philippines
Vidyasirimedhi Institute of Science and Technology
Singapore Polytechnic
National University, Philippines
Sony Group Corporation
Graphcore Limited
MOH Office for Healthcare Transformation
Allen Institute for AI
AI Singapore
Beijing Academy of Artificial Intelligence (BAAI)
Wroclaw Tech
Cohere
SCB 10X
Meta
Samsung R&D Institute Philippines
Capital One
Works Applications Lab
SEACrowd
Dataxet:Sonar
IndoNLP
Corresponding Author(s)
Other Contributor(s)
Abstract
Despite Southeast Asia's (SEA) extraordinary linguistic and cultural diversity, the region remains significantly underrepresented in vision-language (VL) research, resulting in AI models that inadequately capture SEA cultural nuances. To fill this gap, we present SEA-VL, an open-source initiative dedicated to developing culturally relevant high-quality datasets for SEA languages. By involving contributors from SEA countries, SEA-VL ensures better cultural relevance and diversity, fostering greater inclusivity of underrepresented languages and cultural depictions in VL research. Our methodology employed three approaches: community-driven crowdsourcing with SEA contributors, automated image crawling, and synthetic image generation. We evaluated each method's effectiveness in capturing cultural relevance. We found that image crawling achieves approximately 85% cultural relevance while being more cost- and time-efficient than crowdsourcing, whereas synthetic image generation failed to accurately reflect SEA cultural nuances and contexts. Collectively, we gathered 1.28 million SEA culturally relevant images, more than 50 times larger than other existing datasets. This work bridges the representation gap in SEA, establishes a foundation for developing culturally aware AI systems for this region, and provides a replicable framework for addressing representation gaps in other underrepresented regions.
