AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers

dc.contributor.author: Vachmanus S.
dc.contributor.author: Pridasawas W.
dc.contributor.author: Kusakunniran W.
dc.contributor.author: Thamrongaphichartkul K.
dc.contributor.author: Phinklao N.
dc.contributor.correspondence: Vachmanus S.
dc.contributor.other: Mahidol University
dc.date.accessioned: 2026-02-06T18:13:16Z
dc.date.available: 2026-02-06T18:13:16Z
dc.date.issued: 2026-03-01
dc.description.abstract: In major trading and export markets, the coffee bean grading process still relies heavily on manual labor to sort individual beans from large harvest volumes. This labor-intensive task is time-consuming, costly, and prone to human error, especially within Thailand's rapidly expanding Robusta coffee sector. This study introduces AL-ViT, an end-to-end Active-Learning Vision Transformer framework that operationalizes active learning and transformer-based feature extraction within a single, production-oriented pipeline. The framework integrates a ViT-Base/16 backbone with seven active learning (AL) query strategies: random sampling, entropy-based selection, Bayesian Active Learning by Disagreement (BALD), Batch Active Learning by Diverse Gradient Embeddings (BADGE), Core-Set diversity sampling, ensemble disagreement, and a novel hybrid uncertainty-diversity strategy designed to balance informativeness and representativeness during sample acquisition. A high-resolution dataset of 2098 Robusta coffee bean images was collected under controlled-lighting conditions aligned with grading-machine setups, with only 5 % initially labeled and the remainder forming the AL pool. Across five random seeds, the hybrid strategy without MixUp augmentation achieved 97.1 % accuracy and an F1 score on the defective class (F1bad) of 0.956 using just 850 labels (41 % of the dataset), within 0.3 percentage points of full supervision. Operational reliability, defined as 95 % accuracy in line with prior inspection benchmarks, was reached with only 407 labels, a 75 % reduction in annotation effort. Entropy sampling showed the fastest early-stage gains, whereas BADGE lagged by >1 pp; Core-Set and ensemble disagreement provided moderate but stable results. Augmentation and calibration analyses indicated that explicit methods (MixUp, CutMix, RandAugment) offered no further benefit, as the hybrid pipeline already produced well-calibrated probabilities. Statistical validation via paired t-tests, effect sizes, and bootstrap confidence intervals confirmed consistent improvements of uncertainty-driven strategies over random sampling. Overall, the proposed AL-ViT framework establishes a label-efficient and practically deployable approach for agricultural quality control, achieving near-supervised accuracy at a fraction of the labeling cost.
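Of the query strategies listed in the abstract, entropy-based selection is the simplest to illustrate. The sketch below shows the standard Shannon-entropy acquisition rule on softmax outputs; the function name, toy probabilities, and batch size are illustrative assumptions, not code from the paper.

```python
import numpy as np

def entropy_acquisition(probs, k):
    """Select the k pool indices whose predicted class distributions
    have the highest Shannon entropy (i.e., the most uncertain samples).

    probs: (n_pool, n_classes) array of softmax outputs from the current model.
    """
    eps = 1e-12  # guard against log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Sort descending by entropy and take the top k indices.
    return np.argsort(entropy)[::-1][:k]

# Toy pool: four samples, two classes (e.g., good vs. defective bean).
probs = np.array([
    [0.99, 0.01],  # confident -> low entropy
    [0.55, 0.45],  # uncertain -> high entropy
    [0.90, 0.10],
    [0.50, 0.50],  # maximally uncertain
])
print(entropy_acquisition(probs, 2))  # -> [3 1]
```

In a full AL loop, the selected indices would be sent for annotation, added to the labeled set, and the model retrained before the next query round.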
dc.identifier.citation: Intelligent Systems with Applications Vol.29 (2026)
dc.identifier.doi: 10.1016/j.iswa.2025.200612
dc.identifier.issn: 26673053
dc.identifier.scopus: 2-s2.0-105024239946
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/123456789/114424
dc.rights.holder: SCOPUS
dc.subject: Computer Science
dc.title: AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers
dc.type: Article
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105024239946&origin=inward
oaire.citation.title: Intelligent Systems with Applications
oaire.citation.volume: 29
oairecerif.author.affiliation: Mahidol University
oairecerif.author.affiliation: King Mongkut's University of Technology Thonburi
