AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers
Issued Date
2026-03-01
ISSN
26673053
Scopus ID
2-s2.0-105024239946
Journal Title
Intelligent Systems with Applications
Volume
29
Rights Holder(s)
SCOPUS
Bibliographic Citation
Intelligent Systems with Applications Vol.29 (2026)
Suggested Citation
Vachmanus S., Pridasawas W., Kusakunniran W., Thamrongaphichartkul K., Phinklao N. AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers. Intelligent Systems with Applications Vol.29 (2026). doi:10.1016/j.iswa.2025.200612 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/114424
Abstract
In major trading and export markets, coffee-bean grading still relies heavily on manual labor to sort individual beans from large harvest volumes. This task is time-consuming, costly, and prone to human error, especially within Thailand's rapidly expanding Robusta coffee sector. This study introduces AL-ViT, an end-to-end Active-Learning Vision Transformer framework that combines active learning and transformer-based feature extraction in a single, production-oriented pipeline. The framework integrates a ViT-Base/16 backbone with seven active learning (AL) query strategies: random sampling, entropy-based selection, Bayesian Active Learning by Disagreement (BALD), Batch Active Learning by Diverse Gradient Embeddings (BADGE), Core-Set diversity sampling, ensemble disagreement, and a novel hybrid uncertainty–diversity strategy designed to balance informativeness and representativeness during sample acquisition. A high-resolution dataset of 2098 Robusta coffee-bean images was collected under controlled lighting conditions aligned with grading-machine setups, with only 5% initially labeled and the remainder forming the AL pool. Across five random seeds, the hybrid strategy without MixUp augmentation achieved 97.1% accuracy and an F1_bad of 0.956 using just 850 labels (41% of the dataset), within 0.3 percentage points of full supervision. Operational reliability, defined as 95% accuracy in line with prior inspection benchmarks, was reached with only 407 labels, reflecting a 75% reduction in annotation. Entropy sampling showed the fastest early-stage gains, whereas BADGE lagged by more than 1 percentage point; Core-Set and Ensemble provided moderate but stable results. Augmentation and calibration analyses indicated that explicit augmentation methods (MixUp, CutMix, RandAugment) offered no further benefit, with the hybrid pipeline already producing well-calibrated probabilities. Statistical validation via paired t-tests, effect sizes, and bootstrap confidence intervals confirmed consistent improvements of uncertainty-driven strategies over random sampling. Overall, the proposed AL-ViT framework establishes a label-efficient and practically deployable approach for agricultural quality control, achieving near-supervised accuracy at a fraction of the labeling cost.
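As an illustration of the entropy-based selection named in the abstract, the following is a minimal sketch of how one acquisition round could rank unlabeled images by predictive entropy. It is not the authors' implementation: the function names, the two-class (good, bad) setup, and the probability values are hypothetical and chosen only for the example.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_by_entropy(pool_probs, batch_size):
    """Return indices of the `batch_size` most uncertain pool samples."""
    scores = [predictive_entropy(p) for p in pool_probs]
    # Rank pool indices from highest to lowest entropy.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]

# Hypothetical softmax outputs for four unlabeled images, ordered (good, bad).
pool_probs = [
    [0.98, 0.02],  # confident prediction, low entropy
    [0.55, 0.45],  # near the decision boundary
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain
]

queried = select_by_entropy(pool_probs, batch_size=2)
print(queried)  # [3, 1]
```

In a full active learning loop, the queried images would be sent for annotation, moved from the pool to the labeled set, and the model retrained before the next round. The hybrid strategy described in the abstract additionally accounts for diversity, which this sketch does not attempt to reproduce.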
