AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers

dc.contributor.author: Vachmanus S.
dc.contributor.author: Pridasawas W.
dc.contributor.author: Kusakunniran W.
dc.contributor.author: Thamrongaphichartkul K.
dc.contributor.author: Phinklao N.
dc.contributor.correspondence: Vachmanus S.
dc.contributor.other: Mahidol University
dc.date.accessioned: 2026-02-06T18:13:16Z
dc.date.available: 2026-02-06T18:13:16Z
dc.date.issued: 2026-03-01
dc.description.abstract: In major trading and export markets, the coffee bean grading process still relies heavily on manual labor to sort individual beans from large harvest volumes. This labor-intensive task is time-consuming, costly, and prone to human error, especially within Thailand's rapidly expanding Robusta coffee sector. This study introduces AL-ViT, an end-to-end Active-Learning Vision Transformer framework that operationalizes active learning and transformer-based feature extraction within a single, production-oriented pipeline. The framework integrates a ViT-Base/16 backbone with seven active learning (AL) query strategies: random sampling, entropy-based selection, Bayesian Active Learning by Disagreement (BALD), Batch Active Learning by Diverse Gradient Embeddings (BADGE), Core-Set diversity sampling, ensemble disagreement, and a novel hybrid uncertainty-diversity strategy designed to balance informativeness and representativeness during sample acquisition. A high-resolution dataset of 2098 Robusta coffee bean images was collected under controlled-lighting conditions aligned with grading-machine setups, with only 5 % initially labeled and the remainder forming the AL pool. Across five random seeds, the hybrid strategy without MixUp augmentation achieved 97.1 % accuracy and an F1 score on the defective class (F1bad) of 0.956 using just 850 labels (41 % of the dataset), within 0.3 percentage points of full supervision. Operational reliability, defined as 95 % accuracy in line with prior inspection benchmarks, was reached with only 407 labels, a 75 % reduction in annotation effort. Entropy sampling showed the fastest early-stage gains, whereas BADGE lagged by >1 pp; Core-Set and ensemble disagreement provided moderate but stable results. Augmentation and calibration analyses indicated that explicit methods (MixUp, CutMix, RandAugment) offered no further benefit, as the hybrid pipeline already produced well-calibrated probabilities. Statistical validation via paired t-tests, effect sizes, and bootstrap confidence intervals confirmed consistent improvements of uncertainty-driven strategies over random sampling. Overall, the proposed AL-ViT framework establishes a label-efficient and practically deployable approach for agricultural quality control, achieving near-supervised accuracy at a fraction of the labeling cost.
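Of the query strategies listed in the abstract, entropy-based selection is the simplest to illustrate. The sketch below shows the standard Shannon-entropy acquisition rule on softmax outputs; the function name, toy probabilities, and batch size are illustrative assumptions, not code from the paper.

```python
import numpy as np

def entropy_acquisition(probs, k):
    """Select the k pool indices whose predicted class distributions
    have the highest Shannon entropy (i.e., the most uncertain samples).

    probs: (n_pool, n_classes) array of softmax outputs from the current model.
    """
    eps = 1e-12  # guard against log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Sort descending by entropy and take the top k indices.
    return np.argsort(entropy)[::-1][:k]

# Toy pool: four samples, two classes (e.g., good vs. defective bean).
probs = np.array([
    [0.99, 0.01],  # confident -> low entropy
    [0.55, 0.45],  # uncertain -> high entropy
    [0.90, 0.10],
    [0.50, 0.50],  # maximally uncertain
])
print(entropy_acquisition(probs, 2))  # -> [3 1]
```

In a full AL loop, the selected indices would be sent for annotation, added to the labeled set, and the model retrained before the next query round.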
dc.identifier.citation: Intelligent Systems with Applications Vol.29 (2026)
dc.identifier.doi: 10.1016/j.iswa.2025.200612
dc.identifier.issn: 26673053
dc.identifier.scopus: 2-s2.0-105024239946
dc.identifier.uri: https://repository.li.mahidol.ac.th/handle/123456789/114424
dc.rights.holder: SCOPUS
dc.subject: Computer Science
dc.title: AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers
dc.type: Article
mu.datasource.scopus: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105024239946&origin=inward
oaire.citation.title: Intelligent Systems with Applications
oaire.citation.volume: 29
oairecerif.author.affiliation: Mahidol University
oairecerif.author.affiliation: King Mongkut's University of Technology Thonburi
