AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers
| dc.contributor.author | Vachmanus S. | |
| dc.contributor.author | Pridasawas W. | |
| dc.contributor.author | Kusakunniran W. | |
| dc.contributor.author | Thamrongaphichartkul K. | |
| dc.contributor.author | Phinklao N. | |
| dc.contributor.correspondence | Vachmanus S. | |
| dc.contributor.other | Mahidol University | |
| dc.date.accessioned | 2026-02-06T18:13:16Z | |
| dc.date.available | 2026-02-06T18:13:16Z | |
| dc.date.issued | 2026-03-01 | |
| dc.description.abstract | In major training and export markets, the coffee bean grading process still relies heavily on manual labor to sort individual beans from large harvest volumes. This labor-intensive task is time-consuming, costly, and prone to human error, especially within Thailand's rapidly expanding Robusta coffee sector. This study introduces AL-ViT, an end-to-end Active-Learning Vision Transformer framework that operationalizes active learning and transformer-based feature extraction within a single, production-oriented pipeline. The framework integrates a ViT-Base/16 backbone with seven active learning (AL) query strategies: random sampling, entropy-based selection, Bayesian Active Learning by Disagreement (BALD), Batch Active Learning by Diverse Gradient Embeddings (BADGE), Core-Set diversity sampling, ensemble disagreement, and a novel hybrid uncertainty–diversity strategy designed to balance informativeness and representativeness during sample acquisition. A high-resolution dataset of 2098 Robusta coffee bean images was collected under controlled-lighting conditions aligned with grading-machine setups, with only 5% initially labeled and the remainder forming the AL pool. Across five random seeds, the hybrid strategy without MixUp augmentation achieved 97.1% accuracy and an F1bad of 0.956 using just 850 labels (41% of the dataset), within 0.3 percentage points of full supervision. Operational reliability, defined as 95% accuracy in line with prior inspection benchmarks, was reached with only 407 labels, reflecting a 75% reduction in annotation. Entropy sampling showed the fastest early-stage gains, whereas BADGE lagged by more than 1 pp; Core-Set and Ensemble provided moderate but stable results. Augmentation and calibration analyses indicated that explicit methods (MixUp, CutMix, RandAugment) offered no further benefit, with the hybrid pipeline already achieving well-calibrated probabilities. Statistical validation via paired t-tests, effect sizes, and bootstrap CIs confirmed consistent improvements of uncertainty-driven strategies over random sampling. Overall, the proposed AL-ViT framework establishes a label-efficient and practically deployable approach for agricultural quality control, achieving near-supervised accuracy at a fraction of the labeling cost. | |
| dc.identifier.citation | Intelligent Systems with Applications Vol.29 (2026) | |
| dc.identifier.doi | 10.1016/j.iswa.2025.200612 | |
| dc.identifier.issn | 26673053 | |
| dc.identifier.scopus | 2-s2.0-105024239946 | |
| dc.identifier.uri | https://repository.li.mahidol.ac.th/handle/123456789/114424 | |
| dc.rights.holder | SCOPUS | |
| dc.subject | Computer Science | |
| dc.title | AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers | |
| dc.type | Article | |
| mu.datasource.scopus | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105024239946&origin=inward | |
| oaire.citation.title | Intelligent Systems with Applications | |
| oaire.citation.volume | 29 | |
| oairecerif.author.affiliation | Mahidol University | |
| oairecerif.author.affiliation | King Mongkut's University of Technology Thonburi |
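
The entropy-based selection named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-class (good, bad) layout, and the probability values are hypothetical, and the record does not state the batch size or how the ViT outputs are produced. The sketch only shows the general idea of querying the pool samples whose predicted class distribution has the highest Shannon entropy.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_by_entropy(pool_probs, batch_size):
    """Return indices of the `batch_size` most uncertain pool samples."""
    scores = [predictive_entropy(p) for p in pool_probs]
    # Rank pool indices from highest to lowest entropy.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Hypothetical softmax outputs (good, bad) for four unlabeled beans.
pool_probs = [
    [0.98, 0.02],  # confident prediction, low entropy
    [0.55, 0.45],  # near the decision boundary
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain for two classes
]

queried = select_by_entropy(pool_probs, batch_size=2)
print(queried)  # indices to send to the annotator
```

In an active learning loop of the kind the abstract describes, the queried indices would be labeled, moved from the pool to the training set, and the model retrained before the next round.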
