AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers

Vachmanus S.; Pridasawas W.; Kusakunniran W.; Thamrongaphichartkul K.; Phinklao N.

Issued Date

2026-03-01

Resource Type

Article

ISSN

26673053

DOI

10.1016/j.iswa.2025.200612

Scopus ID

2-s2.0-105024239946

Journal Title

Intelligent Systems with Applications

Volume

29

Rights Holder(s)

SCOPUS

Bibliographic Citation

Intelligent Systems with Applications Vol.29 (2026)

Suggested Citation

Vachmanus S., Pridasawas W., Kusakunniran W., Thamrongaphichartkul K., Phinklao N. AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers. Intelligent Systems with Applications Vol.29 (2026). doi:10.1016/j.iswa.2025.200612 Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/114424

Author's Affiliation

Mahidol University
King Mongkut's University of Technology Thonburi

Corresponding Author(s)

Vachmanus S.

Other Contributor(s)

Mahidol University

Abstract

In major trading and export markets, coffee-bean grading still relies heavily on manual labor to sort individual beans from large harvest volumes. This labor-intensive task is time-consuming, costly, and prone to human error, especially in Thailand's rapidly expanding Robusta coffee sector. This study introduces AL-ViT, an end-to-end Active-Learning Vision Transformer framework that combines active learning and transformer-based feature extraction in a single, production-oriented pipeline. The framework pairs a ViT-Base/16 backbone with seven active learning (AL) query strategies: random sampling, entropy-based selection, Bayesian Active Learning by Disagreement (BALD), Batch Active Learning by Diverse Gradient Embeddings (BADGE), Core-Set diversity sampling, ensemble disagreement, and a novel hybrid uncertainty–diversity strategy designed to balance informativeness and representativeness during sample acquisition. A high-resolution dataset of 2098 Robusta coffee-bean images was collected under controlled lighting that matches grading-machine setups; only 5% of the images were initially labeled, and the remainder formed the AL pool. Across five random seeds, the hybrid strategy without MixUp augmentation achieved 97.1% accuracy and an F1 score of 0.956 on the defective ("bad") class (F1bad) using just 850 labels (41% of the dataset), within 0.3 percentage points (pp) of full supervision. Operational reliability, defined as 95% accuracy in line with prior inspection benchmarks, was reached with only 407 labels, reflecting a 75% reduction in annotation. Entropy sampling showed the fastest early-stage gains, whereas BADGE lagged by more than 1 pp; Core-Set and ensemble disagreement gave moderate but stable results. Augmentation and calibration analyses indicated that explicit augmentation methods (MixUp, CutMix, RandAugment) offered no further benefit, with the hybrid pipeline already producing well-calibrated probabilities.
Statistical validation via paired t-tests, effect sizes, and bootstrap confidence intervals (CIs) confirmed consistent improvements of uncertainty-driven strategies over random sampling. Overall, the proposed AL-ViT framework offers a label-efficient and practically deployable approach to agricultural quality control, achieving near-fully-supervised accuracy at a fraction of the labeling cost.
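The acquisition strategies named in the abstract can be illustrated with a minimal, stdlib-only Python sketch. Entropy and BALD follow their standard definitions (BALD here assumes T stochastic forward passes, for example MC dropout, as input). The abstract does not specify how the hybrid uncertainty–diversity strategy is formulated, so `hybrid_select` is a hypothetical stand-in: it shortlists the most uncertain pool samples and then picks a diverse subset by greedy k-center (Core-Set-style) selection on feature vectors such as ViT embeddings. All function names and the `shortlist_factor` parameter are illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Sequence

Probs = Sequence[Sequence[float]]  # probs[i][c]: class-c probability of sample i


def entropy_scores(probs: Probs) -> List[float]:
    """Predictive entropy per sample (natural log); higher means more uncertain."""
    return [-sum(p * math.log(p) for p in row if p > 0) for row in probs]


def bald_scores(mc_probs: Sequence[Probs]) -> List[float]:
    """BALD mutual information from T stochastic passes (e.g. MC dropout).

    mc_probs[t][i][c] is the class-c probability of sample i on pass t.
    BALD(i) = H[mean prediction] - mean over passes of H[per-pass prediction].
    """
    n_passes = len(mc_probs)
    scores = []
    for i in range(len(mc_probs[0])):
        per_pass = [mc_probs[t][i] for t in range(n_passes)]
        # Average the class probabilities over passes (column-wise mean).
        mean_pred = [sum(col) / n_passes for col in zip(*per_pass)]
        h_of_mean = entropy_scores([mean_pred])[0]
        mean_of_h = sum(entropy_scores(per_pass)) / n_passes
        scores.append(h_of_mean - mean_of_h)
    return scores


def hybrid_select(probs: Probs, feats: Sequence[Sequence[float]],
                  k: int, shortlist_factor: int = 3) -> List[int]:
    """HYPOTHETICAL hybrid uncertainty-diversity query (not the paper's exact rule).

    1. Uncertainty: shortlist the k * shortlist_factor highest-entropy samples.
    2. Diversity: greedy k-center (Core-Set style) inside the shortlist,
       seeded with the most uncertain sample.
    Returns pool indices of the samples to send for labeling.
    """
    if k <= 0 or not probs:
        return []
    ent = entropy_scores(probs)
    m = min(len(ent), k * shortlist_factor)
    shortlist = sorted(range(len(ent)), key=lambda i: ent[i], reverse=True)[:m]
    chosen = [shortlist[0]]
    # Distance from each shortlisted sample to its nearest already-chosen sample.
    nearest = {i: math.dist(feats[i], feats[chosen[0]]) for i in shortlist}
    while len(chosen) < min(k, m):
        # Pick the shortlisted sample farthest from everything chosen so far.
        nxt = max((i for i in shortlist if i not in chosen), key=lambda i: nearest[i])
        chosen.append(nxt)
        for i in shortlist:
            nearest[i] = min(nearest[i], math.dist(feats[i], feats[nxt]))
    return chosen
```

A full Core-Set step would also measure distances to the already-labeled set; for brevity this sketch measures diversity only within the newly selected batch. The shortlist size trades off the two criteria: a small `shortlist_factor` behaves almost like pure entropy sampling, while a large one behaves more like pure diversity sampling.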
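The statistical checks mentioned in the abstract (paired t-tests, effect sizes, bootstrap CIs) can be sketched for per-seed accuracies of two strategies evaluated on the same seeds. This is a minimal stdlib-only sketch under stated assumptions: Cohen's d_z (mean paired difference divided by its standard deviation) is assumed as the effect-size measure, a percentile bootstrap is assumed for the CI, and the accuracies in the demo are made-up placeholders, not results from the paper.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def paired_diffs(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Per-seed differences a[i] - b[i]; both lists must be aligned by seed."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two aligned sequences with at least 2 pairs")
    return [x - y for x, y in zip(a, b)]


def paired_t_and_effect(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean difference, paired t statistic, Cohen's d_z)."""
    d = paired_diffs(a, b)
    mean_d = statistics.mean(d)
    sd_d = statistics.stdev(d)  # sample SD with n - 1 denominator
    if sd_d == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t_stat = mean_d / (sd_d / math.sqrt(len(d)))
    d_z = mean_d / sd_d
    return mean_d, t_stat, d_z


def bootstrap_ci(a: Sequence[float], b: Sequence[float], n_boot: int = 10_000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean paired difference."""
    d = paired_diffs(a, b)
    rng = random.Random(seed)
    # Resample the paired differences with replacement and record each mean.
    boot_means = sorted(
        sum(sample) / len(sample)
        for sample in (rng.choices(d, k=len(d)) for _ in range(n_boot))
    )
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - 1 - lo_idx
    return boot_means[lo_idx], boot_means[hi_idx]


if __name__ == "__main__":
    # Placeholder per-seed accuracies for two strategies (NOT data from the paper).
    strategy_a = [0.90, 0.92, 0.91, 0.93, 0.94]
    strategy_b = [0.88, 0.89, 0.90, 0.90, 0.91]
    print(paired_t_and_effect(strategy_a, strategy_b))
    print(bootstrap_ci(strategy_a, strategy_b))
```

This sketch returns only the t statistic; for a p-value, compare it with the Student t distribution on n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`, which performs the same paired test). With only five seeds, the bootstrap CI is coarse, so it complements rather than replaces the t-test.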

Keyword(s)

Computer Science

URI

https://repository.li.mahidol.ac.th/handle/123456789/114424

Collections

Scopus 2026
