Chai-Calibrated Hybrid Assessment for IELTS Speaking with Human-Referenced Validation
Issued Date
2025-01-01
Scopus ID
2-s2.0-105032755968
Journal Title
2025 20th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP 2025)
Rights Holder(s)
SCOPUS
Bibliographic Citation
2025 20th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP 2025) (2025)
Suggested Citation
Polasa P., Laoaree S., Thanadunpremdet T., Rodjananant N., Kritsuthikul N. Chai-Calibrated Hybrid Assessment for IELTS Speaking with Human-Referenced Validation. 2025 20th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP 2025) (2025). doi:10.1109/iSAI-NLP66160.2025.11320542. Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/115785
Title
Chai-Calibrated Hybrid Assessment for IELTS Speaking with Human-Referenced Validation
Abstract
We present CHAI, a rubric-aligned framework for IELTS Speaking that combines an accent-aware ASR backbone with self-supervised speech representations to deliver criterion-level feedback. CHAI adopts a dual-agent design: a low-latency Coach for live turn-taking (Whisper-TH large) and a read-only Judge for independent scoring (Whisper-base). Evidence integrates pronunciation similarity from HuBERT-style embeddings with alignment/timing cues, prosody, and transcript-derived indicators to estimate bands for Fluency & Coherence, Lexical Resource, and Grammatical Range & Accuracy. Two certified IELTS examiners (Experts A and B) and a small crowd panel (Crowd mean) serve as human references in a classroom-style evaluation with Thai EFL learners across three role-play scenarios (restaurant, airport, job interview). Agreement is reported on the band scale using mean absolute error (MAE) as the primary metric, with latency tracked for usability. The hybrid fusion with a lightweight human prior yields the lowest overall MAE (0.410), outperforming single-model baselines and each human reference considered individually (Expert A: 0.430; Expert B: 0.451; Crowd mean: 0.512); per-criterion MAE likewise favors the hybrid (F/C 0.409, LR 0.402, GRA 0.418). Latency supports near-real-time classroom feedback for ∼10 s turns. Despite the focus on a Thai-centric corpus and sensitivity to ASR timing and fairness, the results indicate that model-human hybridization is a practical pathway to consistent, scalable, IELTS-aligned feedback.
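Illustrative example (not from the paper): a minimal Python sketch of the band-scale MAE agreement metric and a simple weighted blend of a model band estimate with a lightweight human prior, as described in the abstract. The prior weight, helper names, and toy scores below are assumptions for illustration only.

```python
# Minimal sketch (not the paper's implementation) of band-scale MAE and a
# hybrid score that blends a model estimate with a lightweight human prior.
# The weighting constant and helper names are illustrative assumptions.

from statistics import mean


def round_to_half_band(score: float) -> float:
    """Snap a raw score to the IELTS half-band scale (e.g., 6.0, 6.5, 7.0)."""
    return round(score * 2) / 2


def hybrid_band(model_score: float, human_prior: float, prior_weight: float = 0.2) -> float:
    """Blend the model's band estimate with a human-referenced prior.

    prior_weight is an assumed hyperparameter; the abstract only states that a
    'lightweight human prior' is fused with the model output.
    """
    fused = (1 - prior_weight) * model_score + prior_weight * human_prior
    return round_to_half_band(fused)


def band_mae(predicted: list[float], reference: list[float]) -> float:
    """Mean absolute error on the band scale, the paper's primary agreement metric."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


if __name__ == "__main__":
    # Toy example: three per-criterion estimates (F/C, LR, GRA) vs. one examiner's bands.
    model_bands = [6.5, 6.0, 7.0]
    crowd_bands = [6.5, 6.5, 6.5]
    examiner_bands = [6.0, 6.5, 7.0]

    hybrid = [hybrid_band(m, c) for m, c in zip(model_bands, crowd_bands)]
    print("Hybrid bands:", hybrid)
    print("MAE vs. examiner:", round(band_mae(hybrid, examiner_bands), 3))
```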
